mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/
Other
2.65k stars 251 forks source link

bug: possible regression in CSS parsing of advanced expressions #657

Closed eht16 closed 2 years ago

eht16 commented 2 years ago

I noticed a difference when parsing and removing CSS styles between Bleach 4.x and 5.x. This is probably related to #633 resp. #648.

Before, styles like list-style: url(https://www.example.com) seem to have been stripped even if list-style was in the styles list. Recent Bleach versions (5.x) do keep this list-style style and this seems correct to me.

python and bleach versions (please complete the following information):

To Reproduce

Steps to reproduce the behavior:

import bleach

allowed_css_styles = ['width', 'list-style']
allowed_attributes = {'*': ['class', 'style', 'id']}
allowed_tags = ['div']

test_html_list_style = '<div style="list-style: url(https://www.example.com); width: 100%;"></div>'
test_html_width = '<div style="width: expression(test);"></div>'

try:
    # Bleach 5.x
    from bleach.css_sanitizer import CSSSanitizer

    css_sanitizer = CSSSanitizer(allowed_css_properties=allowed_css_styles)

    bleached_text = bleach.clean(test_html_list_style, tags=allowed_tags, attributes=allowed_attributes, css_sanitizer=css_sanitizer)
    print(bleached_text)  # gives: <div style="list-style: url(https://www.example.com); width: 100%;"></div>

    bleached_text = bleach.clean(test_html_width, tags=allowed_tags, attributes=allowed_attributes, css_sanitizer=css_sanitizer)
    print(bleached_text)  # gives: <div style="width: expression(test);"></div>

except ImportError:
    # Bleach 4.x
    bleached_text = bleach.clean(test_html_list_style, tags=allowed_tags, attributes=allowed_attributes, styles=allowed_css_styles)
    print(bleached_text)  # gives: <div style="width: 100%;"></div>

    bleached_text = bleach.clean(test_html_width, tags=allowed_tags, attributes=allowed_attributes, styles=allowed_css_styles)
    print(bleached_text)  # gives: <div style=""></div>

Expected behavior

I think the current behavior is correct but I just want to get sure and that I'm not misunderstanding or misusing the new CSS sanitizer.

So, is the previous behavior more or less a bug in the old CSS sanitizer and has been fixed implicitly by the new one?

g-k commented 2 years ago

So, is the previous behavior more or less a bug in the old CSS sanitizer and has been fixed implicitly by the new one?

Yep! The bug was #529.