mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/

Style attributes are getting stripped off #724

Closed Buvi1234 closed 10 months ago

Buvi1234 commented 10 months ago

If I sanitize markup that has a style attribute, the style value is stripped. For example, this is the data I pass in:

<span style=\\\"color: inherit; background: inherit;\\\"><span dir=\\\"ltr\\\" style=\\\"color: inherit; background: inherit;\\\">

and the output looks like this:

<span dir=\'\"ltr\"\' style="">

I have allowed the p and span tags and the style attribute, and I have also allowed color and background in the css_sanitizer, but the style value is still getting stripped out. Is there a reason?

willkg commented 10 months ago

Can you put that into a test or a python script so it's easier to see what you're doing?

Buvi1234 commented 10 months ago

(Pdb) bleach.clean('<span style=\"color: inherit; background: inherit;\', tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES), strip=True)

Why is it happening this way? The style contents are not retained in the output. Is there a reason?

image

But for class attributes the output comes out properly:

image

https://github.com/mozilla/bleach/issues/91 --> same issue

willkg commented 10 months ago

Here's what I think you're trying to do based on the last couple of comments:

from bleach import (
    clean,
    ALLOWED_ATTRIBUTES,
    ALLOWED_TAGS,
)
from bleach.css_sanitizer import (
    ALLOWED_CSS_PROPERTIES,
    CSSSanitizer,
)

print(clean(
    '<p><span style=\\"color: inherit; background: inherit;\\</span></p>',
    tags=ALLOWED_TAGS,
    attributes=ALLOWED_ATTRIBUTES,
    css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES),
    strip=True,
))

The text that's being cleaned is missing a lot of HTML things. It's got a \" and it's missing an end " and the closing of the first <span> tag. When the Bleach parser goes through that, it's seeing HTML-like things and trying to complete them and then that probably results in it dropping stuff.

On top of that, the set of ALLOWED_TAGS doesn't include p or span. You would need to include those tags for them to not get stripped. You'd need to do something like this:

from bleach import clean, ALLOWED_TAGS

my_tags = set(["p", "span"] + list(ALLOWED_TAGS))
clean("<p><span>something</span></p>", tags=my_tags)

Set of ALLOWED_TAGS is documented here: https://bleach.readthedocs.io/en/latest/clean.html#allowed-tags-tags

For CSS properties, you're using the default set which does include color, but doesn't include background. You'd need to pass in a set of properties that includes background.

Set of ALLOWED_CSS_PROPERTIES isn't documented, but you can see it in the code here: https://github.com/mozilla/bleach/blob/c04958dcb931243b10e103a2e6ecfa700b190270/bleach/css_sanitizer.py#L4
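As a rough sketch of that last point, the default property set can be extended in the same way as the tag set above. The helper name `clean_with_background` and the constant `EXTRA_CSS_PROPERTIES` are made up for this illustration, and the tag and attribute allow-lists are deliberately minimal assumptions rather than the reporter's configuration.

```python
# Hypothetical names for illustration only; not part of Bleach.
EXTRA_CSS_PROPERTIES = {"background"}


def clean_with_background(html):
    """Clean html, allowing `background` on top of Bleach's default CSS properties."""
    # Imported lazily: bleach.css_sanitizer needs the optional tinycss2
    # dependency, which the bleach[css] extra installs.
    from bleach import clean
    from bleach.css_sanitizer import ALLOWED_CSS_PROPERTIES, CSSSanitizer

    # set(...) works whether the default is exposed as a list or a frozenset.
    my_properties = set(ALLOWED_CSS_PROPERTIES) | EXTRA_CSS_PROPERTIES
    return clean(
        html,
        tags={"p", "span"},            # assumption: only these tags are needed
        attributes={"*": ["style"]},   # assumption: style allowed on every tag
        css_sanitizer=CSSSanitizer(allowed_css_properties=my_properties),
    )


if __name__ == "__main__":
    print(clean_with_background(
        '<span style="color: inherit; background: inherit;">text</span>'
    ))
```

Passing only the default `ALLOWED_CSS_PROPERTIES` instead of `my_properties` should leave `color` in place and drop the `background` declaration.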

Does that help?

Buvi1234 commented 10 months ago

Yes, to some extent. My allowed tags include span and p, and in allowed_attributes I have *: {['style', "class"]}, but the style attribute's value is still being stripped. Is there a reason?

willkg commented 10 months ago

Please provide a complete Python script showing the problem you're having. Something I can put in a file and run python example_script.py to see the problem.

Buvi1234 commented 10 months ago

I have attached what I am trying to do, please have a look:

import bleach
from bleach.css_sanitizer import CSSSanitizer

ALLOWED_TAGS = ['abbr', 'acronym', 'b', 'br', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'li', 'ol', 'p', 'q', 's', 'small', 'strike', 'strong', 'span', 'sub', 'sup', 'table', 'tbody', 'td', 'th', 'thead', 'tr', 'tt', 'u', 'ul', 'address', 'img', 'pre', 'tfoot', 'bdo', 'big', 'blockquote', 'center', 'cite', 'code', 'dd', 'del', 'dfn', 'embed', 'font', 'ins', 'kbd', 'param', 'samp', 'var', 'wbr']

ALLOWED_ATTRIBUTES = { '*': ['class', 'title', 'style', 'align', 'cite', 'size', 'type', 'dir'], 'img': ['src', 'alt'], 'table': ['border', 'width', 'height'], }

ALLOWED_CSS_PROPERTIES = ["color", "font-weight", "width", "border-bottom", "padding", "background-color", "font-size", "display", "box-sizing", "line-height", "white-space", "height", "line-height", "font-size", "font", "border-width", "text-align", "border-color", "font-family", "border-left-color", "border-right-color", "border-top-color", "font-color", 'text-decoration', "background"]

val = '<span style=\\\"color: inherit; background: inherit;\\\"><span dir=\\\"ltr\\\" style=\\\"color: inherit; background: inherit;\\\">'

val = bleach.clean(val, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES), strip=True)
print(val)


willkg commented 10 months ago

You've got a lot of extra \ which is causing the attribute values to be invalid HTML and my guess is they're getting dropped. If you change val to be this:

<span style="color: inherit; background: inherit;"><span dir="ltr" style="color: inherit; background: inherit;">

Then you get this as output:

<span style="color: inherit; background: inherit;"><span dir="ltr" style="color: inherit; background: inherit;"></span></span>

Does that look like what you're looking for?
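To make the effect of the stray backslashes visible, here is a small sketch that prints how a tolerant HTML tokenizer splits the two variants of the opening tag. It uses Python's standard-library `html.parser` as a stand-in, which is an assumption for illustration: Bleach uses its own html5lib-based parser, so the details may differ. The class `AttributeDumper` and the function `dump_attributes` are hypothetical helpers.

```python
from html.parser import HTMLParser


class AttributeDumper(HTMLParser):
    """Record the (name, value) attribute pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def dump_attributes(markup):
    """Return a list of (tag, attrs) tuples for the start tags in markup."""
    parser = AttributeDumper()
    parser.feed(markup)
    parser.close()
    return parser.seen


# The string really contains backslash + double quote (written \\" in Python).
escaped = '<span style=\\"color: inherit; background: inherit;\\">'
plain = '<span style="color: inherit; background: inherit;">'

for label, markup in (("escaped", escaped), ("plain", plain)):
    for tag, attrs in dump_attributes(markup):
        print(label, tag, attrs)
```

With the backslashes present, the value no longer starts with a quote character, so it is read as an unquoted value that ends at the first space. The style value comes out as `\"color:` and the rest of the declaration is split into separate, meaningless attributes. That would be consistent with the `style=""` reported earlier in the thread, and with `dir` appearing to survive, since `\"ltr\"` contains no spaces.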

Buvi1234 commented 10 months ago

Yes, but can we achieve that without removing the \? The dir attribute was working fine. Can the same be done for the style attribute, or is there any other way to achieve this?

willkg commented 10 months ago

No, because all the \ make it nonsensical input data. Bleach clean will make sure the output is valid HTML, but it's not going to fix the input data in that way to make it less nonsensical. You'll need to write some code to fix the input text before running it through Bleach clean.
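A minimal sketch of that kind of pre-processing step, assuming the backslashes are purely an artifact of double escaping (for example, a string that was serialized twice) and never legitimate content. The names `unescape_quotes` and `clean_escaped` are hypothetical helpers, not part of Bleach.

```python
import re

# One or more backslashes immediately followed by a single or double quote.
_ESCAPED_QUOTE = re.compile(r'\\+(["\'])')


def unescape_quotes(text):
    """Drop the backslashes that precede quote characters, keeping the quotes."""
    return _ESCAPED_QUOTE.sub(r"\1", text)


def clean_escaped(text, **clean_options):
    """Unescape quotes, then pass the result and any options to bleach.clean."""
    import bleach  # imported lazily so unescape_quotes works without Bleach

    return bleach.clean(unescape_quotes(text), **clean_options)


if __name__ == "__main__":
    raw = '<span dir=\\"ltr\\" style=\\"color: inherit; background: inherit;\\">'
    print(unescape_quotes(raw))
```

Note that this rewrites every backslash-escaped quote in the input, including any inside text content, so it only makes sense when the escaping is known to be spurious.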

Buvi1234 commented 10 months ago

Thank you for the info. I just need some clarity: in the comments above I attached a debugger screenshot where, for class, we were able to retain a color value, but for style we were not. Is there a reason behind that?

willkg commented 10 months ago

I can't speculate on the pdb screenshot because it's missing a lot of context. Can you put it into a Python script so I can investigate?

Buvi1234 commented 10 months ago

https://github.com/mozilla/bleach/issues/529

import bleach
from bleach.css_sanitizer import CSSSanitizer

ALLOWED_TAGS = ['abbr', 'acronym', 'b', 'br', 'div', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'li', 'ol', 'p', 'q', 's', 'small', 'strike', 'strong', 'span', 'sub', 'sup', 'table', 'tbody', 'td', 'th', 'thead', 'tr', 'tt', 'u', 'ul', 'address', 'img', 'pre', 'tfoot', 'bdo', 'big', 'blockquote', 'center', 'cite', 'code', 'dd', 'del', 'dfn', 'embed', 'font', 'ins', 'kbd', 'param', 'samp', 'var', 'wbr']

ALLOWED_ATTRIBUTES = { '*': ['class', 'title', 'style', 'align', 'cite', 'size', 'type', 'dir'], 'img': ['src', 'alt'], 'table': ['border', 'width', 'height'], }

ALLOWED_CSS_PROPERTIES = ["color", "font-weight", "width", "border-bottom", "padding", "background-color", "font-size", "display", "box-sizing", "line-height", "white-space", "height", "line-height", "font-size", "font", "border-width", "text-align", "border-color", "font-family", "border-left-color", "border-right-color", "border-top-color", "font-color", 'text-decoration', "background"]

val = '<span style=\"color: inherit; background: inherit;\"><span dir=\"ltr\" style=\"color: inherit; background: inherit;\">'

val = bleach.clean(val, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES), strip=True)
print(val)

output: '<span dir=\'\"ltr\"\' style="">'

val = '<span style='\"color:': inherit; background: inherit;\"><span dir=\"ltr\" style='\"color:': inherit; background: inherit;\">'

val = bleach.clean(val, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES), strip=True)

print(val)

The output is included for reference.

The style attribute does not support dashes, but other attributes do.

willkg commented 10 months ago

I get the following output from the first blurb:

<span style="color: inherit; background: inherit;"><span dir="ltr" style="color: inherit; background: inherit;"></span></span>

That output looks fine to me.

The second is malformed Python, so it doesn't execute.

At this point, I can't continue helping you with this. I don't see anything in here that suggests there's a bug in Bleach--it sure seems like it's doing what it should be doing. I think I've given you enough insight here to help you continue exploring why Bleach isn't doing what you want it to do. Hope that helps!