Closed nguiard closed 1 year ago
Did you install the `css` extras?
https://bleach.readthedocs.io/en/latest/clean.html#sanitizing-css
Oh sorry I didn't. It's probably just that. I'll do that and reopen if needed. Thanks!
Can I get some help with this? The thing you're hitting is this:
Would it have helped if Bleach had emitted a Python warning because you've got "style" as an allowed attribute, but hadn't specified a `css_sanitizer`? If not that, should it throw an exception? I'm pretty sure the situation is an indication of a mistake and a developer would want to know and not have the problem you just had. I can't think of a case where you'd want to be in that situation (specifying `style` as allowed, but not wanting the css sanitized), but I didn't know if I was lacking imagination or not. What do you think?
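The check floated above could look roughly like the following. This is only a sketch: `MissingCssSanitizerWarning`, `style_is_allowed` and `check_style_config` are hypothetical names made up for illustration, not part of Bleach, and it only inspects the list and dict forms of an attribute allow-list.

```python
import warnings


class MissingCssSanitizerWarning(UserWarning):
    """Hypothetical warning category; not part of Bleach's API."""


def style_is_allowed(attributes):
    """Return True if a list- or dict-style allow-list permits ``style``."""
    if callable(attributes):
        # A callable filter cannot be inspected without calling it.
        return False
    if isinstance(attributes, dict):
        # Dict form: tag name (or "*") mapped to a list of attribute names.
        return any(
            "style" in allowed
            for allowed in attributes.values()
            if not callable(allowed)
        )
    return "style" in attributes


def check_style_config(attributes, css_sanitizer=None):
    """Warn when ``style`` is allowed but no CSS sanitizer was supplied."""
    if css_sanitizer is None and style_is_allowed(attributes):
        warnings.warn(
            "'style' is allowed but no css_sanitizer is set; "
            "style values will be truncated",
            MissingCssSanitizerWarning,
            stacklevel=2,
        )
```

A warning rather than an exception keeps existing setups running while still surfacing the likely mistake; callers who want it to be fatal can turn it into an error with `warnings.simplefilter("error", MissingCssSanitizerWarning)`.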
Sure! So, first of all, installing and using the css extras fixed my issue.
But as you suggested, I think it would indeed have been very nice to have a Python warning or error about that. Being a bit new to bleach and just wanting to adjust my previous basic bleaching to now allow for katex markup, I looked at the docs and the issues here, but did not get at first that the css extras would be relevant. I saw the `css_sanitizer` option in `Cleaner`, but I thought that a value of `None` would not parse/sanitize the css.
I think it's not crazy to think that at first (after all, it feels natural that "None" sanitizer would sanitize nothing), even though I understand that not sanitizing the css would rarely be the correct call.
I'm going to re-open this to cover two changes:
1. If you allow the `style` attribute, you should also set a `css_sanitizer`, otherwise the style value will be truncated.
2. Warn when `style` is allowed, but the `css_sanitizer` is not set.
Related to this is the question of what tags and styles we should allow for Katex, as it is not necessarily trivial to get the complete list.
And more generally, say in theory you trust a plugin's output (not saying I trust Katex output specifically), but if that plugin uses a lot of tags, then you end up allowing a lot of tags you wouldn't have allowed normally. The allowed tags approach seems kind of flawed in that case. I don't know if there is a better way in these kinds of cases, like maybe treating parts separately...
Having a context aware allow list could help here. Bleach definitely doesn't support that currently. It feels like it'd be hard to implement because the stripping/escaping for tags is spread across a few classes, but maybe that's not true. You could try looking into that.
Bleach truncates a lot of Katex style attributes
Basic example: a markdown_katex output may contain a span like so: `<span class="vlist" style="height:1.0697em;">`. It contains a `style` attribute, and when passed through bleach (allowing the `style` attribute), I get this:
While the desired output would be:
As a result, actual Katex math doesn't render properly.
python and bleach versions:
To Reproduce
Steps to reproduce the behavior:
Additional context
I am unsure if this is actually a bug or intended behavior in some way. The more general problem I face is: how to correctly use bleach after user input is transformed through markdown with the markdown_katex extension?