Closed pelme closed 3 weeks ago
I guess this is based on html.escape just being a bunch of .replace() calls: https://github.com/python/cpython/blob/77d79989fd670633dce001877451f0dc120fbaf8/Lib/html/__init__.py#L10C1-L26C1
... and in combination with #401 the behavior changed. Not sure what is expected really.
It's hard to say what should be done here. Markup.replace escaping the new string makes sense to me, what if you for example did template.replace("USERNAME", username), you'd want the username value to get escaped. On the other hand, html.escape is specifically replacing with the escaped value, so MarkupSafe shouldn't double escape that value. But that's impossible to detect.
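To make the trade-off concrete, here is a rough sketch using only the standard library. SafeText is a hypothetical stand-in for a Markup-like string, not MarkupSafe's actual implementation; it only assumes that replace() escapes the replacement value, as described above.

```python
import html


class SafeText(str):
    """Hypothetical stand-in for a Markup-like string (not MarkupSafe's code)."""

    def __html__(self):
        return self

    def replace(self, old, new, count=-1):
        # Assumption: only the replacement value is escaped, never the search string.
        return SafeText(str.replace(self, old, html.escape(new), count))


# The case where escaping the new value is what you want.
template = SafeText("<p>Hello, USERNAME!</p>")
greeting = template.replace("USERNAME", "<script>alert(1)</script>")
print(greeting)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>

# The case where it backfires: html.escape passes already-escaped
# replacements such as "&lt;", and they get escaped a second time.
double = html.escape(SafeText("<test>"))
print(double)  # &amp;lt;test&amp;gt;
```

Both calls go through the same replace() override, which is why the second case can't be told apart from the first.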
You can always convert a Markup string to a plain string, if you move to a context that doesn't understand Markup or the __html__ convention: html.escape(str(m)) will do the right thing.
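A small sketch of that workaround, again with a hypothetical SafeText class standing in for a string subclass whose replace() escapes. It relies only on the fact that str() on a str subclass instance returns a plain str.

```python
import html


class SafeText(str):
    """Hypothetical str subclass whose replace() escapes the new value."""

    def replace(self, old, new, count=-1):
        return SafeText(str.replace(self, old, html.escape(new), count))


m = SafeText("<b>bold</b>")

# str() drops the subclass, so html.escape uses the plain str.replace.
plain = str(m)
escaped_once = html.escape(plain)
print(type(plain) is str)  # True
print(escaped_once)        # &lt;b&gt;bold&lt;/b&gt;
```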
You could also try to request that html.escape check for an __html__ method and call that if available, which is supported by Django and MarkupSafe (and probably others).
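Roughly what such a check could look like, written as a wrapper rather than a change to the standard library. The names escape_respecting_html and Badge are made up for illustration; html.escape itself does not do this today.

```python
import html


def escape_respecting_html(value):
    """Hypothetical helper: prefer __html__() when the object provides it."""
    if hasattr(value, "__html__"):
        # The object claims it can render itself as safe HTML.
        return value.__html__()
    # Otherwise fall back to escaping the plain string form.
    return html.escape(str(value))


class Badge:
    """Hypothetical object that knows how to render itself as HTML."""

    def __init__(self, label):
        self.label = label

    def __html__(self):
        return f'<span class="badge">{html.escape(self.label)}</span>'


print(escape_respecting_html(Badge("a<b")))  # <span class="badge">a&lt;b</span>
print(escape_respecting_html("a<b"))         # a&lt;b
```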
Or you could request that html.escape change its calls from s.replace("", "") to str.replace(s, "", ""). If it always calls the base string method rather than the object's type's method, then it will skip the escaping behavior of Markup.replace.
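The difference between the two call styles, sketched with a hypothetical SafeText subclass (an assumption for illustration, not MarkupSafe's code):

```python
import html


class SafeText(str):
    """Hypothetical str subclass whose replace() escapes the new value."""

    def replace(self, old, new, count=-1):
        return SafeText(str.replace(self, old, html.escape(new), count))


s = SafeText("<test>")

# Method lookup on the instance finds the subclass override.
via_override = s.replace("<", "&lt;")
print(via_override)  # &amp;lt;test>

# Calling the base method explicitly bypasses the override.
via_base = str.replace(s, "<", "&lt;")
print(via_base)  # &lt;test>
```

Note that the base call also returns a plain str, so the result no longer carries the subclass type.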
Not sure if either of these feature requests would be accepted in Python though.
Just stepped through html.escape(Markup("<test>")) with a debugger. Turns out this wasn't doing the right thing in 2.1 either, for the exact reason in #401. Markup.replace used to escape every string argument, so Markup("<test>").replace("<", "&lt;") was becoming "<test>".replace("&lt;", "&amp;lt;") and not replacing anything. That made it look like it was working, but it was actually just skipping everything unintentionally.
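A toy model of that older behavior, for anyone who wants to see the no-op without a debugger. OldStyleText is hypothetical and only assumes what is described above: every string argument to replace() gets escaped, including the search string.

```python
import html


class OldStyleText(str):
    """Hypothetical model of the 2.1-era behavior: escape old and new."""

    def replace(self, old, new, count=-1):
        # The search string is escaped too, so "<" becomes "&lt;",
        # which does not occur in the text and never matches.
        return OldStyleText(
            str.replace(self, html.escape(old), html.escape(new), count)
        )


result = html.escape(OldStyleText("<test>"))
print(result)  # <test>
```

Every replace() call inside html.escape searches for an escaped pattern that isn't there, so the string comes back unchanged.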
I'm going to say the new behavior makes more sense, it makes replace safe. There's no way to both make it work safely and not double escape when called by html.escape.
Thanks for the detailed response and looking into this. I agree that the new behavior is better and makes more sense.
(I got here because I forgot to implement __html__() on a class that was passed to Django's conditional_escape (which respects __html__ like markupsafe.escape, as you pointed out above). Django then eventually ended up calling html.escape() on my object, which worked with previous versions of markupsafe (by accident). Implementing __html__() on my class solved the problem: https://github.com/pelme/htpy/pull/65 🙂 ).
You could try opening a Django issue for it to do html.escape(str(v)) rather than html.escape(v), but given that there's also a supported way to fix it I'm not sure it's worth it.
Django sometimes does str(x).__html__() and sometimes just x.__html__() directly. I have been using this class together with various parts of Django that deal with HTML and never ran into this before. I guess both are "valid" uses of the "html protocol"? :/
Consider this example:
Output with MarkupSafe==2.1.5:
Output with MarkupSafe==3.0.0:
I think this is a bug/regression and the 2.x behavior is what is expected. Also, the escaping does not make sense. If it were actually escaped, the output should be &lt;div&gt;hi&lt;/div&gt;.
Environment: