Closed araspik closed 3 years ago
Good catch. The point is that `quote()` by default also escapes nearly all "special" characters. I would probably fix this issue as you suggest. This should also be done for the other existing renderers, not just HTML.
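To illustrate the default behaviour mentioned above, here is a minimal sketch, assuming `quote()` refers to `urllib.parse.quote` from the Python standard library. The example URL and the reduced `safe` string are made up for illustration and are not taken from mistletoe:

```python
from urllib.parse import quote

# Hypothetical URL path containing a semicolon and a query string.
url = "/path;param?x=1"

# By default only "/" (plus letters, digits and "_.-~") is left unescaped,
# so ";", "?" and "=" are all percent-encoded.
default_escaped = quote(url)

# Listing the characters in `safe` keeps them as they are.
with_safe = quote(url, safe="/;?=")

print(default_escaped)  # /path%3Bparam%3Fx%3D1
print(with_safe)        # /path;param?x=1
```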
OK, so it is now fixed in master. All 3 renderers (HTML, Jira, XWiki) now use the same set of safe characters (each could in theory use a different set in the future):
/#:()*?=%@+,&;
All 3 now also unescape XML character references first, which seems to be generally required by the CommonMark spec linked above:
Entity and numerical character references in the destination will be parsed into the corresponding Unicode code points, as usual. These may be optionally URL-escaped when written as HTML ...
So I hope the fix is correct.
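The approach described above (unescape character references first, then percent-encode everything outside the safe set) could be sketched roughly as follows. This is not the actual mistletoe code: the function name `escape_url_sketch` and the constant `SAFE_CHARS` are hypothetical, and only the safe character set itself is taken from this thread.

```python
import html
from urllib.parse import quote

# Safe set quoted in this thread; the constant name is an assumption.
SAFE_CHARS = "/#:()*?=%@+,&;"


def escape_url_sketch(raw: str) -> str:
    """Hypothetical helper: decode character references, then URL-escape."""
    # Step 1: turn entity and numeric character references into code points,
    # e.g. "&#59;" becomes ";" and "&ouml;" becomes "ö".
    decoded = html.unescape(raw)
    # Step 2: percent-encode everything except the safe characters.
    # "%" is in the safe set, so existing escapes like "%20" are not doubled.
    return quote(decoded, safe=SAFE_CHARS)


print(escape_url_sketch("/a;b"))        # /a;b
print(escape_url_sketch("/a&#59;b"))    # /a;b
print(escape_url_sketch("/f&ouml;o"))   # /f%C3%B6o
print(escape_url_sketch("/a b%20c"))    # /a%20b%20c
```

A real renderer would probably still need to HTML-escape the result before writing it into an attribute; that step is left out here to keep the sketch focused on the URL escaping.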
The only renderer left to fix is the LaTeX one, but I don't work with it, so I would rather file a new issue for that.
To reproduce:
Expected output (prettified):
Actual output (also prettified):
Apparently, mistletoe is escaping the `;` (semicolon) into `%3B` in URLs even when it's a valid URL character.
The CommonMark spec says the following (found below example 498):
The issue seems to come from `mistletoe/html_renderer.py:HTMLRenderer.escape_url:L216`. The function is documented as helping to prevent code injection, but I don't know how `;` could be used in such a manner. I'm not familiar with `html.escape` and related functions anyway, so I am unable to suggest a proper fix.
A possible fix is to add `;` to the list of 'safe' characters on that line, although I don't know of any security issues this might cause.