miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

HTMLRenderer: Revise usage and implementation of `html_escape()` #115

Closed by pbodnar 2 years ago

pbodnar commented 2 years ago

The implementation of escape_html() seems a bit inefficient, and it also escapes " where that is not actually necessary.

Here is its source code:

@staticmethod
def escape_html(raw):
    return html.escape(html.unescape(raw)).replace('&#x27;', "'")

I think html.escape()'s boolean parameter quote should probably be used instead of the call to replace(): set quote to False when escaping text outside of an attribute value, and to True otherwise. The rendered result will change for the latter case, i.e. ' will be escaped as well, but that shouldn't matter, or should it?
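To make the proposal concrete, here is a minimal sketch comparing the quoted method with the two quote settings. The names legacy_escape_html, escape_text and escape_attr are hypothetical and are not part of mistletoe; only html.escape() and html.unescape() from the standard library are used.

```python
import html


def legacy_escape_html(raw):
    # Mirrors the method quoted above: unescape, escape everything,
    # then undo the escaping of single quotes.
    return html.escape(html.unescape(raw)).replace("&#x27;", "'")


def escape_text(raw):
    # Hypothetical helper for text outside attribute values:
    # only &, < and > are escaped.
    return html.escape(html.unescape(raw), quote=False)


def escape_attr(raw):
    # Hypothetical helper for attribute values:
    # " and ' are escaped in addition to &, < and >.
    return html.escape(html.unescape(raw), quote=True)


sample = "\"Tom\" & 'Jerry' <3"
print(legacy_escape_html(sample))  # &quot;Tom&quot; &amp; 'Jerry' &lt;3
print(escape_text(sample))         # "Tom" &amp; 'Jerry' &lt;3
print(escape_attr(sample))         # &quot;Tom&quot; &amp; &#x27;Jerry&#x27; &lt;3
```

With quote=False the double quotes stay untouched in ordinary text, which is the case the issue calls unnecessary escaping. With quote=True the only difference from the legacy output is that ' becomes &#x27;.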

pbodnar commented 2 years ago

This was de facto resolved within #135: the method was deprecated and mistletoe itself no longer uses it. Also, HTML entities are now unescaped earlier in the process, and a plain html.escape() is called wherever necessary.

This means that we keep escaping double quotes (" becomes &quot;) and we now also escape ' as &#x27;, but probably nobody will complain about this. Also see this comment within #135 about compliance with the CommonMark spec.
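A rough illustration of the resulting behavior, using only the standard library: entities are resolved once up front, and a plain html.escape() is applied at output time. The variable names are made up for this sketch, and it does not reproduce mistletoe's actual call sites.

```python
import html

# Source text as it might appear in a Markdown document.
raw = "&quot;Tom&quot; &amp; 'Jerry'"

# Step 1: entities are unescaped once, early in the process.
text = html.unescape(raw)

# Step 2: a plain html.escape() (quote=True by default) at render time.
rendered = html.escape(text)

print(text)      # "Tom" & 'Jerry'
print(rendered)  # &quot;Tom&quot; &amp; &#x27;Jerry&#x27;
```

Decoding rendered again yields the same text, so the extra &#x27; changes the markup but not what a browser displays.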