trentm / python-markdown2

markdown2: A fast and complete implementation of Markdown in Python

Always restore hashed HTML blocks (issue #185) #521

Closed Crozzers closed 1 year ago

Crozzers commented 1 year ago

This PR fixes #185 by always restoring hashed HTML blocks at the end of conversion.

The original markdown snippet was as follows:

<div
>
<h3>Archons of the Colophon</h3>
<p>by Paco Xander Nathan
</p>
</div>

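The <p> tag would be hashed by the strict block sub, but the enclosing <div\n> (note the newline inside the opening tag) would only be caught by the liberal block sub. The outer hashed block therefore contained the inner hash, leading to nested hashes.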

Nested hashes don't get fully unravelled in _form_paragraphs; only one layer is restored there, so the inner hashes would remain in the text until the end of the conversion process. Since _unescape_special_chars didn't process HTML blocks (only special chars and code blocks), the hash would remain in the final output.

What I've done is make sure that _unescape_special_chars also processes HTML blocks when performing its un-hashing.
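
To illustrate the nesting problem, here is a minimal toy sketch. It is not markdown2's actual code: `hash_block`, `restore_one_layer`, the `blocks` dict, and the `md5-` key format are all hypothetical stand-ins for the library's internal hashing. It shows that a single restore pass leaves the inner hash behind, and that a second pass at the end of conversion clears it.

```python
import hashlib
import re

# Hypothetical key format for this sketch only.
HASH_RE = re.compile(r"md5-[0-9a-f]{32}")


def hash_block(html, blocks):
    """Store an HTML block under a hash key and return the key."""
    key = "md5-" + hashlib.md5(html.encode("utf-8")).hexdigest()
    blocks[key] = html
    return key


def restore_one_layer(text, blocks):
    """Replace each hash key found in text with its stored HTML, once.

    Keys that appear inside the restored HTML are not expanded.
    """
    return HASH_RE.sub(lambda m: blocks.get(m.group(0), m.group(0)), text)


blocks = {}

# The strict pass hashes the inner <p> block first.
inner_key = hash_block("<p>by Paco Xander Nathan\n</p>", blocks)

# The liberal pass then hashes the enclosing <div>, which now
# contains the inner key rather than the original <p> markup.
outer_html = "<div\n>\n<h3>Archons of the Colophon</h3>\n" + inner_key + "\n</div>"
outer_key = hash_block(outer_html, blocks)

# One layer of restoration (as in paragraph forming) leaves the inner key.
after_paragraphs = restore_one_layer(outer_key, blocks)

# A further restore pass at the end of conversion removes what is left.
final_output = restore_one_layer(after_paragraphs, blocks)
```

In this toy model `after_paragraphs` still contains `inner_key`, while `final_output` matches the original snippet. Deeper nesting would need the final pass to repeat until no keys remain.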

nicholasserra commented 1 year ago

Thank you for your continued work on this issue.