matthewwithanm / python-markdownify

Convert HTML to Markdown
MIT License

More selective escaping of `-#.)` (alternative approach) #149

Closed: jsm28 closed this 1 week ago

jsm28 commented 2 months ago

This is a partial alternative to #122 (open since April) for more selective escaping of some special characters.

Here, we fix the test function naming (as noted in that PR) so that the tests are actually run, and fix some incorrect test assertions so that they pass. We also make escaping of `-#.)` (the most common cases of unnecessary escaping in my use case) more selective, while staying conservatively safe: every occurrence of those characters that might have Markdown significance is still escaped, including when the output is later wrapped, unlike in #122. (The one exception is where `.` or `)` starts a fragment; the existing code was already not conservatively safe there.)
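To illustrate the kind of selective escaping described above, here is a minimal sketch. It is not the code from this PR: the function name `escape_selectively` and the exact regular expressions are assumptions made for illustration. The idea is to escape a character only when it is delimited by whitespace (or the start/end of the fragment), because after wrapping any such position could end up at the start of a line.

```python
import re


def escape_selectively(text):
    """Hypothetical helper: escape -, #, . and ) only where they could be
    read as Markdown syntax. Illustrative sketch, not markdownify's code."""
    # A run of '-' standing alone between whitespace (or fragment edges)
    # could become a list marker or a setext underline after wrapping.
    text = re.sub(r'(?<!\S)(-+)(?!\S)', r'\\\1', text)
    # One to six '#' standing alone could be read as an ATX heading.
    text = re.sub(r'(?<!\S)(#{1,6})(?!\S)', r'\\\1', text)
    # Up to nine digits followed by '.' or ')' and then whitespace (or the
    # end of the fragment) could be read as an ordered list marker.
    text = re.sub(r'(?<!\S)([0-9]{1,9})([.)])(?!\S)', r'\1\\\2', text)
    return text


print(escape_selectively("a - b"))         # dash between spaces is escaped
print(escape_selectively("well-known"))    # dash inside a word is left alone
print(escape_selectively("see issue #149"))  # '#' followed by digits is left alone
print(escape_selectively("1. item"))       # possible list marker is escaped
print(escape_selectively("version 3.14"))  # '.' inside a number is left alone
```

Under these assumptions, a fragment that starts with a bare `.` or `)` is left unescaped, which matches the exception noted above.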

There are certainly more cases where the code could be made more selective while remaining conservatively safe (including in the presence of wrapping), so this is not a complete replacement for #122. But by fixing some of the most common cases in a safe way and getting the tests actually running, I hope this allows progress where the previous attempt appears to have stalled, while leaving room for further incremental work, with appropriately safe logic, on other characters where useful.

chrispy-snps commented 1 week ago

@jsm28, @AlexVonB - thanks for taking the time to pull some of #122 into the main branch!