miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

Fix for #89, List Items separated by tab character not parsed correctly. #164

anderskaplan closed this 1 year ago

anderskaplan commented 1 year ago

Also fixes failing examples 312 and 313 in the CommonMark 0.30 spec, due to the way leading space is now checked for list items.

The direct cause of the reported bug was that only spaces, not tabs, were accepted as separators after a list item marker. A second problem was the tab expansion itself: tabs were always replaced by four spaces, which does not match the spec. The spec says tabs behave as if expanded to the next tab stop, with tab stops every 4 columns.
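To illustrate the first problem, here is a minimal sketch with two hypothetical regular expressions. They are assumptions made up for illustration, not mistletoe's actual patterns. The first accepts only spaces after the marker, the second accepts spaces or tabs:

```python
import re

# Hypothetical patterns, for illustration only (not taken from mistletoe).
# Both allow up to 3 leading spaces, then a bullet or ordered marker.
SPACE_ONLY = re.compile(r"^( {0,3})([-+*]|\d{1,9}[.)])( +|$)")
SPACE_OR_TAB = re.compile(r"^( {0,3})([-+*]|\d{1,9}[.)])([ \t]+|$)")


def has_list_marker(pattern, line):
    """Return True if the line starts with a list item marker per the pattern."""
    return pattern.match(line) is not None


for line in ["- foo", "-\tfoo", "1.\tbar"]:
    print(repr(line),
          has_list_marker(SPACE_ONLY, line),
          has_list_marker(SPACE_OR_TAB, line))
```

With the space-only pattern, the tab-separated lines are not recognized as list items at all, which is the symptom reported in #89.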

This fix uses expandtabs() to implement the tab stops correctly and moves extraction of content into the parse_marker() and parse_continuation() methods. This lets us implement use cases like "list interrupts a paragraph" and "list item continuation" in a less error-prone way.
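The difference between the two expansion strategies can be seen with Python's built-in str.expandtabs(). This is only an illustration of tab stops, not the code from this fix; the helper names and sample lines are made up:

```python
# Illustration only: helper names and sample lines are hypothetical.
samples = ["-\tfoo", "1.\tfoo", "  -\tfoo"]


def naive_expand(line):
    """Old behaviour as described above: every tab becomes four spaces."""
    return line.replace("\t", "    ")


def tab_stop_expand(line):
    """Advance each tab to the next tab stop, with stops every 4 columns."""
    return line.expandtabs(4)


for line in samples:
    print(repr(naive_expand(line)), repr(tab_stop_expand(line)))
```

With tab stops, the content starts at column 4 in all three samples, while the fixed four-space replacement shifts it further right depending on how wide the marker and its indentation are.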

anderskaplan commented 1 year ago

hi @pbodnar, yes still here 😄. Just holding back with more stuff until at least some of the work in progress has been merged. So thanks for the review, I'll dig into it as soon as I can.

anderskaplan commented 1 year ago

There, I think all comments have been addressed now.

anderskaplan commented 1 year ago

ok!

pbodnar commented 1 year ago

Thank you, merged. :)

For the record, this fix seems to introduce a slight performance decrease in the benchmark test (py test/benchmark.py mistletoe), but only about 0.5%, so nothing to worry about, I think.