phfaist / pylatexenc

Simple LaTeX parser providing latex-to-unicode and unicode-to-latex conversion
https://pylatexenc.readthedocs.io
MIT License

Fix never ending loop #38

Closed Sinclert closed 4 years ago

Sinclert commented 4 years ago

This PR addresses issue https://github.com/phfaist/pylatexenc/issues/37, taking the proposed approach: "log and ignore".

In order to do so, I have defined:

phfaist commented 4 years ago

Thanks for your PR! As mentioned in #37, I took the opportunity to make some deeper fixes to how latexwalker recovers from errors in tolerant mode.

In tolerant mode, latexwalker should still try to give the most useful result possible and attempt to parse the remainder of the latex content. Note that in non-tolerant mode, the bug was not present because the error was reported directly instead of being ignored.

The issue was that the error was raised by get_token(). When the error was later ignored because of tolerant parsing mode, the parser re-attempted to parse the same token, because the current position index hadn't advanced, leading to an infinite loop. I think it makes more sense to have get_token() not raise parse errors in tolerant parsing mode, in the same way that get_latex_expression() and friends don't. As a side benefit, argument parsers that use get_token() to parse specific argument tokens no longer have unhandled exceptions bubbling up; previously, in those edge cases, some LaTeX commands could effectively disappear entirely if their arguments failed to parse correctly in tolerant parsing mode.
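
To make the failure mode concrete, here is a minimal toy sketch of the loop described above. It is not pylatexenc's code: `ToyParseError`, `get_token_raising`, `get_token_tolerant`, `parse_buggy`, and `parse_fixed` are hypothetical names invented for illustration, the "tokens" are single characters, and `#` stands in for whatever input fails to tokenize. The `max_steps` guard exists only so the buggy variant terminates when run.

```python
class ToyParseError(Exception):
    """Hypothetical error type for this sketch (not a pylatexenc class)."""


def get_token_raising(text, pos):
    """Toy tokenizer that raises on '#' WITHOUT consuming it."""
    ch = text[pos]
    if ch == "#":
        raise ToyParseError(f"unexpected {ch!r} at position {pos}")
    return ch, pos + 1


def get_token_tolerant(text, pos):
    """Toy tokenizer that never raises: a bad character is skipped."""
    ch = text[pos]
    if ch == "#":
        # A real tolerant parser would log the problem here.
        return None, pos + 1  # advance past the offending input
    return ch, pos + 1


def parse_buggy(text, max_steps=50):
    """Ignore errors in the caller; pos is never advanced on failure."""
    pos, tokens, steps = 0, [], 0
    while pos < len(text):
        steps += 1
        if steps > max_steps:
            return tokens, False  # stuck: would loop forever without the guard
        try:
            tok, pos = get_token_raising(text, pos)
        except ToyParseError:
            continue  # error swallowed, same position retried
        tokens.append(tok)
    return tokens, True


def parse_fixed(text):
    """Let the tokenizer recover, so every call makes progress."""
    pos, tokens = 0, []
    while pos < len(text):
        tok, pos = get_token_tolerant(text, pos)
        if tok is not None:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(parse_buggy("ab#c"))  # the second element reports whether parsing finished
    print(parse_fixed("ab#c"))
```

The point of the sketch is the design choice, not the details: when recovery happens inside the token reader, each call is guaranteed to advance the position, so callers that ignore errors cannot get stuck on the same input.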

Thanks again, and let me know if you spot any further bugs! The fix will be available in the next release.