miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

IndexError on malformed markdown tables #173

Closed koreyou closed 1 year ago

koreyou commented 1 year ago

Mistletoe raises an IndexError when malformed markdown is given. A minimal reproducible example is:

import mistletoe

mistletoe.Document('|___|______|_')

(The string above is the substring I identified as causing the error. Mistletoe raises the error for (some) strings that include it.)

Even though the input is malformed as markdown, mistletoe should probably raise a mistletoe-specific error so that users can catch it.
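Until such an error exists, a caller could wrap the parse call on their own side. The following is only a minimal sketch: `MarkdownParseError` and `parse_or_raise` are hypothetical names invented for illustration and are not part of mistletoe. The parser is passed in as a callable, so the wrapper itself does not depend on any particular library.

```python
from typing import Any, Callable


class MarkdownParseError(Exception):
    """Hypothetical application-level error; not part of mistletoe."""


def parse_or_raise(text: str, parser: Callable[[str], Any]) -> Any:
    """Call parser(text), converting an IndexError into MarkdownParseError."""
    try:
        return parser(text)
    except IndexError as exc:
        # Keep the original exception as __cause__ so the traceback survives.
        raise MarkdownParseError(f"could not parse input: {text!r}") from exc
```

With mistletoe installed, this would be called as `parse_or_raise('|___|______|_', mistletoe.Document)`. Catching only IndexError is a deliberate choice here: it matches the failure reported in this issue without hiding unrelated bugs.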

I used mistletoe==0.9.0 on Python 3.9.13.

pbodnar commented 1 year ago

Hi @koreyou, thanks for the report. Luckily, this seems to be already fixed in master (and a new mistletoe version will hopefully be released soon).

Details: The full error stack trace:

Traceback (most recent call last):
  ...
  File "d:\projects\my-forks\mistletoe\mistletoe\span_token.py", line 97, in find
    return core_tokens.find_core_tokens(string, _root_node)
  File "d:\projects\my-forks\mistletoe\mistletoe\core_tokens.py", line 67, in find_core_tokens
    process_emphasis(string, None, delimiters, matches)
  File "d:\projects\my-forks\mistletoe\mistletoe\core_tokens.py", line 106, in process_emphasis
    bottom = star_bottom if closer.type[0] == '*' else underscore_bottom
IndexError: string index out of range

So the problem was actually in emphasis parsing: closer.type (a property of a Delimiter class instance) was an empty string for some unknown reason. This sample also used to fail: ___|______|_, whereas e.g. ___b______c_ passes. In any case, this appears to have been fixed, even if seemingly "by accident", by 8c90f9de081ea14d6848b8adac4b8ad8ba6b435e, within #108.

So I'm closing this as "resolved", as I can't see under which circumstances closer.type could have become empty. Yes, we could add a check for emptiness, but that would only be a workaround for the problem; the real fix is to avoid creating the empty string in the first place.
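For illustration only, the emptiness check mentioned above might look roughly like this. It is a sketch under stated assumptions, not mistletoe's actual code: `Delimiter` here is a simplified stand-in with just a `type` attribute, and `pick_bottom` is a hypothetical helper that isolates the one expression from the traceback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delimiter:
    """Simplified stand-in for illustration; not mistletoe's real class."""
    type: str


def pick_bottom(closer: Delimiter, star_bottom: int,
                underscore_bottom: int) -> Optional[int]:
    """Choose the bottom marker for a closer, tolerating an empty type."""
    if not closer.type:
        # Workaround: without this guard, closer.type[0] raises IndexError.
        return None
    return star_bottom if closer.type[0] == '*' else underscore_bottom
```

The guard only hides the symptom: the caller still has to decide what a `None` result means, which is why preventing the empty delimiter from being created is the better fix.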