miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.

Mistletoe hangs when parsing some specifically formatted Footnotes #124

Closed: ddevault closed this issue 2 years ago

ddevault commented 2 years ago
>>> import mistletoe
>>> input = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
>>> mistletoe.markdown(input)

This never returns, or at least does not return within the limits of my patience.

pbodnar commented 2 years ago

Hi, it looks like this is caused by mistletoe not expecting CRLF line endings in the input (see #64). From my quick testing, it freezes because of the last \r\n. Here is the stack trace after pressing Ctrl+C:

$ python issue-124.py
Traceback (most recent call last):
  File "issue-124.py", line 3, in <module>
    print(mistletoe.markdown(input))
  File "d:\projects\my-forks\mistletoe\mistletoe\__init__.py", line 22, in markdown
    return renderer.render(Document(iterable))
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 150, in __init__
    self.children = tokenize(lines)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 49, in tokenize
    return tokenizer.tokenize(lines, _token_types)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_tokenizer.py", line 51, in tokenize
    return make_tokens(tokenize_block(iterable, token_types))
  File "d:\projects\my-forks\mistletoe\mistletoe\block_tokenizer.py", line 67, in tokenize_block
    result = token_type.read(lines)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 734, in read
    match_info = cls.match_reference(lines, string, offset)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 754, in match_reference
    match_info = cls.match_link_dest(string, label_end)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 793, in match_link_dest
    offset = shift_whitespace(string, offset+1)
  File "d:\projects\my-forks\mistletoe\mistletoe\core_tokens.py", line 381, in shift_whitespace
    for i, c in enumerate(string[index:], start=index):
KeyboardInterrupt

pbodnar commented 2 years ago

So I would classify this as an enhancement with a workaround: use a plain \n if you need to build an input string with line endings programmatically (or possibly use a multi-line string).

OK for now?

ddevault commented 2 years ago

I would not classify a problem in which any input causes the library to hang forever as in need of an enhancement, but rather suffering from a bug. Consider that this is a DoS vector.

I will apply an appropriate workaround (converting CRLF to LF) in my software, but this is definitely a bug and probably an urgent one at that.
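
For reference, a minimal sketch of the CRLF-to-LF workaround described above. The helper name `normalize_newlines` is hypothetical and not part of mistletoe. It also converts lone \r characters, which is an extra assumption beyond what the thread discusses.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF (and lone CR) line endings to LF.

    Hypothetical helper for illustration; not part of mistletoe.
    """
    # Replace CRLF first so a "\r\n" pair does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The input string from the original report.
source = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
cleaned = normalize_newlines(source)
```

The `cleaned` string contains no \r characters and could then be passed to mistletoe.markdown() in place of the raw input.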

pbodnar commented 2 years ago

Good news, it looks like I found the culprit in the Footnote.backtrack() method / call. I think I can come up with a fix soon.

pbodnar commented 2 years ago

Fixed in the master branch. It turned out that any whitespace character before \n can break the parsing, not just \r.
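
To illustrate the point about whitespace before \n, here is a rough sketch that builds variants of the reported input with different characters placed before each newline. The function name `whitespace_variants` and the choice of space, tab, and \r as the sample set are assumptions made for illustration. This is not the test that was added to mistletoe.

```python
# The input string from the original report.
ORIGINAL = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"

# Assumption: a small representative sample of whitespace characters;
# the thread does not list specific ones besides "\r".
TRAILING_WHITESPACE = (" ", "\t", "\r")


def whitespace_variants(text, trailing=TRAILING_WHITESPACE):
    """Return copies of `text` with each whitespace char placed before every newline."""
    # Normalize to LF first so every line ending is rewritten the same way.
    base = text.replace("\r\n", "\n")
    return [base.replace("\n", ws + "\n") for ws in trailing]


variants = whitespace_variants(ORIGINAL)
```

Each variant could be fed to mistletoe.markdown() in a regression test, ideally under a timeout so that a hang fails the test instead of blocking it.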

ddevault commented 2 years ago

Thanks!