miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

`MarkdownRenderer` is not concurrent safe #210

Open Fallen-Breath opened 6 months ago

Fallen-Breath commented 6 months ago

What

Constructing a MarkdownRenderer modifies a global list (through mistletoe.block_token.remove_token), which is not thread-safe

https://github.com/miyuchina/mistletoe/blob/b911e5b64a98d537bb44a0212cbf8fa708d37d48/mistletoe/markdown_renderer.py#L112

As a result, using renderers in multiple threads at the same time results in an exception being raised

To reproduce

Reproducible with mistletoe==1.2.1

import threading
import time
from mistletoe.markdown_renderer import MarkdownRenderer

def worker():
    with MarkdownRenderer() as render:
        time.sleep(1)

threading.Thread(target=worker).start()
threading.Thread(target=worker).start()

It will raise

Exception in thread Thread-2:
Traceback (most recent call last):
  File "**\Python39\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "**\Python39\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "**\example.py", line 8, in worker
    with MarkdownRenderer() as render:
  File "**\venv\lib\site-packages\mistletoe\markdown_renderer.py", line 101, in __init__
    block_token.remove_token(block_token.Footnote)
  File "**\venv\lib\site-packages\mistletoe\block_token.py", line 61, in remove_token
    _token_types.remove(token_cls)
ValueError: list.remove(x): x not in list

Other notes

Yeah I'm aware of the following notes in the readme:

Important: As can be seen from the example above, the parsing phase is currently tightly connected with initiation and closing of a renderer. Therefore, you should never call Document(...) outside of a with ... as renderer block, unless you know what you are doing.

If the described issue is expected behavior, I'd suggest adding a warning to the readme as well, so people know they need a threading.Lock for this

pbodnar commented 6 months ago

@Fallen-Breath, you are right. If nothing else (we could at least avoid that exception from calling remove_token()?), this should be documented. I think that the other renderers aren't guaranteed by design to be 100% thread-safe either (mainly when different renderers are used in parallel), because they modify the global lists of token classes. But I haven't found the time to investigate that deeply yet...

pbodnar commented 4 months ago

Another thing: having just looked at #212, I noticed that mistletoe also partly uses class attributes to temporarily store parsed data between method calls (e.g. here). AFAIK, this could cause random errors when running mistletoe in parallel (within a single Python process), right? So I think mistletoe as a whole was not written with thread safety in mind. I wonder how much of an issue this could be for common mistletoe users...?

dreampuf commented 2 months ago

I have a similar problem with this implementation. It's not about concurrency, but it runs into the same issue when two instances of MarkdownRenderer() are open at the same time.

with MarkdownRenderer() as render:
  with MarkdownRenderer() as insider_render:
    pass

I wonder if we can have an internal state for each renderer instance.
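
A rough sketch of what per-instance state could look like. This is not how mistletoe is structured; `DEFAULT_TOKEN_TYPES` and `InstanceScopedRenderer` are hypothetical names, and the string token names are placeholders.

```python
# Hypothetical immutable defaults shared by all renderers (never mutated).
DEFAULT_TOKEN_TYPES = ("Heading", "Footnote", "Paragraph")


class InstanceScopedRenderer:
    """Toy renderer that keeps its own token list instead of editing a global one."""

    def __init__(self, excluded=("Footnote",)):
        # Each instance builds a private copy, so instances cannot interfere.
        self.token_types = [t for t in DEFAULT_TOKEN_TYPES if t not in excluded]

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Nothing global to restore.
        return False


with InstanceScopedRenderer() as render:
    with InstanceScopedRenderer() as inner_render:
        nested_ok = render.token_types == inner_render.token_types

print(nested_ok)
```

Because nothing shared is modified, nesting (or threading) two instances no longer depends on the order in which they are opened and closed.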