miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License
841 stars 119 forks source link

`MarkdownRenderer` is not concurrent safe #210

Open Fallen-Breath opened 10 months ago

Fallen-Breath commented 10 months ago

What

Constructing a MarkdownRenderer modifies a global array (mistletoe.block_token.remove_token), which is not concurrent safe

https://github.com/miyuchina/mistletoe/blob/b911e5b64a98d537bb44a0212cbf8fa708d37d48/mistletoe/markdown_renderer.py#L112

As a result, using renderers in multiple threads at the same time results in exception being raised

To reproduce

Reproducable with mistletoe==1.2.1

import threading
import time
from mistletoe.markdown_renderer import MarkdownRenderer

def worker():
    with MarkdownRenderer() as render:
        time.sleep(1)

threading.Thread(target=worker).start()
threading.Thread(target=worker).start()

It will raise

Exception in thread Thread-2:
Traceback (most recent call last):
  File "**\Python39\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "**\Python39\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "**\example.py", line 8, in worker
    with MarkdownRenderer() as render:
  File "**\venv\lib\site-packages\mistletoe\markdown_renderer.py", line 101, in __init__
    block_token.remove_token(block_token.Footnote)
  File "**\venv\lib\site-packages\mistletoe\block_token.py", line 61, in remove_token
    _token_types.remove(token_cls)
ValueError: list.remove(x): x not in list

Other notes

Yeah I'm aware of the following notes in the readme:

Important: As can be seen from the example above, the parsing phase is currently tightly connected with initiation and closing of a renderer. Therefore, you should never call Document(...) outside of a with ... as renderer block, unless you know what you are doing.

If the described issue is an expected behavior, I'll suggest to leave a warning in the readme as well, so people know they need a threading.Lock for this

pbodnar commented 10 months ago

@Fallen-Breath, you are right. If nothing else (we could at least avoid that exception from calling remote_token()?), this should be documented. I think that the other renderers aren't by-design guaranteed to be 100% thread-safe either (mainly in the case we would use different renderers in parallel), because of modifying the global lists of token classes. But I haven't found the time to investigate that deeply yet...

pbodnar commented 9 months ago

Another thing, while having just looked at #212, mistletoe also partly uses class attributes to temporarily store parsed data in between method calls (e.g. here). AFAIK, this could cause random errors when running mistletoe in parallel (within a single Python process), right? So, I think mistletoe as a whole was not written with having thread-safety on mind. Wondering how much of an issue this could be among common mistletoe users...?

dreampuf commented 6 months ago

I have a similar problem with this implementation. It's not concurrency but gets into issues when it has two instances of MarkdownRenderer().

with MarkdownRenderer() as render:
  with MarkdownRenderer() as insider_render:
    pass

I wonder if we can have an internal state for each renderer instance.

dvdkon commented 2 months ago

Mistletoe's reliance on global state has been annoying me for some time, so I used my unexpected free time to refactor it to explicitly pass state around instead. I tried some tricks to keep compatibility with old token code, but sadly none of them worked well enough, so I had to change a lot of mistletoe's API.

A compatibility layer could be created so that existing users of mistletoe could keep their code unchanged, but it wouldn't make their use of mistletoe thread-safe.

@pbodnar as the most active maintainer, would you consider merging something like this? A large breaking change to mistletoe's structure to make it concurrent and reentrant, with or without a compatibility layer? Or is the current state good enough and this change too disruptive?

pbodnar commented 1 month ago

@dvdkon, thanks for pushing this further. As you yourself write, it is not a small change. So I'm not sure if I can give it the required time nowadays. So far, we've got just 3 upvotes for this issue, which is a relatively small number. But I agree it would be great to have mistletoe thread-safe. I guess I will look at this closer one day, but I cannot promise anything...