python-poetry / cleo

Cleo allows you to create beautiful and testable command-line interfaces.
https://cleo.readthedocs.io
MIT License
1.29k stars 92 forks source link

Formatter's `is_decorated` property is temporarily disabled during `remove_format` breaking concurrent threads #423

Open vfazio opened 7 months ago

vfazio commented 7 months ago

See https://github.com/python-poetry/poetry/issues/9334

Poetry relies on the formatter's is_decorated property to be consistent when multiple threads query the property.

If one thread is in the middle of updating a section and has changed the formatter's is_decorated property via remove_format, a second thread may query it and get back an incorrect result.

If cleo is not thread-safe, the developers of poetry should be made aware, otherwise there may need to be changes to how remove_format and format work in relation to whether decorated output is supported in general vs decorated output is being emitted for the current format operation.

vfazio commented 7 months ago

Note that this seems to be an issue because the Formatter object is a shared instance across the IO objects, in Poetry's case StreamOutput and child SectionOutput objects (see StreamOutput.section)

One "workaround" would be to duplicate the Formatter object instead of sharing the object. That way the StreamOutput object could query its Formatter object independent of what the SectionOutput objects are using. This would allow the sections to manipulate their formatters independently and existing code may "just work" without having to make:

Formatter.remove_format -> Formatter.format -> Formatter.format_and_wrap -> Formatter._apply_current_style

aware that while the formatter supports decorating, this operation should not be decorated.

If that's not ideal because formatter styles need to be kept in sync between the parent output object and child sections, maybe leverage a different private instance flag:

    def remove_format(self, text: str) -> str:
        self._suppress_decoration = True
        text = re.sub(r"\033\[[^m]*m", "", self.format(text))
        self._suppress_decoration = False

        return text

    def _apply_current_style(
        self, text: str, current: str, width: int, current_line_length: int
    ) -> tuple[str, int]:
        if not text:
            return "", current_line_length

        if not width:
            if self.is_decorated() and not self._suppress_decoration:
                return self._style_stack.current.apply(text), current_line_length

            return text, current_line_length

</snip>
        if self.is_decorated() and not self._suppress_decoration:
            apply = self._style_stack.current.apply
            text = "\n".join(map(apply, lines))

        return text, current_line_length
devkral commented 1 month ago

Use contextvars and your problems are gone. They are per Thread. You may want to use the scope feature of context vars:

contextvar = ContextVar(..)

...

token = contextvar.set(1)
try:
   ...
finally:
   contextvar.reset(token)
devkral commented 1 month ago

the property can be something like:

@property
def is_decorated(self):
   return contextvar.get()
Secrus commented 1 month ago

Thank you @devkral. Would you have a minute to make a PR?

vfazio commented 1 month ago

Not saying this isn't the right fix, but one of the questions I posited was whether this library was intended to be thread-safe. If it is, this fixes a single instance where there was a thread conflict, but there may be others. If not, then its on the caller to use the library as intended.

Secrus commented 1 month ago

@vfazio I'd want it to be thread-safe (at the very least, UI/terminal printing should be), since Poetry uses Cleo in threaded jobs. I don't know of an easy way to check if the code is thread-safe, but if you know, I would be happy to hear about it.

vfazio commented 1 month ago

Im on holiday atm but saw these emails come through.

I don't think theres an "easy" way, but the reality is that if any public attribute/property can be changed on an object, its open to a race condition without some critical section. If these classes were frozen, once created the attribute values didn't change, itd be less of an issue.

Thats the crux of this problem if i recall correctly, an internal function was flipping the value of a publicly queryable property which caused a race.

vfazio commented 6 days ago

@vfazio I'd want it to be thread-safe (at the very least, UI/terminal printing should be), since Poetry uses Cleo in threaded jobs. I don't know of an easy way to check if the code is thread-safe, but if you know, I would be happy to hear about it.

I just realize i dropped the ball on actually replying to this until i just reviewed the poetry repo for updates.

Poetry already takes responsibility for locking when updating sections, I'm pretty sure that's what this lock is used for

https://github.com/python-poetry/poetry/blob/c70cbf4b493b21294ecc0f230f7b621ce3225b11/src/poetry/installation/executor.py#L93

So I don't think Poetry expects cleo to be thread safe in general.

The central issue is that Poetry expected an Output object's is_decorated property to be frozen after the class has been instantiated. That would seem mostly reasonable given how the class is declared. However, that's now how the property works since code internal to the formatter flips this value when updating sections.

If Poetry had been made aware that this was not a thread-safe method, they could have just acquired the lock before reading the property:

with self._lock:
    decorated = self.supports_fancy_output()

My PR avoided the lock and just cached the value, since we expect that the final output will be decorated based on how the Output object was constructed. Regardless of the current state of the formatter as sections get written out, we know that we can count on the output being formatted when finally displayed.

https://github.com/python-poetry/poetry/pull/9335

I hopefully didn't give the impression that I thought this library needs to be thread-safe, only that a decision and some communication should be made about the intentions of the library.

If it's not intended to be thread safe, that's 100% ok IMO and consumers need to realize that they're on their own for solving issues that come up.

However, if there is real desire to make this thread safe and to potentially avoid consumers from having to declare their own critical sections, that's a much larger task.

For the formatter, i threw out a couple of bad ideas. Another bad idea is to just use a lock in any method that references to self._decorated to synchronize access while it's potentially being changed.

However, that's just one property and thread safety can be a game of whack-a-mole. Public methods like format and format_and_wrap which utilize an internal stack are likely to also get corrupted if multiple threads have access to the formatter and aren't synchronized.

Secrus commented 4 days ago

Let's put it this way: if it can be made thread-safe easily, why not do it? In this case the fix seems quite easy to do, so it will be fixed in the next version