@piotrm-nvidia Thank you for reporting this issue. However, I don't understand why it happens in this particular case. The other setup, with the huggingface whisper model, works fine; the only difference is whether faster-whisper is bound or not.
@piotrm-nvidia An update here: simply commenting out `from faster_whisper import WhisperModel` makes everything work fine. I guess some dependency or thread-related object in ctranslate2 (used by faster_whisper) causes the issue.
Thank you for providing further details on the issue you're experiencing.
The core of the problem lies in how the Triton client interacts with gevent, particularly in multi-threaded scenarios. The Triton client implements a `__del__` method that is responsible for cleaning up and closing connections. When the `ModelClient`, which uses the Triton client, closes the connection explicitly, it does so in the appropriate thread. However, the Python garbage collector may invoke the `__del__` method from a different thread at a later time. Gevent does not support operations across multiple threads by default, which leads to the `InvalidThreadUseError` you're encountering: gevent detects the cross-thread operation and disallows it.
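A minimal sketch of this failure mode, using plain threading (no gevent or Triton required): the finalizer runs in whichever thread drops the last reference, not necessarily the thread that created the object. All names here are illustrative.

```python
import threading

class FakeClient:
    """Stands in for a client whose __del__ closes a connection."""
    def __del__(self):
        # With a gevent-backed transport, a close() here would raise
        # InvalidThreadUseError whenever it runs outside the owning thread.
        print("__del__ ran in:", threading.current_thread().name)

holder = []

def worker():
    # The client is created (and "owned") by this thread...
    holder.append(FakeClient())

t = threading.Thread(target=worker, name="owner-thread")
t.start()
t.join()

# ...but the last reference is dropped by the main thread, so the
# finalizer executes here instead of in "owner-thread".
holder.clear()
```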
Although the exception is being ignored and might seem benign, it understandably causes confusion and concern. It's important to note that this issue is specific to the interaction between gevent and the Triton client's cleanup process. The problem you've observed with the faster-whisper model, as opposed to the huggingface whisper model, suggests that certain dependencies or thread-related objects might exacerbate this issue by affecting the threading context in which the Triton client operates.
We are actively working to address this issue to prevent such confusing behavior in the future and to ensure a smoother operation with libraries that utilize gevent or similar concurrency mechanisms.
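In the meantime, one way to sidestep the finalizer entirely is to close the client deterministically in the thread that created it, for example via the context-manager form of `ModelClient`. This is a hedged sketch rather than an official workaround from this thread; the server address, model name, and tensor shapes are placeholders.

```python
import numpy as np
from pytriton.client import ModelClient

# Closing in the owning thread means cleanup never falls to __del__
# in whatever thread the garbage collector happens to use.
with ModelClient("localhost:8000", "whisper") as client:
    audio = np.zeros((1, 16000), dtype=np.float32)  # dummy batch of silence
    result = client.infer_batch(audio=audio)
print(result["text"])
```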
@piotrm-nvidia
Many thanks for your reply! AFAIU, the root cause is the invocation of the `__del__` method: the threading context introduced by faster-whisper affects Triton's. The method is called from an arbitrary thread, in faster-whisper or somewhere else, after the `ModelClient` has communicated with the Triton client to obtain the model and other metadata (such as the batch size) and is closing the connection. Am I right in understanding that the current Triton version cannot handle multi-threaded libraries/frameworks?
This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
The issue with PyTriton's multi-thread support was resolved in release 0.5.1 with a temporary solution: the cleanup messages are logged at INFO level only, to avoid cluttering the log with warnings. The underlying issue in tritonclient has also been addressed in its repository, so a permanent fix will land in a future PyTriton release. Multi-threading remains supported in older PyTriton versions, but users may see warning messages.
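For anyone who still finds the INFO-level cleanup messages noisy, raising the threshold on the relevant logger should hide them. The logger name "pytriton" is an assumption here; verify the emitting logger name against your own log output.

```python
import logging

# Hide INFO-level cleanup chatter; warnings and errors still come through.
# The logger name "pytriton" is assumed, not confirmed by this thread.
logging.getLogger("pytriton").setLevel(logging.WARNING)
```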
Description
I got an error related to gevent when serving PyTriton with faster-whisper 0.10.0. I found a similar issue in Triton, but the solutions I found there were not clear: https://github.com/triton-inference-server/pytriton/issues/56
To reproduce
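The original report did not include a snippet, but based on the thread a minimal reproduction would look roughly like the sketch below: binding a faster-whisper model with PyTriton, where merely having the faster_whisper import present triggers the gevent error at server startup. The model size, tensor names, shapes, and the transcription logic are illustrative assumptions.

```python
import numpy as np
from faster_whisper import WhisperModel  # commenting out this import avoids the error
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

model = WhisperModel("tiny", device="cpu", compute_type="int8")

@batch
def infer_fn(audio):
    # Transcribe each item in the batch and return the text as bytes.
    texts = []
    for sample in audio:
        segments, _ = model.transcribe(sample)
        texts.append(" ".join(seg.text for seg in segments).encode("utf-8"))
    return {"text": np.array(texts, dtype=object).reshape(-1, 1)}

with Triton() as triton:
    triton.bind(
        model_name="whisper",
        infer_func=infer_fn,
        inputs=[Tensor(name="audio", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="text", dtype=bytes, shape=(1,))],
        config=ModelConfig(max_batch_size=4),
    )
    triton.serve()
```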
Observed results and expected behavior
Observed results when the server starts up
Environment
Additional context
Please refer to my Dockerfile.