truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

[BUG] App Freezes after Crash on Another Thread followed by `app.wait_for_feedback_results()` #1205

Open cabouezzi opened 3 weeks ago

cabouezzi commented 3 weeks ago

Bug Description I am using the vanilla relevance, context relevance, and groundedness feedback functions with a locally run custom app that uses Ollama as both the responder and the provider/judge. Errors occur on separate threads (see traceback). The app itself doesn't crash, but app.wait_for_feedback_results() freezes the program if a thread crashes.

To Reproduce Purposely crash one of the threads to see the freezing behavior. To get the exact TimeoutError, try a large LLM that takes more than 10 minutes to respond. (My reasoning is that the error occasionally says it timed out after 600s; I have been unable to reproduce it, however.)
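
For illustration, here is a minimal stdlib-only sketch of how a timeout raised inside a worker thread can resurface in the caller through nested futures, which is the shape of the traceback below. It is not trulens code: `run_with_timeout`, `slow_feedback`, and `reproduce` are hypothetical stand-ins, and the delays are shortened from 600s to fractions of a second.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def run_with_timeout(func, timeout):
    """Hypothetical helper: run func in its own thread and bound its runtime."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func)
        # Raises a timeout error if func has not finished within `timeout` seconds.
        return future.result(timeout=timeout)
    finally:
        # Do not block on the still-running call; let it finish in the background.
        pool.shutdown(wait=False)


def slow_feedback():
    """Hypothetical stand-in for a feedback call to a slow LLM."""
    time.sleep(0.5)
    return 1.0


def reproduce():
    with ThreadPoolExecutor(max_workers=1) as outer:
        # The timeout happens inside the worker thread...
        outer_future = outer.submit(run_with_timeout, slow_feedback, 0.05)
        try:
            # ...and is re-raised here, in the thread that waits on the result.
            outer_future.result()
        except FutureTimeoutError:
            return "timed out"
    return "ok"


if __name__ == "__main__":
    print(reproduce())
```

Lowering the inner timeout below the duration of the slow call is enough to trigger the error on demand, without waiting for a real 10-minute response.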

Expected behavior A clear and concise description of what you expected to happen.

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "/Path/MyProject/compare.py", line 30, in <module>
  File "/Path/MyProject/Evaluator.py", line 129, in run_triad
  File "/Path/MyProject/env/lib/python3.11/site-packages/trulens_eval/app.py", line 627, in wait_for_feedback_results
  File "/Path/MyProject/env/lib/python3.11/site-packages/trulens_eval/schema/record.py", line 186, in wait_for_feedback_results
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
  File "/Path/MyProject/env/lib/python3.11/site-packages/trulens_eval/utils/python.py", line 475, in _future_target_wrapper
  File "/Path/MyProject/env/lib/python3.11/site-packages/trulens_eval/utils/threading.py", line 168, in _run_with_timeout
  File "/Path/MyProject/env/lib/python3.11/site-packages/trulens_eval/utils/threading.py", line 159, in _run_with_timeout
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 458, in result
TimeoutError

(App continues to run after this error, but freezes)

Environment:

Additional context Custom app being used

dosubot[bot] commented 3 weeks ago

Hey @cabouezzi!

I'm here to help you out with any bugs, questions, and contributions you have in mind. Let's squash those bugs together while we wait for a human maintainer to join in.

To prevent the app from freezing when app.wait_for_feedback_results() is called after a thread crashes, you can modify the _manage_pending_feedback_results method to handle exceptions that might occur within the thread. This way, the thread can continue processing other records even if one of them causes an error.

Here is an example of how you can modify the _manage_pending_feedback_results method to catch and log exceptions:

def _manage_pending_feedback_results(self) -> None:
    """Manage the queue of records with pending feedback results.

    This is meant to be run permanently in a separate thread. It will
    remove records from the queue records_with_pending_feedback_results as
    their feedback results are computed and makes sure the queue does not
    keep growing.
    """
    while True:
        try:
            record = self.records_with_pending_feedback_results.get()
            record.wait_for_feedback_results()
        except Exception as e:
            logger.error(f"Error processing feedback results: {e}")
            # Optionally, you can add more error handling logic here

By wrapping the record.wait_for_feedback_results() call in a try-except block, you ensure that any exceptions are caught and logged, preventing the thread from crashing and allowing it to continue processing other records [1].
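
To make the idea concrete outside of trulens, here is a small self-contained sketch of the same pattern: a worker thread that drains a queue and keeps going when one item raises. `FakeRecord`, `drain`, and `demo` are hypothetical names made up for this example, and the `None` sentinel is only there so the demo can stop the loop; none of this is the library's actual implementation.

```python
import logging
import queue
import threading

logger = logging.getLogger(__name__)


class FakeRecord:
    """Hypothetical stand-in for a record whose feedback may fail."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail

    def wait_for_feedback_results(self):
        if self.fail:
            raise TimeoutError(f"{self.name} timed out")
        return self.name


def drain(pending, done):
    """Process records until a None sentinel arrives, surviving failures."""
    while True:
        record = pending.get()
        if record is None:  # sentinel: stop the worker
            break
        try:
            done.append(record.wait_for_feedback_results())
        except Exception:
            # Log the full traceback and move on to the next record.
            logger.exception("Error processing feedback results for %s", record.name)


def demo():
    pending = queue.Queue()
    done = []
    worker = threading.Thread(target=drain, args=(pending, done), daemon=True)
    worker.start()
    for record in (FakeRecord("a"), FakeRecord("b", fail=True), FakeRecord("c")):
        pending.put(record)
    pending.put(None)
    worker.join(timeout=5)
    return done


if __name__ == "__main__":
    print(demo())
```

In this sketch the failing record "b" is logged and skipped, while "a" and "c" are still processed. Note that this only keeps the background worker alive; a caller that waits on a failed or stuck result directly would still need its own error handling or timeout.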


sfc-gh-pmardziel commented 1 week ago

Hi @cabouezzi, I just added some functionality to App.wait_for_feedback_results that might address this issue. There is now a default timeout. You can check it out before the next release by installing the branch in this PR: https://github.com/truera/trulens/pull/1267 .
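
As a rough illustration of why a default timeout helps, the stdlib-only sketch below waits on a batch of futures with a bounded timeout and reports which ones finished, failed, or are still pending, instead of blocking forever. This is not the code from the PR: `wait_for_results`, `demo`, and the timeout values are assumptions made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def wait_for_results(futures, timeout=0.2):
    """Hypothetical helper: wait at most `timeout` seconds for all futures."""
    done, not_done = wait(futures, timeout=timeout)
    results, errors = [], []
    for future in done:
        exc = future.exception()  # does not block: the future is already done
        if exc is None:
            results.append(future.result())
        else:
            errors.append(exc)
    # Report how many futures were still running when the timeout expired.
    return results, errors, len(not_done)


def fast():
    return 1


def failing():
    raise ValueError("feedback failed")


def slow():
    time.sleep(1.0)
    return 2


def demo():
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fast), pool.submit(failing), pool.submit(slow)]
        return wait_for_results(futures)


if __name__ == "__main__":
    results, errors, pending = demo()
    print(results, errors, pending)
```

Here the caller gets control back after the timeout with one result, one captured error, and one future still pending, so a crashed or stuck worker no longer hangs the whole program.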