truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

[ISSUE] record.wait_for_feedback_results() with TruLlama not recording results #1638

Open paul-gleeson opened 1 week ago

paul-gleeson commented 1 week ago

Bug Description I'm using Hugging Face as the provider to generate feedback for a RAG model, with TruLlama as the feedback recorder. Even though I'm calling record.wait_for_feedback_results(), I'm not seeing any feedback results for my RAG model's responses. I'm following the same code structure that worked for a plain LLM response, except there I used LLMChain instead of TruLlama.

To Reproduce Here is my code:

# Imports (assuming the trulens >= 1.0 package layout)
import csv
from datetime import datetime

from trulens.core import Feedback, TruSession
from trulens.apps.llamaindex import TruLlama
from trulens.providers.huggingface import Huggingface

hugs = Huggingface()
# query_engine is a LlamaIndex query engine built elsewhere

# start a TruSession
session = TruSession()
session.reset_database()

# Define evaluation metrics
f_pii_detection_input = Feedback(hugs.pii_detection).on_input()
f_pii_detection_output = Feedback(hugs.pii_detection).on_output()
f_toxicity = Feedback(hugs.toxic).on_input()
f_positive_sentiment = Feedback(hugs.positive_sentiment).on_output()

tru_query_engine_recorder = TruLlama(
    query_engine,
    app_name="LlamaIndex_App",
    app_version="base",
    feedbacks=[f_pii_detection_input, f_pii_detection_output, f_positive_sentiment, f_toxicity],
)

def interact_with_model(prompt_input):
    with tru_query_engine_recorder as recording:
        current_timestamp = datetime.now()
        llm_response = query_engine.query(prompt_input)

        # Get the record &  extract feedback results
        record = recording.get()

        feedback_results_list = [] 

        for feedback, result in record.wait_for_feedback_results().items():
            feedback_results_list.append((feedback.name, result.result))
            print(feedback.name, result.result)

        # Extract feedback results positionally (assumes the order matches
        # the feedbacks list passed to TruLlama)
        pii_input_detected = feedback_results_list[0][1]
        pii_output_detected = feedback_results_list[1][1]
        positive_sentiment = feedback_results_list[2][1]
        toxicity = feedback_results_list[3][1]

        with open('llm_responses_eval.csv', mode='a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow([current_timestamp, prompt_input, llm_response, pii_input_detected, pii_output_detected, positive_sentiment, toxicity])

    return llm_response
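As an aside on the extraction step above: indexing `feedback_results_list` by position is fragile, and in this setup both PII feedbacks share the same name (`pii_detection`, as the log output below shows), so even a name-keyed lookup would collide unless the feedbacks are given distinct names. A minimal sketch of a name-keyed extraction, using `SimpleNamespace` stand-ins for the TruLens feedback/result objects (the stand-ins and the distinct names are assumptions, not TruLens APIs):

```python
from types import SimpleNamespace

def feedback_results_to_row(pairs):
    """Map (feedback, result) pairs to a name-keyed dict.

    `pairs` mimics what record.wait_for_feedback_results().items()
    yields: each feedback has a .name and each result has a .result
    (None when the feedback never produced a score).
    """
    return {feedback.name: result.result for feedback, result in pairs}

# Stand-ins for TruLens Feedback/FeedbackResult objects (assumptions):
pairs = [
    (SimpleNamespace(name="pii_detection_input"), SimpleNamespace(result=0.1)),
    (SimpleNamespace(name="toxic"), SimpleNamespace(result=None)),
]
row = feedback_results_to_row(pairs)
# A missing or failed feedback stays visible as None instead of
# silently shifting every later column, as positional indexing would.
```

With distinct feedback names, the CSV row can then be assembled by key lookup rather than list position.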

Relevant Logs/Tracebacks

[('pii_detection', None), ('pii_detection', None), ('positive_sentiment', None), ('toxic', None)]

Environment:

paul-gleeson commented 2 days ago

Hi @sfc-gh-pdharmana, any idea on the above? Thank you!