tascheidt opened 10 months ago
For TruLens, we need to set up a feedback function for each metric:
```python
from trulens_eval import OpenAI as fOpenAI

provider = fOpenAI()
```
Answer relevance:

```python
from trulens_eval import Feedback

f_qa_relevance = Feedback(
    provider.relevance_with_cot_reasons,
    name="Answer Relevance"
).on_input_output()
```
Context relevance (note: this one also needs `numpy` for the aggregation step):

```python
import numpy as np

from trulens_eval import Feedback, TruLlama

context_selection = TruLlama.select_source_nodes().node.text

f_qs_relevance = (
    Feedback(provider.qs_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context_selection)
    .aggregate(np.mean)
)
```
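The `.aggregate(np.mean)` step matters because context relevance is scored once per retrieved chunk, and those per-chunk scores have to be collapsed into one record-level score. A minimal sketch of that aggregation, using made-up scores (the values here are hypothetical, not from the course):

```python
import numpy as np

# Hypothetical per-chunk context relevance scores, one per retrieved node
chunk_scores = [0.9, 0.4, 0.7]

# .aggregate(np.mean) reduces them to a single record-level score
record_score = float(np.mean(chunk_scores))
print(record_score)
```

Using the mean (rather than, say, `np.max`) penalizes retrievals that include irrelevant chunks even when one chunk is a strong match.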
Groundedness:

```python
from trulens_eval.feedback import Groundedness

grounded = Groundedness(groundedness_provider=provider)

f_groundedness = (
    Feedback(
        grounded.groundedness_measure_with_cot_reasons,
        name="Groundedness"
    )
    .on(context_selection)
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)
```
Reporting from https://learn.deeplearning.ai/building-evaluating-advanced-rag/lesson/3/rag-triad-of-metrics:
```python
from trulens_eval import TruLlama
from trulens_eval import FeedbackMode

tru_recorder = TruLlama(
    sentence_window_engine,
    app_id="App_1",
    feedbacks=[
        f_qa_relevance,
        f_qs_relevance,
        f_groundedness
    ]
)
```
```python
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        item = line.strip()
        eval_questions.append(item)

for question in eval_questions:
    with tru_recorder as recording:
        sentence_window_engine.query(question)
```
```python
from trulens_eval import Tru

tru = Tru()  # the course notebook creates this earlier; repeated here so the snippet stands alone

records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()

tru.get_leaderboard(app_ids=[])
tru.run_dashboard()
```
Can try with PhaseLLM or TruLens per the DeepLearning.AI course.
The RAG triad connects Query, Response, and Context: Answer Relevance scores Query → Response, Context Relevance scores Query → Context, and Groundedness scores Context → Response.
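The mapping from the three feedback functions above to the triad's edges can be sketched as plain data (names match the selectors used earlier):

```python
# Each feedback function scores one edge of the RAG triad
rag_triad = {
    "Answer Relevance": ("Query", "Response"),   # f_qa_relevance: .on_input_output()
    "Context Relevance": ("Query", "Context"),   # f_qs_relevance: .on_input().on(context_selection)
    "Groundedness": ("Context", "Response"),     # f_groundedness: .on(context_selection).on_output()
}

for metric, (src, dst) in rag_triad.items():
    print(f"{metric}: {src} -> {dst}")
```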