Closed Arseniy-II closed 3 months ago
To address the outdated documentation for Groundedness, it should be updated to reflect the changes introduced in the pull request "Updated Groundedness Usage". Below is the updated documentation content that should replace the old references:
The Groundedness functionality evaluates how consistent a summarized response is with its source text. This is done using feedback function providers such as OpenAI (GPT-3.5-turbo, GPT-4) and Huggingface.
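To illustrate the idea (a toy sketch, not the trulens_eval implementation): a groundedness check splits a response into statements and scores each statement against the source. A simple word-overlap version might look like this:

```python
# Toy illustration of the groundedness idea (NOT the trulens_eval
# implementation): split a response into statements and score each
# one by the fraction of its words that appear in the source.

def toy_groundedness(source: str, response: str) -> dict:
    source_words = set(source.lower().split())
    scores = {}
    for statement in filter(None, (s.strip() for s in response.split("."))):
        words = set(statement.lower().split())
        scores[statement] = len(words & source_words) / len(words)
    return scores

scores = toy_groundedness(
    "The cat sat on the mat in the sun.",
    "The cat sat on the mat. The dog barked.",
)
# "The cat sat on the mat" is fully grounded; "The dog barked" is not.
```

The real implementation uses an LLM or NLI model rather than word overlap, but the shape of the output (a per-statement score dictionary) is the same, which is why the wrapped apps below aggregate over statement scores.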
The recent update introduces new methods and classes for Groundedness. Here is how you can use the updated Groundedness functionality:
```python
import os
import numpy as np

from trulens_eval import TruBasicApp, Feedback, Tru, Select
from trulens_eval.feedback import GroundTruthAgreement, Groundedness
from trulens_eval.feedback.provider import OpenAI
from trulens_eval.feedback.provider.hugs import Huggingface
from test_cases import generate_summeval_groundedness_golden_set

os.environ["OPENAI_API_KEY"] = "..."
os.environ["HUGGINGFACE_API_KEY"] = "..."

Tru().reset_database()

# Build a golden set of 100 test cases from the SummEval dataset.
test_cases_gen = generate_summeval_groundedness_golden_set("./datasets/summeval_test_100.json")
groundedness_golden_set = [next(test_cases_gen) for _ in range(100)]

# Huggingface-based groundedness feedback.
huggingface_provider = Huggingface()
groundedness_hug = Groundedness(groundedness_provider=huggingface_provider)
f_groundedness_hug = Feedback(groundedness_hug.groundedness_measure, name="Groundedness Huggingface").on_input().on_output().aggregate(groundedness_hug.grounded_statements_aggregator)

def wrapped_groundedness_hug(input, output):
    # Average the per-statement groundedness scores.
    return np.mean(list(f_groundedness_hug(input, output)[0].values()))

# OpenAI GPT-3.5-based groundedness feedback.
groundedness_openai = Groundedness(groundedness_provider=OpenAI(model_engine="gpt-3.5-turbo"))
f_groundedness_openai = Feedback(groundedness_openai.groundedness_measure, name="Groundedness OpenAI GPT-3.5").on_input().on_output().aggregate(groundedness_openai.grounded_statements_aggregator)

def wrapped_groundedness_openai(input, output):
    return f_groundedness_openai(input, output)[0]["full_doc_score"]

# OpenAI GPT-4-based groundedness feedback.
groundedness_openai_gpt4 = Groundedness(groundedness_provider=OpenAI(model_engine="gpt-4"))
f_groundedness_openai_gpt4 = Feedback(groundedness_openai_gpt4.groundedness_measure, name="Groundedness OpenAI GPT-4").on_input().on_output().aggregate(groundedness_openai_gpt4.grounded_statements_aggregator)

def wrapped_groundedness_openai_gpt4(input, output):
    return f_groundedness_openai_gpt4(input, output)[0]["full_doc_score"]

# Score each provider against the golden set with mean absolute error.
ground_truth = GroundTruthAgreement(groundedness_golden_set)
f_mae = Feedback(ground_truth.mae, name="Mean Absolute Error").on(Select.Record.calls[0].args.args[0]).on(Select.Record.calls[0].args.args[1]).on_output()

tru_wrapped_groundedness_hug = TruBasicApp(wrapped_groundedness_hug, app_id="groundedness huggingface", feedbacks=[f_mae])
tru_wrapped_groundedness_openai = TruBasicApp(wrapped_groundedness_openai, app_id="groundedness openai gpt-3.5", feedbacks=[f_mae])
tru_wrapped_groundedness_openai_gpt4 = TruBasicApp(wrapped_groundedness_openai_gpt4, app_id="groundedness openai gpt-4", feedbacks=[f_mae])

# Run every golden-set case through each wrapped app.
for case in groundedness_golden_set:
    source = case["query"]
    response = case["response"]
    with tru_wrapped_groundedness_hug as recording:
        tru_wrapped_groundedness_hug.app(source, response)
    with tru_wrapped_groundedness_openai as recording:
        tru_wrapped_groundedness_openai.app(source, response)
    with tru_wrapped_groundedness_openai_gpt4 as recording:
        tru_wrapped_groundedness_openai_gpt4.app(source, response)

# Rank the providers by mean absolute error against the golden set.
Tru().get_leaderboard(app_ids=[]).sort_values(by="Mean Absolute Error")
```
The trulens_eval/feedback/prompts.py file contains the templates and prompts used for evaluating groundedness and other feedback metrics. These templates are part of the feedback mechanisms that assess the groundedness and other quality metrics of responses.
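For context, prompt-based groundedness checks of the kind these templates support generally work by asking an LLM to rate how well a statement is supported by the source. The template below is a hypothetical illustration of that shape, not the actual contents of prompts.py:

```python
# Hypothetical template showing the general shape of a groundedness
# prompt; the real templates live in trulens_eval/feedback/prompts.py
# and differ in wording and scoring scale.
GROUNDEDNESS_TEMPLATE = (
    "You are comparing a statement to a source text.\n"
    "SOURCE: {source}\n"
    "STATEMENT: {statement}\n"
    "On a scale of 0 to 10, how well is the statement supported "
    "by the source? Answer with a single number."
)

prompt = GROUNDEDNESS_TEMPLATE.format(
    source="The cat sat on the mat.",
    statement="The cat was on the mat.",
)
```

The provider fills in the template per statement, sends it to the model, and parses the numeric rating back into the per-statement score dictionary that the aggregator consumes.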
This updated documentation should replace the old references to ensure users are guided correctly on how to use the new Groundedness functionality.
Bug Description: Here is the reference for the old use of Groundedness.
Expected behavior: According to the recent update, Groundedness should be used differently.