Closed prasad4fun closed 7 months ago
Hey @prasad4fun ! Definitely. Check out these docs for running on existing data
I already have the questions and the context retrieved by RAG. Can I directly use Trulens for evaluation? Where can I find the documentation? I see that the document link you provided above is no longer valid.
Hi @kmr666 - those docs have moved here
Hello, thank you very much for your reply and help. I have another question to consult you about. I already have a database with multiple entries of question, context, and answer. However, my context is given as a nested array, such as [[], [], []]. Do I need to expand each context into a list containing a single string? Or is there another method? Could you tell me how to input my own data? Thank you very much.
Not sure if I am getting your data right, please let me know and share a sample data frame if not.
You can load data like this:
import pandas as pd

data = {
    'prompt': ['Where is Germany?', 'What is the capital of France?'],
    'response': ['Germany is in Europe', 'The capital of France is Paris'],
    'context': [
        ['Germany is a country located in Europe',
         'Germany lies between the Baltic and North Sea to the North and Alps to the South'],
        ['France is a country in Europe and its capital is Paris.',
         'Paris is the capital of France']
    ]
}
df = pd.DataFrame(data)
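On the nested-array question: you can keep the contexts nested, one list of chunks per row. `df.to_dict('records')` turns each row into a dict whose 'context' value is that row's list of chunks. A quick check of the shape (pure pandas, no TruLens needed; the sample values are illustrative):

```python
import pandas as pd

# Same shape as above: one row per prompt, 'context' holds a list of chunks.
df = pd.DataFrame({
    'prompt': ['Where is Germany?'],
    'response': ['Germany is in Europe'],
    'context': [['Germany is a country located in Europe',
                 'Germany lies between the Baltic and North Sea to the North '
                 'and Alps to the South']],
})

records = df.to_dict('records')
print(type(records[0]['context']).__name__)  # list
print(len(records[0]['context']))            # 2
```

Each row becomes one record dict, so the nested list survives intact and can be passed straight through as the retrieval results.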
Into TruLens virtual records like this:
from trulens_eval.tru_virtual import VirtualRecord

data_dict = df.to_dict('records')

data = []
for record in data_dict:
    rec = VirtualRecord(
        main_input=record['prompt'],
        main_output=record['response'],
        calls={
            context_call: dict(
                args=[record['prompt']],
                rets=record['context']
            )
        }
    )
    data.append(rec)

from trulens_eval.tru_virtual import TruVirtual

virtual_recorder = TruVirtual(
    app_id="a virtual app",
    app=virtual_app
)

for record in data:
    virtual_recorder.add_record(record)
Once you've done this, you can see your app represented in the TruLens dashboard, and see the retrieved contexts if you click on the get_context span.
The beauty of this approach is that it can be set up to mirror any arbitrary app structure. Let me know if you have more questions on this.
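One thing the snippet above leaves implicit is that `context_call` and `virtual_app` have to be defined first. A minimal sketch following the trulens_eval virtual-app pattern (the field names and metadata here are illustrative; check the docs for your installed version, as this API has moved between releases):

```python
from trulens_eval import Select
from trulens_eval.tru_virtual import VirtualApp

# Describe the (virtual) app; these fields are free-form metadata that
# show up in the dashboard alongside your records.
virtual_app = VirtualApp({
    'llm': {'modelname': 'my-model'},
    'template': 'notes about the prompt template used in the app',
})

# Name a retriever component and the method whose calls we will record.
# `context_call` is then used as the key in VirtualRecord's `calls` dict.
retriever = Select.RecordCalls.retriever
virtual_app[retriever] = 'retriever'
context_call = retriever.get_context
```

Because the component paths are just selectors, this mirrors whatever structure your real app has; you can add more components (a synthesizer, a reranker, and so on) the same way.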
Hello, thank you very much for your answers and help. My data is stored in a CSV file with three columns: question, context, and answer. The content is in Chinese text, and the data format is as follows:
data_samples = {
    'question': ['xxx', 'xxx'],
    'response': ['xxx', 'xxx'],
    'contexts': [['xxx'], ['xxx']],
}

Here is a data instance:
Can you tell me if it's possible to use TruLens to evaluate an existing dataset of this type? Thank you very much, I look forward to your reply!
TruLens would be a good fit for this because of its wide model support. I would recommend exploring LLMs to use as providers that have good Chinese language support.
If you encounter issues because the prompting is in English, you can also use custom feedback functions and translate the feedback prompts to Chinese.
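A sketch of what a translated custom feedback function could look like (the prompt wording, helper names, and the `llm` callable are all illustrative; the only contract is that the function returns a float in [0, 1], which is what TruLens feedback functions expect):

```python
# Illustrative Chinese prompt asking the judge model for a 0-10 rating.
PROMPT_ZH = (
    "请评估下面的上下文与问题的相关程度，"
    "只回复一个 0 到 10 之间的整数。\n"
    "问题：{question}\n上下文：{context}"
)

def parse_score(reply: str) -> float:
    """Map an LLM reply like '8' (or '分数：8') onto the 0-1 range."""
    digits = "".join(ch for ch in reply if ch.isdigit())
    return min(int(digits), 10) / 10.0 if digits else 0.0

def context_relevance_zh(question: str, context: str, llm) -> float:
    """Custom feedback: rate question/context relevance with a Chinese prompt.

    `llm` is any callable that takes a prompt string and returns the
    model's reply as a string.
    """
    reply = llm(PROMPT_ZH.format(question=question, context=context))
    return parse_score(reply)

# Example with a stubbed-out judge model:
score = context_relevance_zh("德国在哪里？", "德国是欧洲的一个国家", lambda p: "9")
print(score)  # 0.9
```

You would wire a function like this into a `Feedback(...)` definition in place of the built-in `context_relevance`, with a real LLM call inside.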
Thank you for your help. I have another problem that I hope you can help with. When I evaluate my own data, the dashboard does not display Trace details, and I cannot see the context information I added. My code is as follows:
for record in data_dict:
    rec = VirtualRecord(
        main_input=record['prompt'],
        main_output=record['response'],
        calls={
            context_call: dict(
                args=[record['prompt']],
                rets=record['context']
            )
        }
    )
    data.append(rec)

context = context_call.rets[:]

# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(fopenai.context_relevance)
    .on_input()
    .on(context)
)

from trulens_eval.tru_virtual import TruVirtual

virtual_recorder = TruVirtual(
    app_id="a virtual app",
    app=virtual_app,
    feedbacks=[f_context_relevance]
)

for rec in data:
    virtual_recorder.add_record(rec)

# Retrieve feedback results. You can either browse the dashboard or retrieve the
# results from the record after it has been `add_record`ed.
for rec in data:
    print(rec.main_input, "-->", rec.main_output)
    for feedback, feedback_result in rec.wait_for_feedback_results().items():
        print("\t", feedback.name, feedback_result.result)
print('finish')
The response in my own dataset is currently null. How can I see the full data, including Trace details, as you have shown above?
I have used Mixtral and Mistral models for 1000 sampled questions. Using a RAG pipeline from LlamaIndex, I have retrieved contexts and generated answers for these 1k questions. Is it possible to evaluate the RAG triad metrics on this existing data frame of question, context, and answer columns, using GPT-3.5 as the evaluator?