austinmw opened 5 months ago
🤖

Based on the information provided in the context, it seems that the `LabelledRagDataset` can be used with the `RagEvaluatorPack` as shown in the code snippet. However, `RetrieverEvaluator` and `BatchEvalRunner` are not mentioned in the context, so it's unclear how they can be used with a `LabelledRagDataset`.

As for customizing the LLM, embedding model, and prompts, and adding hit rate and MRR metrics, the context does not provide enough information to give a complete answer. It's possible that these customizations would require modifications to the `RagEvaluatorPack` or the creation of a custom evaluator.

Here is the code snippet from the context that shows how to use a `LabelledRagDataset` with the `RagEvaluatorPack`:
```python
from llama_index.llama_pack import download_llama_pack

RagEvaluatorPack = download_llama_pack("RagEvaluatorPack", "./pack")
rag_evaluator = RagEvaluatorPack(
    query_engine=query_engine,  # built with the same source Documents as the rag_dataset
    rag_dataset=rag_dataset,
)
benchmark_df = await rag_evaluator.run()
```
The `query_engine` should be built with the same source Documents as the `rag_dataset`. The resulting `benchmark_df` contains the mean scores for evaluation measures such as Correctness, Relevancy, Faithfulness, and Context Similarity.
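Note that the top-level `await` above only works directly in a notebook; if you run this from a plain Python script, a minimal sketch (reusing the same `rag_evaluator` as above) would drive the coroutine explicitly:

```python
import asyncio

# In a notebook, `await rag_evaluator.run()` works directly; in a plain
# Python script, drive the coroutine with asyncio.run instead.
benchmark_df = asyncio.run(rag_evaluator.run())

# Inspect the aggregated mean scores (Correctness, Relevancy,
# Faithfulness, Context Similarity); exact row labels may vary by pack version.
print(benchmark_df)
```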
You can find more information in the evaluating with LlamaDatasets guide.
If you could provide more details about your use case or the specific customizations you're trying to make, I might be able to give a more precise answer.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
It looks like:

- `BaseRetrievalEvaluator.aevaluate_dataset` expects an `EmbeddingQAFinetuneDataset` dataset object
- `BatchEvalRunner.aevaluate_responses` expects both `queries` and `responses` lists
- `RagEvaluatorPack` expects a `LabelledRagDataset`

So currently we have three different evaluation mechanisms that require three different dataset formats. It seems `LabelledRagDataset` is the newer abstraction and could be used as a single dataset format across all of these, so that multiple formats of essentially the same information are no longer needed?
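For concreteness, here is a rough sketch of the kind of bridge I have in mind: projecting a `LabelledRagDataset` into the `EmbeddingQAFinetuneDataset` shape that `BaseRetrievalEvaluator.aevaluate_dataset` expects. The `rag_to_qa_dataset` helper and its id scheme are hypothetical; it assumes each example exposes `query` and `reference_contexts`, and the ids would still need to match the node ids in the index for hit rate/MRR to be meaningful:

```python
from llama_index.finetuning import EmbeddingQAFinetuneDataset


def rag_to_qa_dataset(rag_dataset):
    """Project a LabelledRagDataset into the queries/corpus/relevant_docs
    shape that BaseRetrievalEvaluator.aevaluate_dataset expects.

    Sketch only: the generated doc ids must match the node ids of the
    index the retriever searches, so in practice you would map each
    reference context back to its source node instead of minting new ids.
    """
    queries, corpus, relevant_docs = {}, {}, {}
    for i, example in enumerate(rag_dataset.examples):
        qid = f"q{i}"
        queries[qid] = example.query
        doc_ids = []
        for j, ctx in enumerate(example.reference_contexts or []):
            doc_id = f"q{i}_ctx{j}"  # placeholder id; see docstring caveat
            corpus[doc_id] = ctx
            doc_ids.append(doc_id)
        relevant_docs[qid] = doc_ids
    return EmbeddingQAFinetuneDataset(
        queries=queries, corpus=corpus, relevant_docs=relevant_docs
    )
```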
Bump 😅
Hi, @austinmw,

I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, you were inquiring about using a `LabelledRagDataset` with `RetrieverEvaluator` or `BatchEvalRunner` and whether they are compatible. There was a discussion about the documentation mentioning the use of `LabelledRagDataset` with `RagEvaluatorPack` and the need for more customization. It was suggested that customizations might require modifications to the `RagEvaluatorPack` or the creation of a custom evaluator. You later pointed out that there are currently three different evaluation mechanisms that require three different dataset formats and suggested that `LabelledRagDataset` could be used as a single dataset format across all of these.
Is this issue still relevant to the latest version of the LlamaIndex repository? If so, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contributions to the LlamaIndex project.
Dosu
not stale.
Question Validation
Question
Hi, I have a `LabelledRagDataset` created with `RagDatasetGenerator`. Now how can I use `RetrieverEvaluator` or `BatchEvalRunner` with this? Are they compatible? The documentation only mentions using it with a `RagEvaluatorPack`, which is not customizable enough. My goal is to measure Hit Rate, MRR, Context Relevance, and Faithfulness, using Bedrock LLMs.
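Roughly what I'm hoping to set up, sketched against the legacy (v0.9-style) imports. Here `index`, `query_engine`, `rag_dataset`, and `qa_dataset` are placeholders, the Bedrock model id is just an example, and `RelevancyEvaluator` stands in for context relevance:

```python
import asyncio

from llama_index import ServiceContext
from llama_index.llms import Bedrock
from llama_index.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    RetrieverEvaluator,
)

# Use a Bedrock-hosted LLM for the LLM-judged metrics; the model id is
# an example, and embed_model could be set here as well.
service_context = ServiceContext.from_defaults(
    llm=Bedrock(model="anthropic.claude-v2")
)

# Hit Rate / MRR over the retriever. `qa_dataset` is an
# EmbeddingQAFinetuneDataset (e.g. projected from the LabelledRagDataset).
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=index.as_retriever()
)
retrieval_results = asyncio.run(
    retriever_evaluator.aevaluate_dataset(qa_dataset)
)

# Relevancy / Faithfulness over end-to-end responses, batched.
runner = BatchEvalRunner(
    {
        "relevancy": RelevancyEvaluator(service_context=service_context),
        "faithfulness": FaithfulnessEvaluator(service_context=service_context),
    },
    workers=4,
)
queries = [example.query for example in rag_dataset.examples]
response_results = asyncio.run(
    runner.aevaluate_queries(query_engine, queries=queries)
)
```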