## Description

Adds a new pack: `RagEvaluatorPack`

Given:

- A `rag_dataset` (i.e., a `LabelledRagDataset`)
- A `query_engine` (i.e., a `BaseQueryEngine`) built off the same source `Document`s as the `rag_dataset`
- Optionally, an LLM to be used as the judge (defaults to OpenAI gpt-4)

Returns benchmark results for:

- Context similarity
- Correctness
- Faithfulness
- Relevancy

(Same metrics shown in the Dataset Card.)
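For reviewers, a minimal usage sketch is below. It assumes the pack follows the standard `download_llama_pack` flow; all file paths are placeholders, and the end-to-end notebook added in this PR is the canonical example.

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llama_dataset import LabelledRagDataset
from llama_index.llama_pack import download_llama_pack

# Build a query engine over the SAME source documents the rag_dataset
# was generated from (directory path is a placeholder).
documents = SimpleDirectoryReader("./data/source_files").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Load the labelled RAG dataset from disk (path is a placeholder).
rag_dataset = LabelledRagDataset.from_json("./data/rag_dataset.json")

# Download the pack's source into a local directory and instantiate it.
RagEvaluatorPack = download_llama_pack("RagEvaluatorPack", "./rag_evaluator_pack")
rag_evaluator = RagEvaluatorPack(
    query_engine=query_engine,
    rag_dataset=rag_dataset,
    # An LLM judge can optionally be passed in here (per the description
    # above, it defaults to OpenAI gpt-4); the exact keyword name is not
    # shown in this PR body, so it is omitted from this sketch.
)

# Generates predictions with the query engine, judges them, and returns
# benchmark scores for correctness, relevancy, faithfulness, and context
# similarity.
benchmark_df = rag_evaluator.run()
print(benchmark_df)
```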
Fixes # (issue)
## Type of Change

Please delete options that are not relevant.

- [x] New Pack
## How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

- [x] Added new notebook (that tests end-to-end) (in main framework)
- [x] I stared at the code and made sure it makes sense