microsoft / rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and the RAG pattern.
https://github.com/microsoft/rag-experiment-accelerator

Add promptflow-evals quality metrics as an alternative to ragas #707

Open prvenk opened 6 days ago

prvenk commented 6 days ago

We currently leverage some LLM-based evaluation metrics from ragas (https://github.com/explodinggradients/ragas), namely llm_context_precision, llm_context_recall, and llm_answer_relevance, in the function compute_llm_based_score. These form the RAG triad of metrics.
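For context, a minimal sketch of how these three metrics are computed through ragas' public evaluate API (not our internal wrappers in compute_llm_based_score; the sample data is a placeholder and a judge LLM such as OpenAI/Azure OpenAI is assumed to be configured for ragas):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall

# Placeholder evaluation sample: question, generated answer,
# retrieved contexts, and the reference (ground truth) answer.
ds = Dataset.from_dict({
    "question": ["What does the RAG Experiment Accelerator do?"],
    "answer": ["It helps run and evaluate RAG experiments on Azure."],
    "contexts": [[
        "The RAG Experiment Accelerator is a tool for running experiments "
        "and evaluations with Azure Cognitive Search."
    ]],
    "ground_truth": ["It accelerates experimentation and evaluation of RAG pipelines."],
})

# The RAG triad: answer relevance, context precision, context recall.
# Assumes ragas can reach a judge model (e.g. OPENAI_API_KEY is set).
result = evaluate(ds, metrics=[answer_relevancy, context_precision, context_recall])
print(result)
```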

For RAG use cases, however, there is an alternative LLM-as-a-judge framework, promptflow-evals (supported by Microsoft and part of promptflow): https://pypi.org/project/promptflow-evals/

This evaluation framework provides quality metrics such as relevance, which can be leveraged for answer relevance or context precision, and it has a targeted prompt for groundedness. promptflow-evals also offers other quality metrics such as coherence, style, fluency, and similarity. Moreover, the package can enable the inclusion of safety metrics such as hate/unfairness, violence, and sexual content, among others.
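A minimal sketch of wiring up the promptflow-evals quality evaluators, assuming the current preview API and an Azure OpenAI judge deployment (endpoint, key, and deployment name below are placeholders):

```python
from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.evals.evaluators import GroundednessEvaluator, RelevanceEvaluator

# Placeholder Azure OpenAI configuration for the judge model.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-endpoint>.openai.azure.com",
    api_key="<your-api-key>",
    azure_deployment="<your-gpt-deployment>",
)

relevance = RelevanceEvaluator(model_config)
groundedness = GroundednessEvaluator(model_config)

question = "What does the RAG Experiment Accelerator do?"
answer = "It helps run and evaluate RAG experiments on Azure."
context = "The RAG Experiment Accelerator is a tool for running experiments and evaluations."

# Each evaluator is a callable returning a dict of scores, e.g. {"gpt_relevance": 4.0}.
relevance_score = relevance(question=question, answer=answer, context=context)
groundedness_score = groundedness(answer=answer, context=context)
print(relevance_score, groundedness_score)
```

As far as I can tell from the package docs, the safety evaluators (violence, sexual, hate/unfairness, etc.) take an Azure AI project scope rather than a model configuration, so they would need slightly different plumbing.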

Ideally, promptflow-evals could fully replace the ragas metrics, but we can integrate promptflow-evals first and decide on removing ragas in a subsequent issue, since many users might still rely on the ragas metrics.

prvenk commented 5 days ago

@guybartal @ritesh-modi