yixuantt / MultiHop-RAG

Repository for "MultiHop-RAG: A Dataset for Evaluating Retrieval-Augmented Generation Across Documents" (COLM 2024)

End2End Metric Need #5

HitAgain closed 7 months ago

HitAgain commented 7 months ago

Thanks to the author for the contribution to the RAG task! I have a question: should we have an E2E metric that evaluates the whole system (R + G), such as ROUGE or another score that measures the similarity between the predicted answer and the ground-truth answer?

yixuantt commented 7 months ago

Hi, thanks for your kind words. Currently, for e2e metrics, we check whether the gold answer appears within the response (as a substring). This is a very simple method, so I haven't provided the code. However, I will later add more complex e2e evaluation metrics, such as ROUGE score.
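
For readers who want to reproduce the substring check described above, here is a minimal sketch. It is not the repository's code, which was not posted. The function names `contains_gold` and `substring_accuracy` are hypothetical, and the lowercasing and whitespace normalization are assumptions; the comment only says the gold answer is looked up in the response string.

```python
from typing import Iterable, Tuple


def _normalize(text: str) -> str:
    # Assumption: compare case-insensitively and collapse runs of whitespace.
    return " ".join(text.lower().split())


def contains_gold(response: str, gold: str) -> bool:
    """Return True if the gold answer appears as a substring of the response."""
    gold_norm = _normalize(gold)
    if not gold_norm:
        # An empty gold answer would match everything, so treat it as a miss.
        return False
    return gold_norm in _normalize(response)


def substring_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (response, gold) pairs where the gold answer is found."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(contains_gold(response, gold) for response, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    examples = [
        ("The answer is Yes.", "yes"),  # hit
        ("I am not sure.", "yes"),      # miss
    ]
    print(substring_accuracy(examples))  # 0.5
```

A check like this rewards any response that mentions the gold string, including long answers that hedge across several options, which is one reason an overlap-based metric such as ROUGE may be a useful complement.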