HitAgain closed 7 months ago
Hi, thanks for your kind words. For now, the e2e metric simply checks whether the gold answer appears in the response as a substring. Because the method is so simple, I haven't provided the code. I will add more sophisticated e2e evaluation metrics later, such as ROUGE score.
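A minimal sketch of the substring check described above, since the original code was not shared. The function names `contains_gold_answer` and `e2e_accuracy` are hypothetical, and the lowercasing step is an assumption; the actual evaluation may compare the strings as-is.

```python
def contains_gold_answer(response: str, gold_answer: str) -> bool:
    """Return True if the gold answer appears as a substring of the response."""
    gold = gold_answer.strip().lower()  # assumption: case-insensitive matching
    if not gold:
        # An empty gold answer would trivially match every response.
        return False
    return gold in response.lower()


def e2e_accuracy(responses: list[str], gold_answers: list[str]) -> float:
    """Fraction of responses that contain their corresponding gold answer."""
    if not responses:
        return 0.0
    hits = sum(
        contains_gold_answer(response, gold)
        for response, gold in zip(responses, gold_answers)
    )
    return hits / len(responses)
```

For example, `e2e_accuracy(["The capital is Paris.", "I don't know."], ["Paris", "Berlin"])` counts one hit out of two responses. A substring check like this rewards any response that mentions the gold string, so it cannot distinguish a correct answer from one that merely lists it among wrong candidates.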
Thanks for the author's contribution to the RAG task! I have a question, though: shouldn't there be an E2E metric that evaluates the whole system (R + G), such as ROUGE score or some other measure of the similarity between the predicted answer and the ground-truth answer?
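As an illustration of the kind of similarity metric being asked about, here is a simplified sketch of a ROUGE-L style F1 score based on the longest common subsequence (LCS) of tokens. It assumes lowercased whitespace tokenization and a single reference answer; it is not the thread author's implementation, and library implementations of ROUGE typically apply additional text normalization. The names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L F1 between a predicted answer and a reference answer."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Unlike a substring check, this gives partial credit when the prediction overlaps with the ground-truth answer without matching it exactly.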