Reasoning behind rouge metric

sundi133 / rag-eval

Automated extraction [ET] and generation of high-quality datasets from your documents (PDF, CSV, JSON, text files, etc.) to evaluate any LLM app endpoint

Apache License 2.0

10 stars, 2 forks

Reasoning behind rouge metric #63

Status: Open. ogencoglu opened this issue 5 months ago.

ogencoglu commented 5 months ago

ROUGE is a truly flawed and limited metric, since it simply compares n-grams and cannot capture the actual semantics. Any comments on that?
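To make the concern concrete, here is a minimal sketch of a simplified ROUGE-N F1 score. It assumes whitespace tokenization and lowercasing with no stemming, so it is not the official ROUGE implementation, and the function names and example sentences are hypothetical. It shows that a paraphrase with the same meaning can score lower than a sentence that says the opposite.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap between candidate and reference.

    Assumes lowercased whitespace tokens; no stemming or stopword handling.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter '&' keeps the minimum count, so each n-gram is matched at most
    # as many times as it appears in both texts (clipped overlap).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
paraphrase = "a feline rested on the rug"      # same meaning, different words
negation = "the cat did not sit on the mat"    # opposite meaning, shared words

print(round(rouge_n_f1(reference, paraphrase), 3))  # 0.333
print(round(rouge_n_f1(reference, negation), 3))    # 0.714
```

The negated sentence wins because it shares five unigrams with the reference, while the paraphrase shares only "on" and "the". Stemming or higher-order n-grams change the numbers but not the underlying limitation: the score measures surface overlap, not meaning.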

sundi133 commented 5 months ago

Indeed, that's an insightful observation. To let the evaluation reason about meaning rather than only n-gram overlap, I have implemented an extension in the evaluation script. You can find the update at this link: https://github.com/sundi133/rag-eval/blob/main/src/evaluation.py#L84. This modification uses a large language model (LLM) for evaluation. Let me know your thoughts on this approach.
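For readers unfamiliar with the general "LLM as judge" pattern, here is a minimal sketch of how such a check can be structured. It is not taken from rag-eval's `evaluation.py`; the prompt template, the 1 to 5 scale, and the names `JUDGE_PROMPT` and `llm_judge_score` are all hypothetical. The `judge` argument is assumed to be any function that sends a prompt to an LLM and returns its text reply.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; not the template used in rag-eval.
JUDGE_PROMPT = (
    "You are grading an answer against a reference answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "On a scale of 1 to 5, how well does the candidate convey the same meaning "
    "as the reference? Reply with the number only."
)


def llm_judge_score(
    question: str,
    reference: str,
    candidate: str,
    judge: Callable[[str], str],
) -> Optional[int]:
    """Ask a judge model for a 1-5 semantic-agreement score.

    `judge` is an assumed callable wrapping whatever LLM client you use.
    Returns None if no score between 1 and 5 can be parsed from the reply.
    """
    prompt = JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    )
    reply = judge(prompt)
    # Take the first standalone digit 1-5 in the reply as the score.
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


# Usage with a stub judge standing in for a real model call.
def stub_judge(prompt: str) -> str:
    return "Score: 4"


print(llm_judge_score("Where did the cat sit?", "On the mat.", "On the rug.", stub_judge))  # 4
```

Passing the judge in as a callable keeps the scoring logic independent of any particular LLM client and makes it easy to test with a stub. Asking for a bare number keeps parsing simple, though real replies can still drift from the requested format, hence the `None` fallback. Judge scores also depend on the model and prompt chosen, so they are not directly comparable across setups.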