sebastianruder / NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
https://nlpprogress.com/
MIT License

Summarization metrics #104

Closed ellej16 closed 6 years ago

ellej16 commented 6 years ago

Hi guys! I've observed that research featured for summarization mostly describes evaluating summaries using only the following metrics:

And recalling past research, I see these are the metrics most often used.

Does anyone have an idea why these are favored over other metrics? Specifically:

Among others? (I mentioned RR, Retention Ratio, because I used it previously along with CR, Compression Ratio.)

Thanks for this great repo, btw!

ellej16 commented 6 years ago

Tagging summarization.md contributors, sorry for the bother! @jfsantos @shashiongithub @FredRodrigues @sebastianruder

sebastianruder commented 6 years ago

Hey @ellej16, could you be more explicit about how Retention Ratio is used as a metric? In Hassel (2004), it is only defined as information in summary / information in full text.
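
For concreteness, here is a minimal sketch of that ratio, assuming (crudely) that content-word overlap stands in for "information"; the tokenizer and stopword list below are illustrative stand-ins, not anything from Hassel (2004):

```python
import re

# Tiny illustrative stopword list; a real implementation would use a proper one.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "to", "and", "is", "are"})

def content_words(text):
    """Crude proxy for 'information': lowercased word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def retention_ratio(summary, full_text):
    """Hassel (2004)-style ratio: information in summary / information in full text,
    approximated here as the fraction of the full text's unique content words
    that survive into the summary."""
    full = content_words(full_text)
    if not full:
        return 0.0
    return len(full & content_words(summary)) / len(full)
```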

ellej16 commented 6 years ago

Hi @sebastianruder! In past research we used Answer Recall Average [Mani 2002] to measure how much of that information is in the summary, by answering certain questions based on the full text. A respondent is tasked beforehand with creating those Q&A pairs from the full text (the "information in full text" side of the ratio).
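
If I'm reading that right, the scoring step could be scripted roughly like this; the Q&A pairs come from a human annotator, and the `answered_from` matching rule below is a hypothetical placeholder, not the actual criterion from Mani (2002):

```python
def answered_from(summary, reference_answer):
    """Hypothetical matching rule: the answer counts as recalled if all of
    its words appear somewhere in the summary."""
    summary_words = set(summary.lower().split())
    return all(w in summary_words for w in reference_answer.lower().split())

def answer_recall_average(summary, qa_pairs):
    """Fraction of human-created (question, answer) pairs, written from the
    full text, whose answers can still be recovered from the summary."""
    if not qa_pairs:
        return 0.0
    hits = sum(answered_from(summary, answer) for _question, answer in qa_pairs)
    return hits / len(qa_pairs)

# e.g. answer_recall_average(summary, [("Who resigned?", "the minister")])
```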

sebastianruder commented 6 years ago

Cool. Given your description, it seems that Answer Recall Average is a lot more expensive to evaluate, particularly at large scale, as you require human-written questions and answers for every text. I think that's similar to human evaluation vs. BLEU in Machine Translation, and it's arguably the main reason why automatic metrics are preferred.
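
To make the cost contrast concrete: an automatic metric such as ROUGE-1 recall needs only a reference summary and no human at evaluation time. A minimal sketch (real evaluations use the official ROUGE toolkit, with stemming and often multiple references):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams (counted with
    multiplicity, clipped against the candidate) found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, cand[tok]) for tok, n in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```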

ellej16 commented 6 years ago

Thank you very much for your time and insight on this one!

Also, huge thanks for this repository (glad to see summarization still drawing research interest!)