nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

Oracle and Lead3 Rouge scores on pre-processed data #133

Closed kent930 closed 4 years ago

kent930 commented 4 years ago

Hello,

When I compute the ROUGE scores for the CNN/DailyMail dataset using the pre-processed data downloaded through this repository, I get results that differ from those in the paper, especially for the ROUGE-L metric.

From the pre-processed data:

Model    R1     R2     RL
Oracle   56.24  33.76  39.98
Lead-3   40.14  17.55  25.07

In the paper:

Model    R1     R2     RL
Oracle   52.59  31.24  48.87
Lead-3   40.42  17.62  36.67

Do you have any idea why there is such a difference? Thank you.

Edit: I just found the mistake. Pyrouge requires summaries to be written with one sentence per line, not the whole summary on a single line. Doing this fixes the ROUGE-L mismatch.
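For anyone hitting the same issue, here is a minimal sketch of the fix described above. It is not the script from this thread. It assumes each summary is stored as one string with sentences joined by a separator token (`<q>` here, chosen for illustration; adjust it to your data). The helper names, file prefix, and output layout are hypothetical.

```python
import os

# Assumed sentence separator; change this to match how your summaries are stored.
SENTENCE_SEP = "<q>"


def to_one_sentence_per_line(summary, sep=SENTENCE_SEP):
    """Split a single-line summary on `sep` and put each sentence on its own line."""
    sentences = [s.strip() for s in summary.split(sep)]
    # Drop empty pieces (e.g. from a trailing separator).
    return "\n".join(s for s in sentences if s)


def write_summaries(summaries, out_dir, prefix):
    """Write each summary to <out_dir>/<prefix>.<index>.txt, one sentence per line."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, summary in enumerate(summaries):
        path = os.path.join(out_dir, f"{prefix}.{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(to_one_sentence_per_line(summary) + "\n")
        paths.append(path)
    return paths
```

The directories written this way could then be given to pyrouge, for example through the `system_dir` and `model_dir` attributes of `Rouge155`, with filename patterns that match the chosen prefix.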

nakhunchumpolsathien commented 4 years ago

Hello, I would like to know which script you used to evaluate the LEAD and Oracle ROUGE scores.

Thank you.

kent930 commented 4 years ago

This is the script I use: main.txt

nakhunchumpolsathien commented 4 years ago

@kent930 Could you please share your updated script?

kent930 commented 4 years ago

Here is the updated script: https://pastebin.com/E20G4qyc