huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Summarization ROUGE scores don't match those of the paper ... #2338

Closed ohmeow closed 4 years ago

ohmeow commented 4 years ago

❓ Questions & Help

I just ran the run_summarization.py script with the parameters specified here, and the ROUGE scores are far off from what is reported in the related paper.

The ROUGE scores reported in the PreSumm paper:

| Model | R1 | R2 | RL |
| --- | --- | --- | --- |
| BertSumExtAbs | 42.13 | 19.60 | 39.18 |

The ROUGE scores after running the HF script:

| Metric | F1 | Precision | Recall |
| --- | --- | --- | --- |
| ROUGE-1 | .275 | .299 | .260 |
| ROUGE-2 | .161 | .184 | .149 |
| ROUGE-L | .305 | .326 | .290 |
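As a quick sanity check, the F1 values above are consistent with being the harmonic mean of the listed precision and recall, so the scorer's arithmetic at least is internally coherent:

```python
# Verify F1 = 2*P*R / (P + R) for the precision/recall pairs reported above.
reported = {
    "ROUGE-1": (0.299, 0.260),  # (precision, recall)
    "ROUGE-2": (0.184, 0.149),
    "ROUGE-L": (0.326, 0.290),
}
for metric, (p, r) in reported.items():
    print(f"{metric}: F1 = {2 * p * r / (p + r):.3f}")
# ROUGE-1: F1 = 0.278
# ROUGE-2: F1 = 0.165
# ROUGE-L: F1 = 0.307  -- close to the reported .275 / .161 / .305
```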

The README file seems to suggest that running the script as is, with all the stories in a single directory, will give you ROUGE scores similar to those of the paper. That doesn't seem to be the case.

Any ideas why? Or what I may be doing wrong here?

Thanks much!

FYI ... I ran the script as shown in the README:

python run_summarization.py \
    --documents_dir $STORIES_DIR \
    --summaries_output_dir $OUTPUT_SUM_DIR \
    --no_cuda false \
    --batch_size 4 \
    --min_length 50 \
    --max_length 200 \
    --beam_size 5 \
    --alpha 0.95 \
    --block_trigram true \
    --compute_rouge true
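One way to narrow down the discrepancy: the PreSumm numbers were, as far as I know, computed with the original ROUGE-1.5.5 Perl script (via pyrouge), and different ROUGE implementations can tokenize, stem, and compute ROUGE-L differently. Below is a minimal sketch for independently re-scoring a generated/gold pair with the rouge-score package; this is an assumption on my part (it is not necessarily what the script uses internally), and the example sentences are hypothetical:

```python
# Independent cross-check of the script's ROUGE numbers using Google's
# rouge-score package (pip install rouge-score). This is NOT the scorer the
# paper used (PreSumm reports ROUGE-1.5.5 scores), so small differences are
# expected; large gaps point to a setup problem rather than a scoring one.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

# Hypothetical example pair; in practice, loop over the files written to
# $OUTPUT_SUM_DIR and the corresponding gold summaries from the stories.
reference = "marseille prosecutor says he is not aware of any video footage ."
candidate = "the prosecutor says he was not aware of any video footage ."

scores = scorer.score(reference, candidate)
for metric, result in scores.items():
    print(f"{metric}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```

If an independent scorer also produces numbers well below the paper's, the problem is more likely in the generation setup (checkpoint, beam settings, data preprocessing) than in the ROUGE computation itself.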
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.