huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Summarization ROUGE scores don't match those of the paper ... #2338

Closed ohmeow closed 4 years ago

ohmeow commented 4 years ago

❓ Questions & Help

I just ran the run_summarization.py script with the parameters specified here, and the ROUGE scores are far off from what is reported in the related paper.

The ROUGE scores reported in the PreSumm paper:

| Model | R1 | R2 | RL |
| --- | --- | --- | --- |
| BertSumExtAbs | 42.13 | 19.60 | 39.18 |

The ROUGE scores after running the HF script:

| Metric | F1 | Precision | Recall |
| --- | --- | --- | --- |
| ROUGE-1 | .275 | .299 | .260 |
| ROUGE-2 | .161 | .184 | .149 |
| ROUGE-L | .305 | .326 | .290 |
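As a quick sanity check, the F1 values above are consistent with being the harmonic mean of the listed precision and recall, so the scorer's arithmetic at least is internally coherent:

```python
# Verify F1 = 2*P*R / (P + R) for the precision/recall pairs reported above.
reported = {
    "ROUGE-1": (0.299, 0.260),  # (precision, recall)
    "ROUGE-2": (0.184, 0.149),
    "ROUGE-L": (0.326, 0.290),
}
for metric, (p, r) in reported.items():
    print(f"{metric}: F1 = {2 * p * r / (p + r):.3f}")
# ROUGE-1: F1 = 0.278
# ROUGE-2: F1 = 0.165
# ROUGE-L: F1 = 0.307  -- close to the reported .275 / .161 / .305
```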

The README file seems to suggest that running the script as is, with all the stories in a single directory, will give you ROUGE scores similar to those of the paper. That doesn't seem to be the case.

Any ideas why? Or what I may be doing wrong here?

Thanks much!

FYI ... I ran the script as shown in the README:

python run_summarization.py \
    --documents_dir $STORIES_DIR \
    --summaries_output_dir $OUTPUT_SUM_DIR \
    --no_cuda false \
    --batch_size 4 \
    --min_length 50 \
    --max_length 200 \
    --beam_size 5 \
    --alpha 0.95 \
    --block_trigram true \
    --compute_rouge true
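One way to narrow down the discrepancy: the PreSumm numbers were, as far as I know, computed with the original ROUGE-1.5.5 Perl script (via pyrouge), and different ROUGE implementations can tokenize, stem, and compute ROUGE-L differently. Below is a minimal sketch for independently re-scoring a generated/gold pair with the rouge-score package; this is an assumption on my part (it is not necessarily what the script uses internally), and the example sentences are hypothetical:

```python
# Independent cross-check of the script's ROUGE numbers using Google's
# rouge-score package (pip install rouge-score). This is NOT the scorer the
# paper used (PreSumm reports ROUGE-1.5.5 scores), so small differences are
# expected; large gaps point to a setup problem rather than a scoring one.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

# Hypothetical example pair; in practice, loop over the files written to
# $OUTPUT_SUM_DIR and the corresponding gold summaries from the stories.
reference = "marseille prosecutor says he is not aware of any video footage ."
candidate = "the prosecutor says he was not aware of any video footage ."

scores = scorer.score(reference, candidate)
for metric, result in scores.items():
    print(f"{metric}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```

If an independent scorer also produces numbers well below the paper's, the problem is more likely in the generation setup (checkpoint, beam settings, data preprocessing) than in the ROUGE computation itself.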
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.