nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

Maximum Sentence Number in the Output #216

Open StevenLau6 opened 3 years ago

StevenLau6 commented 3 years ago

Thank you for sharing the code. I tested the extractive setting on a different summarization dataset and found that at most 3 sentences are output for each sample. This may meet the requirements of the CNN/DM dataset, but it may not be suitable for other datasets, where the target summaries can be longer than 3 sentences. So I suggest modifying the code at trainer_ext.py#L275 to use the hyper-parameter self.args.max_tgt_len to control the length of the output sequence. https://github.com/nlpyang/PreSumm/blob/ce8dc017fbef7c12b1b4bd764f0c3d20911ead5e/src/models/trainer_ext.py#L275
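To illustrate the suggestion, here is a minimal, self-contained sketch of sentence selection with a configurable cap instead of a fixed 3. It is not the repository's code: `select_sentences`, `max_sentences`, and the whitespace-based trigram check are hypothetical names and simplifications chosen for this example. In the trainer, the cap could presumably be read from an argument such as self.args.max_tgt_len or from a new dedicated option.

```python
from typing import List, Sequence, Set, Tuple


def _trigrams(text: str) -> Set[Tuple[str, ...]]:
    """Return the set of word trigrams in a sentence (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_sentences(
    sentences: Sequence[str],
    scores: Sequence[float],
    max_sentences: int = 3,      # hypothetical parameter replacing a fixed cap of 3
    block_trigram: bool = True,  # skip candidates that repeat an already chosen trigram
) -> List[str]:
    """Pick the highest-scoring sentences, up to max_sentences.

    Sentences are returned in score order, highest first.
    """
    # Rank sentence indices by model score, best first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    chosen: List[int] = []
    seen: Set[Tuple[str, ...]] = set()
    for idx in ranked:
        # Stop once the configurable cap is reached.
        if len(chosen) >= max_sentences:
            break
        candidate = _trigrams(sentences[idx])
        if block_trigram and candidate & seen:
            continue
        chosen.append(idx)
        seen |= candidate
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = ["a b c d", "e f g h", "a b c x", "i j k l", "m n o p"]
    sent_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    print(select_sentences(doc, sent_scores, max_sentences=4))
```

With the cap exposed as a parameter, a dataset with longer reference summaries only needs a different value rather than a code change.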