nlpyang / BertSum

Code for paper Fine-tune BERT for Extractive Summarization
Apache License 2.0

Testing results in 10-15% fewer summaries than input. What are some possible causes for this? #83

Open Santosh-Gupta opened 5 years ago

Santosh-Gupta commented 5 years ago

When I run Test, the resulting file has 10-15% fewer samples than the input (i.e., than the number of samples in the files created by format_to_bert). I am trying to figure out possible causes of this.
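This is roughly how I am counting both sides; the file names and paths here are just examples from my setup, not the exact ones in the repo, so adjust them to your own `-bert_data_path` and `-result_path`:

```python
import glob
import torch

# Count samples in the preprocessed .bert.pt files produced by format_to_bert.
# (Paths are placeholders for my own data; each .bert.pt file holds a list of example dicts.)
n_input = 0
for pt_file in glob.glob('../bert_data/cnndm.test.*.bert.pt'):
    dataset = torch.load(pt_file)
    n_input += len(dataset)

# Count summaries actually written by the test step
# (one candidate summary per line in the .candidate file).
with open('../results/cnndm_step50000.candidate') as f:
    n_output = sum(1 for _ in f)

print(f'input samples: {n_input}, output summaries: {n_output}, '
      f'missing: {n_input - n_output}')
```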

I see that one possible cause is an example whose source length is zero, which gets skipped here:

https://github.com/nlpyang/BertSum/blob/master/src/models/trainer.py#L270

However, I do not have any such cases, so I am wondering where else the samples are getting lost.
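For reference, as I read the test loop around that line (paraphrased from the repo, not copied verbatim), an example with an empty source string is simply skipped, so it never produces a line in the output file:

```python
# Paraphrase of the check in trainer.py's test loop (not an exact copy):
for i, idx in enumerate(selected_ids):
    _pred = []
    # if the source string for this example is empty, skip it entirely,
    # so no candidate summary is written for it
    if len(batch.src_str[i]) == 0:
        continue
```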