When I run Test, the resulting file has 10-15% fewer samples than the input (the total number of samples across the files created by `format_to_bert`). I am trying to figure out possible causes of this.
I see that one possible cause is a source length of zero.
https://github.com/nlpyang/BertSum/blob/master/src/models/trainer.py#L270
However, I do not have any such cases, so I am wondering where else the samples are getting lost.
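
For reference, this is roughly how the counts can be compared. It is only a minimal sketch, not BertSum code: the helper names are hypothetical, and it assumes each input sample is a dict whose source sits under a `src` key and that the output file holds one sample per line.

```python
from collections import Counter


def count_samples(samples, src_key="src"):
    """Tally all samples and those whose source is empty or missing.

    `src_key` is an assumption about how each sample dict is laid out.
    """
    counts = Counter()
    for sample in samples:
        counts["total"] += 1
        if len(sample.get(src_key) or []) == 0:
            counts["empty_src"] += 1
    return counts


def count_output_lines(path):
    """Count lines in the output file (assumes one sample per line)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def report_loss(n_input, n_output):
    """Return (number of lost samples, percentage of the input lost)."""
    lost = n_input - n_output
    percent = 100.0 * lost / n_input if n_input else 0.0
    return lost, percent
```

If `empty_src` comes out as zero while `report_loss` still shows a gap, the missing samples are being dropped somewhere other than the zero-length check.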