nlpyang / PreSumm

Code for the EMNLP 2019 paper "Text Summarization with Pretrained Encoders"
MIT License

BertAbs and BertExtAbs inference results #204

Open cfy-coder opened 3 years ago

cfy-coder commented 3 years ago

Hello, thanks for sharing your work. I read the paper and followed the hyperparameters given in the paper to train the BertExt, BertAbs, and BertExtAbs models on the CNN/DailyMail dataset. All models were trained, validated on the validation set, and tested on the test set. The Abs models were trained for 300,000 steps; when the Ext task was used for pretraining, the Ext model was trained for 100,000 steps.

However, there were problems with the inference results. BertExt: the ROUGE-F and ROUGE-R scores are similar to those in the paper. BertAbs and BertExtAbs: the ROUGE scores are around 2.xx or 3.xx, a large gap from the paper. Checking the outputs, I found that every model with "Abs" as the final task generates the same degenerate text, such as "new : . s . . . . . . . . . . . . . . . . . . ." or "new : . s . s . s . s . s . s . s . s . s . s". These are not normal sentences at all.

I hope you can help me analyze the cause of this. Thanks!
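For anyone debugging the same collapse, a quick sanity check is to flag outputs whose vocabulary has degenerated into a few repeated tokens, like the ". s . s" strings above, before computing ROUGE. This is a minimal sketch, not part of the PreSumm codebase; `is_degenerate` and its threshold are hypothetical choices:

```python
def is_degenerate(summary: str, min_distinct_ratio: float = 0.5) -> bool:
    """Heuristic: a collapsed decoder emits very few distinct tokens.

    Returns True when the ratio of distinct tokens to total tokens
    falls below `min_distinct_ratio` (threshold chosen arbitrarily here).
    """
    tokens = summary.split()
    if not tokens:
        return True  # empty output counts as degenerate
    return len(set(tokens)) / len(tokens) < min_distinct_ratio
```

Running this over the generated candidate file makes it easy to see whether the failure affects all examples (pointing to a broken checkpoint) or only some (pointing to a decoding issue).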

zhehengluoK commented 2 years ago

I got the same problem. Did you look into the training process? I've noticed that at some point the loss suddenly surges for no apparent reason. https://github.com/nlpyang/PreSumm/issues/44#issue-500024423 The author said there that it could be due to single-GPU training.
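One way to catch the surge described above early, instead of discovering it after 300k steps, is to compare each step's loss against a rolling baseline and stop (or roll back to a checkpoint) when it spikes. A minimal sketch, independent of the PreSumm training loop; `loss_spiked`, the window size, and the spike factor are all hypothetical:

```python
def loss_spiked(history: list[float], window: int = 100, factor: float = 3.0) -> bool:
    """Return True when the latest loss exceeds `factor` times the
    mean of the previous `window` losses.

    Needs at least `window` + 1 recorded losses before it can fire,
    so early noisy steps are ignored.
    """
    if len(history) <= window:
        return False
    baseline = sum(history[-window - 1:-1]) / window
    return history[-1] > factor * baseline
```

In a training loop you would append each step's loss and, when this returns True, save the divergence point and reload the last good checkpoint rather than letting the run continue into the degenerate regime.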

cfy-coder commented 2 years ago

[Auto-reply, translated from Chinese:] Your email has been received; I will read it and reply as soon as possible. If it is urgent, you can reach me by phone. O(∩_∩)O Thanks~~

wbchief commented 2 years ago


Does generating on a single GPU also produce these sentence results for you? Did you solve the problem?
