xing-hu / EMSE-DeepCom

The dataset for EMSE-DeepCom
MIT License

Why is the corpus_bleu 0.0001? I have trained the model. #23

Open satinewee opened 3 years ago

satinewee commented 3 years ago

decaying learning rate to: 0.061
decaying learning rate to: 0.058
step 46000 epoch 43 learning rate 0.058 step-time 3.065 loss 0.764
test eval loss: 123.05
start decoding
corpus_bleu: 0.0001 avg_score: 0.2200

And where is the final output? Is it in model/eval/test.46000.out? Thank you.
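For reference, the in-training corpus_bleu is an nltk score on a 0-1 scale, while multi-bleu.perl (discussed below) reports percentages. A minimal sketch of cross-checking the decoded output with nltk, assuming model/eval/test.46000.out holds one decoded summary per line and assuming a line-aligned reference file (the reference path is a placeholder):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Placeholder paths: hypotheses and references aligned line by line,
# one tokenized sentence per line.
with open("model/eval/test.46000.out") as f:
    hypotheses = [line.split() for line in f]
with open("dataset/test.token.nl") as f:          # assumed reference file
    references = [[line.split()] for line in f]  # nltk expects a list of refs

# Without smoothing, a zero match count for any n-gram order collapses
# the corpus score toward zero, so near-zero values often point to a
# tokenization or line-alignment mismatch rather than empty output.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method4)
print("nltk corpus BLEU: %.4f" % score)
```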

satinewee commented 3 years ago

Hello, and thank you very much for replying to my question. I guess you are probably also Chinese, so I will use Chinese, as I'm afraid my English may not express my meaning well.

1. The attachment contains my code and dataset: config.yaml is in the source code folder, dataset holds the dataset, and model holds the model's run output.
2. When I run this code on my own dataset (the one in this attachment, which is the dataset of [1]; according to that paper, it was in turn obtained from your paper [2]), the printed corpus bleu is 0 every time, and the BLEU score of the final generated test.xxx.out is only 0.02.
3. Also, when I run this code on the DeepCom dataset itself, the printed corpus bleu is 0 or 0.0001 every time. Is there perhaps a problem somewhere in the code?
4. Or is it that the overall length of the current dataset is twice that of the DeepCom dataset, so the model cannot generate good results for it? (See the sketch below.)
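On point 4, the length claim can be verified directly. A minimal sketch, assuming one tokenized sequence per line (both paths are placeholders for the two datasets' layouts):

```python
# Placeholder paths: point these at the code/comment files of each dataset.
for path in ["dataset/train.token.code", "deepcom/train.token.code"]:
    with open(path) as f:
        lengths = [len(line.split()) for line in f]
    print("%s: %d sequences, avg length %.1f"
          % (path, len(lengths), sum(lengths) / len(lengths)))
```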

If I could receive another reply from you, I would be extremely grateful!

[1] Jian Zhang, Xu Wang, Hongyu Zhang, et al. Retrieval-based Neural Source Code Summarization.
[2] Xing Hu, Ge Li, Xin Xia, et al. 2018. Summarizing Source Code with Transferred API Knowledge.

------------------ Original ------------------
From: xing-hu/EMSE-DeepCom <notifications@github.com>
Date: Thu, Feb 18, 2021 2:41 PM
Subject: Re: [xing-hu/EMSE-DeepCom] Why the corpus_bleu is 0.0001? I have trained the model. (#23)

can you share your code?

xing-hu commented 3 years ago

The final corpus BLEU score is computed by multi-bleu.perl instead of nltk. Is the score computed by multi-bleu.perl also 0.000?
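A minimal sketch of that check, assuming a Moses-style multi-bleu.perl script in the working directory and one hypothesis/reference per line (both file paths are placeholders for your own layout):

```python
import subprocess

hyp_file = "model/eval/test.46000.out"   # decoded summaries, one per line
ref_file = "dataset/test.token.nl"       # placeholder reference file

# multi-bleu.perl takes the reference file as an argument and reads the
# hypotheses from stdin, printing a line such as
# "BLEU = 3.63, 20.8/5.4/3.0/2.2 (BP=0.698, ...)". Scores are percentages.
with open(hyp_file) as hyp:
    subprocess.run(["perl", "multi-bleu.perl", ref_file],
                   stdin=hyp, check=True)
```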

satinewee commented 3 years ago

Hello, and thank you for your reply. I recomputed the score with multi-bleu.perl as you suggested and got the following result:

BLEU = 3.63, 20.8/5.4/3.0/2.2 (BP=0.698, ratio=0.736, hyp_len=80639, ref_len=109631)

I should note that the dataset I am using here is the one from paper [1], which, according to that paper, was obtained from your paper [2]. Is the poor result caused by my dataset being too long (its average length is twice that of the DeepCom dataset), or is there some other problem? I would be extremely grateful for another reply.

[1] Jian Zhang, Xu Wang, Hongyu Zhang, et al. Retrieval-based Neural Source Code Summarization.
[2] Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing Source Code with Transferred API Knowledge.
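As a sanity check, the printed score can be reproduced from its own components: BLEU is the brevity penalty times the geometric mean of the four n-gram precisions. A short sketch using the numbers above:

```python
import math

precisions = [0.208, 0.054, 0.030, 0.022]  # 1- to 4-gram precisions
hyp_len, ref_len = 80639, 109631

# Brevity penalty: exp(1 - ref/hyp) when hypotheses are shorter overall.
bp = math.exp(1 - ref_len / hyp_len) if hyp_len < ref_len else 1.0
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print("BP   = %.3f" % bp)                      # 0.698, matching the output
print("BLEU = %.2f" % (100 * bp * geo_mean))   # ~3.6, matching BLEU = 3.63
```

The low BP of 0.698 shows the generated summaries are substantially shorter than the references, which compounds the already low n-gram precisions.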


satinewee commented 3 years ago

Sorry, this is the attachment to my previous email. Thank you very much.


xing-hu commented 3 years ago

Length could contribute to this problem, but it shouldn't account for a gap this large. Could you email me the dataset you are currently using so I can take a look (xinghu@zju.edu.cn)?