Closed: yiqingxyq closed this issue 1 year ago.
Hi @Veronicium
Thanks for pointing this out! Sorry for the delay: we have been busy for quite a while and only just found time to reproduce this issue.
Yes, your reproduced results are correct; this checkpoint may have been selected by mistake. We've replaced it with a newly fine-tuned checkpoint that gives BLEU = 84.32, EM = 65.9.
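For reference, EM (exact match) is usually reported as the percentage of generated translations that are identical to the reference. The sketch below is a minimal illustration, not the CodeT5 evaluation script. It assumes a plain string comparison after stripping surrounding whitespace, and the function name `exact_match` is hypothetical. The actual script may tokenize or normalize differently.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Hypothetical EM helper: percentage of predictions equal to their reference
    after stripping surrounding whitespace (an assumed normalization)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count positions where the prediction matches the reference exactly.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


print(exact_match(["int x = 1;", "return y;"], ["int x = 1;", "return x;"]))
```

Here one of the two predictions matches, so the score is 50.0. Because EM counts only whole-sequence matches, a single changed token in a translation moves the score, which is why small checkpoint differences show up in EM.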
Hi! I evaluated your fine-tuned checkpoint on java-cs (Java to C#) translation but could not reproduce the exact results reported in your paper: I got 83.89/64.7 (BLEU/EM), while the paper reports 84.03/65.9. I read that you use beam search without sampling to generate the results, which should be deterministic, so I'm wondering where the discrepancy comes from.
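To illustrate why beam search without sampling is deterministic, here is a simplified sketch over a hypothetical toy "model" (a fixed table of next-token log-probabilities). It is not the CodeT5 decoder. It assumes no length penalty, stops expanding a hypothesis once it emits `</s>`, and breaks score ties by comparing sequences so that ordering never depends on chance. Every name here (`beam_search`, `TOY_MODEL`, `toy_next_logprobs`) is made up for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]

# Hypothetical toy model: next-token log-probabilities keyed by the prefix.
TOY_MODEL: Dict[Seq, Dict[str, float]] = {
    (): {"int": math.log(0.6), "var": math.log(0.4)},
    ("int",): {"x": math.log(0.7), "</s>": math.log(0.3)},
    ("var",): {"x": math.log(0.9), "</s>": math.log(0.1)},
}


def toy_next_logprobs(seq: Seq) -> Dict[str, float]:
    # Unknown prefixes always end the sequence with probability 1.
    return TOY_MODEL.get(seq, {"</s>": 0.0})


def beam_search(next_logprobs: Callable[[Seq], Dict[str, float]],
                beam_size: int, max_len: int, eos: str = "</s>") -> Seq:
    beams: List[Tuple[float, Seq]] = [(0.0, ())]
    finished: List[Tuple[float, Seq]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [(score + lp, seq + (tok,))
                      for score, seq in beams
                      for tok, lp in next_logprobs(seq).items()]
        # Highest score first; ties broken by the sequence itself, so no randomness.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    finished.sort(key=lambda c: (-c[0], c[1]))
    return finished[0][1]


first = beam_search(toy_next_logprobs, beam_size=2, max_len=4)
second = beam_search(toy_next_logprobs, beam_size=2, max_len=4)
print(first, first == second)
```

Running it twice returns the same hypothesis, `('int', 'x', '</s>')`, both times. This only shows that the decoding rule has no sampling step. It says nothing about other sources of differences, such as which checkpoint is being evaluated.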
This is my output:
I downloaded the checkpoint from here: (I used translate_java_cs_codet5_base.bin)
Thank you!