tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

Bleu=30.38 not 23.04 #60

Closed Moshiii closed 4 years ago

Moshiii commented 4 years ago

Hi, first of all, thanks for making your amazing work easy to reproduce.

I am reproducing the model, and I found that the BLEU score for the java-large test set is 30.38, much better than the 23.04 claimed in the paper. How do I reproduce the 23.04? Am I doing something wrong here?

I used common.compute_bleu and configured the Perl script (screenshot attached), but I get a better score (screenshot attached).
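For reference, corpus-level BLEU can also be checked independently of the repository's common.compute_bleu or the Perl script. Below is a minimal, self-contained sketch of BLEU-4 with uniform n-gram weights and a brevity penalty; it is an illustration of the metric, not the exact implementation used by code2seq, and the token lists in the example are made up:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100) with uniform weights and brevity penalty.

    references / hypotheses: parallel lists of token lists
    (e.g. predicted vs. reference method-name subtokens).
    """
    clipped = [0] * max_n   # clipped n-gram matches, per order n
    totals = [0] * max_n    # total hypothesis n-grams, per order n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = Counter(ngrams(ref, n))
            hyp_counts = Counter(ngrams(hyp, n))
            # Clip each hypothesis n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # some n-gram order has zero matches
    log_precision = sum(math.log(c / t)
                        for c, t in zip(clipped, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_precision)

# Hypothetical example data: an exact match scores 100.0.
refs = [["open", "the", "input", "file", "stream"]]
hyps = [["open", "the", "input", "file", "stream"]]
print(corpus_bleu(refs, hyps))  # 100.0 for an exact match
```

Differences from the Perl script (tokenization, smoothing, case handling) can easily shift the score, which is one reason reported BLEU numbers vary across evaluation setups.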

Any hints on this, please?

urialon commented 4 years ago

Hi @Moshiii , Thank you for your interest in code2seq!

The 23.04 BLEU score is for the much smaller C# dataset, not for Java. In Java we measured only F1; in C# we measured only BLEU.

I hope it helps? Uri

Moshiii commented 4 years ago

Hi Urialon,

Thanks for replying! That solves the mystery!

Just to share my BLEU results on Java-small and Java-large:


The first, BLEU = 35.35, is from the java-small dataset; the second, BLEU = 30.38, is from the java-large dataset.