Thanks for the great work and for releasing the data/code.
I tried to replicate the results using the generated code and test cases for CodeGen-16b on the MBPP benchmark. My pass@10 and pass@100 are close to your reported results, but pass@1 is far lower when using CodeGen-16b alone (with CodeT, my result matches the reported number).

| Model | | pass@1 |
| --- | --- | --- |
| CodeGen-16b | reported | 42.4% |
| CodeGen-16b | replicate | 31.32% |
| CodeGen-16b + CodeT | reported | 49.5% |
| CodeGen-16b + CodeT | replicate | 49.58% |

I wonder whether you used a different generation setting for pass@1 (e.g. a different temperature) than for pass@10/100, or whether there was a typo in the reported number.
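
For reference, here is a minimal sketch of how I understand pass@k to be computed. It assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), over n samples per problem of which c pass. The function names `pass_at_k` and `mean_pass_at_k` are my own and are not taken from this repo, so the actual evaluation script may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass all tests, k <= n.
    (Hypothetical helper; assumes the standard combinatorial estimator.)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)
```

For k = 1 this reduces to c / n, the fraction of samples that pass, so pass@1 is sensitive to the sampling temperature in a way that pass@100 is not. That is why I suspect a settings difference.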