Hi, we evaluated the raw GPT-Neo models (125M and 1.3B) on the HumanEval dataset and found that their performance was much lower than what is reported in the Codex paper. Do you have any plans to publish HumanEval results for the raw GPT-Neo models? Also, are there any tricks needed to reproduce those numbers? Thanks!
Our reproduced results:
The official reported results:
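For context, one common source of discrepancy in reproductions is the pass@k computation itself: the Codex paper uses an unbiased estimator over n sampled completions rather than naively sampling exactly k completions. A minimal sketch of that estimator (function name is ours; n, c, and k follow the paper's notation):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n - c, k) / C(n, k), where n is the total number of
    sampled completions, c the number that pass the unit tests,
    and k the k in pass@k."""
    if n - c < k:
        # Every size-k subset must contain a correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 10 samples, 1 correct -> pass@1 = 0.1
print(pass_at_k(10, 1, 1))
```

Sampling settings (e.g. temperature, number of samples n, stop tokens, and prompt truncation) also differ between reproductions and can shift results substantially; it would help to know which settings were used for the reported numbers.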
Looking forward to your reply!