openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"

Reproducing raw GPT-Neo (125M and 1.3B) on the HumanEval dataset #8

Closed: BitcoinNLPer closed this issue 2 years ago

BitcoinNLPer commented 2 years ago

Hi, we reproduced the evaluation of raw GPT-Neo (125M and 1.3B) on the HumanEval dataset and found that our scores are much lower than those reported in the Codex paper. Do you have any plans to publish the raw GPT-Neo results on HumanEval? Also, are there any tricks involved in reproducing them? Thanks!

Our reproduced results: [image: table of our measured scores]

The officially reported results: [image: table from the Codex paper]

Looking forward to your reply!
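
For readers attempting the same reproduction, below is a minimal sketch of one way to run raw GPT-Neo on HumanEval with this repo's harness. The checkpoints are the public Hugging Face ones; the decoding settings (temperature, sample count, token budget) are illustrative assumptions, not the exact configuration used in this thread or in the Codex paper. The stop sequences match those described in the Codex paper.

```python
# Sketch (assumed setup, not from this thread): generate HumanEval
# completions with GPT-Neo via Hugging Face transformers, then score
# them with this repo's harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from human_eval.data import read_problems, write_jsonl

MODEL = "EleutherAI/gpt-neo-125M"  # or "EleutherAI/gpt-neo-1.3B"
# Stop sequences the Codex paper uses to truncate a sample to a
# single function body.
STOP_SEQS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def truncate(completion: str) -> str:
    """Cut the generated text at the first stop sequence."""
    for stop in STOP_SEQS:
        idx = completion.find(stop)
        if idx != -1:
            completion = completion[:idx]
    return completion

@torch.no_grad()
def generate_one_completion(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.2,  # illustrative; the paper tunes temperature per k
        max_new_tokens=300,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the prompt itself.
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
    return truncate(text)

problems = read_problems()
num_samples_per_task = 1  # the paper samples up to 200 per task
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)
```

Scoring is then `evaluate_functional_correctness samples.jsonl`, as described in this repo's README. Two frequent causes of depressed scores are writing out the untruncated generation (or the prompt plus the generation) as the completion, and comparing a single low-sample run against pass@k computed from many samples per task at tuned temperatures.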

minhngh commented 1 year ago

Hi BitcoinNLPer, have you fixed this problem? If so, could you explain what was wrong and share the solution?

Thank you so much.