openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License

Fix Mistakes in the Dataset #23

Open marcusm117 opened 1 year ago

marcusm117 commented 1 year ago

Dear HumanEval Maintainers,

Thank you so much for sharing this awesome Test Set!

I fully understand that, because this is a Test Set, we want to keep it unchanged as much as possible. However, while using it we found a few mistakes in some prompts, canonical solutions, and test cases (some were also raised in previous issues: https://github.com/openai/human-eval/issues).

These mistakes affect how accurately HumanEval reflects the performance of a Code Generation Model. I would therefore like to propose an enhanced version of HumanEval that fixes these known mistakes.

The changes made to the original repo:

  1. Add file human-eval-enhanced-202307.jsonl.gz to folder \data. This file is the compressed, fixed dataset, which includes 14 changes. The mistakes and the changes are documented in a second file, tests.py, in the same folder.
  2. Add file tests.py to folder \data. This file contains tests for the changes in human-eval-enhanced-202307.jsonl, along with details about the mistakes in the original dataset human-eval-v2-20210705.jsonl. The tests can be run as a script with the command python tests.py, or with pytest by following the detailed instructions at the top of tests.py.
  3. Add file .gitignore to the root directory. This file lists common files to ignore when building a Python project, in particular .pytest_cache and __pycache__, since tests.py can be run with pytest. The .gitignore file is not essential and can be dropped from this PR if preferred.
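
For reviewers who want to see which tasks differ between the two dataset versions, a minimal sketch along the following lines may help. It is not part of this PR: the helper names `read_problems` and `changed_fields` are hypothetical, and it assumes both files are gzip-compressed JSONL in which every record carries a `task_id` field, as in the original HumanEval data.

```python
import gzip
import json


def read_problems(path):
    """Read a gzip-compressed JSONL file into a dict keyed by task_id."""
    problems = {}
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                record = json.loads(line)
                problems[record["task_id"]] = record
    return problems


def changed_fields(original, enhanced):
    """Map each task_id to the sorted list of fields whose values differ.

    Tasks that are identical in both versions are left out of the result.
    """
    diffs = {}
    for task_id, new in enhanced.items():
        old = original.get(task_id, {})
        fields = sorted(key for key in new if new[key] != old.get(key))
        if fields:
            diffs[task_id] = fields
    return diffs
```

Calling `changed_fields(read_problems(original_path), read_problems(enhanced_path))`, where both paths are placeholders for wherever the two files live locally, returns a dict such as one mapping a task ID to `["canonical_solution", "test"]` when only those fields were edited. Reading the files through `gzip.open` keeps the dataset compressed on disk, in line with the leakage concern raised in this thread.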

Thanks for your time reviewing this PR. Any feedback would be much appreciated : )

[UPDATE] So sorry for not using compressed files from the start to avoid data leakage; it was an honest mistake. It is fixed now in this PR, and there will be no leakage after it is Squash-and-Merged. However, uncompressed files still exist in the history of some other PRs that were opened by accident and are now closed. I can reach out to GitHub support to have them deleted if necessary.

Sincerely,

marcusm117

kolergy commented 7 months ago

Those fixes make a lot of sense; I'm surprised they were not merged.