nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

New LeetCode dataset #154

Closed: cassanof closed this 2 months ago

cassanof commented 2 months ago

As HumanEval gets saturated, we should start using LeetCode as a source dataset. The current LeetCode datasets were sourced from a random HF repo, and most of the problems are wrong. The one in this PR comes from the DeepSeekCoder paper. It's not squeaky clean either, but it's definitely much better.

arjunguha commented 2 months ago

Should we stop uploading these as plain code, and instead upload jsonl files? idk

cassanof commented 2 months ago

> Should we stop uploading these as plain code, and instead upload jsonl files? idk

hmm, I think plain code is nice because you can easily inspect it, especially since there is a canonical solution to look at
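
For comparison, here is a rough sketch of what the jsonl alternative could look like. It is only an illustration: the `*.py` glob, the `name` and `code` field names, and both helper functions are hypothetical and are not the layout or schema this repo actually uses. It packs a directory of plain-code problem files into one jsonl file and unpacks it again, so the files can still be inspected by hand.

```python
import json
from pathlib import Path


def pack_to_jsonl(src_dir: Path, out_path: Path) -> int:
    """Write one JSON object per plain-code file; return the record count.

    The record fields ("name", "code") are assumptions made for this sketch.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        # Sort so the output order is stable across runs.
        for path in sorted(src_dir.glob("*.py")):
            record = {"name": path.stem, "code": path.read_text(encoding="utf-8")}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


def unpack_from_jsonl(jsonl_path: Path, dest_dir: Path) -> int:
    """Recreate one plain-code file per jsonl record; return the file count."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with jsonl_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            target = dest_dir / f"{record['name']}.py"
            target.write_text(record["code"], encoding="utf-8")
            count += 1
    return count
```

Since the round trip is lossless for the file contents, uploading jsonl would not rule out inspecting problems as plain code; it would just add an unpacking step.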