nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Add HumanEval+ test generation #78

Closed · Randl closed this 1 year ago

Randl commented 1 year ago

Added code to generate MultiPL-E tests from HumanEval+. To generate the tests, first run the generate_data.py script, which writes Python files into the dataset folder; then run all_prepare_prompts.py to produce the tests. I haven't checked whether any new translation problems appear, but it seemed to work OK.

closes https://github.com/nuprl/MultiPL-E/issues/62
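
For concreteness, a minimal sketch of that two-step pipeline; the argument passed to generate_data.py is an assumption for illustration, not the script's actual interface:

```python
# Sketch of the two-step pipeline described above. The argument names and
# paths are assumptions; check the scripts themselves for the real interface.
import subprocess

# Step 1: build Python problem files from HumanEval+ into the dataset folder.
subprocess.run(
    ["python3", "humaneval_plus/generate_data.py",
     "datasets/originals"],  # hypothetical: path to MultiPL-E's HumanEval prompts
    check=True,
)

# Step 2: translate the generated problems into tests for each target language.
subprocess.run(["python3", "all_prepare_prompts.py"], check=True)
```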

Randl commented 1 year ago

I also didn't generate the different prompt variations, but that can be done with the same code.

arjunguha commented 1 year ago

In MultiPL-E, we deliberately changed some prompts to make them more amenable to translation. For example, we added type annotations wherever they were missing. Eyeballing the json file, I don't think that's been done. Are you sure you don't want to do that here as well?

The way things are right now, a fair number of problems will fail to translate to typed languages.
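
To illustrate the point (this is a made-up example, not an actual HumanEval problem): without Python annotations, a translator has no way to choose concrete types for a typed target language.

```python
from typing import List

# Untyped, HumanEval-style prompt: a translator cannot tell what C++ or Java
# type `xs` or the return value should be.
def running_max(xs):
    """Return the list of running maxima of xs."""

# MultiPL-E-style prompt with annotations added: the signature now maps
# directly onto typed targets, e.g. List[int] -> std::vector<long> in C++.
def running_max(xs: List[int]) -> List[int]:
    """Return the list of running maxima of xs."""
```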

Randl commented 1 year ago

I'm using the prompts from MultiPL-E. The first script takes the path to the MultiPL-E HumanEval folder and uses it:

https://github.com/Randl/MultiPL-E/blob/1679924c2c5c4334fc90b1ef636a8d4d2479ebc0/humaneval_plus/generate_data.py#L187-L194

so that should be ok unless I'm missing something?

I didn't want to change the original JSON, to make future updates easier.
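
Conceptually, the linked lines amount to something like the following sketch: load MultiPL-E's edited (type-annotated) prompts from its originals folder so they can be paired with the HumanEval+ tests. Names and file layout here are illustrative, not the actual generate_data.py code.

```python
from pathlib import Path

def load_multipl_e_prompts(originals_dir: str) -> dict[str, str]:
    """Map each problem's file stem to its MultiPL-E prompt text.

    Assumes one .py file per problem in the originals folder.
    """
    return {p.stem: p.read_text() for p in Path(originals_dir).glob("*.py")}
```

A side benefit of this design is that the upstream HumanEval+ JSON stays untouched, so newer releases of it can be dropped in without re-resolving local edits.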

arjunguha commented 1 year ago

Ah, got it. I read the json but not the code. Thanks!