Add support for the Dart language

devoncarew commented 3 months ago

Hi, I'm curious what the process is for adding support for a new language to the translator / benchmarks (specifically, support for Dart / dart.dev). I did find this: https://github.com/nuprl/MultiPL-E/blob/main/dataset_builder/README.md, but was wondering if there was more detailed information and / or a few good representative PRs.

arjunguha commented 3 months ago

Thanks for your interest. We'd love to include Dart. Based on what I know of Dart, I suggest adapting the TypeScript translator:

https://github.com/nuprl/MultiPL-E/blob/main/dataset_builder/humaneval_to_ts.py

Once you've written humaneval_to_dart.py, the simplest way to spot-check it is with test.py in the same directory. Here is an example on the simplest problem, shown with the TypeScript translator (substitute humaneval_to_dart for your own):

test.py humaneval_to_ts ../datasets/originals/HumanEval_53_add.py

You'll get three outputs separated by ****: (1) the Dart prompt, (2) the Dart test suite, and (3) the list of stop tokens.
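
If you want to script the spot check, the three sections can be pulled apart by splitting on the separator. This is only a minimal sketch: `split_sections` is a hypothetical helper (not part of MultiPL-E), and it assumes the separator is a run of four or more asterisks that never appears inside the prompt or the tests.

```python
import re


def split_sections(output: str) -> tuple[str, str, str]:
    """Split captured test.py output into (prompt, tests, stop_tokens) text.

    Assumption: sections are separated by a run of four or more asterisks.
    """
    parts = [part.strip() for part in re.split(r"\*{4,}", output)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 sections, found {len(parts)}")
    prompt, tests, stop_tokens = parts
    return prompt, tests, stop_tokens
```

The stop tokens are returned as raw text here, since how test.py prints them is not something this sketch tries to guess.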

Before trying to benchmark, it is best to spot-check on a diverse set of problems that exercise different parts of the translator. These are the HumanEval problems that we use to spot-check:

Array: 0, 1
Dictionaries: 95, 111
Nested array: 22, 87, 115, 129
There are no benchmarks with nested dictionaries
Check that None was properly translated: 136, 162
Empty list: 29
Newlines in strings: 51
Single-quoted string: 51
Double-quoted string: 51
Empty string (single quotes): 51
Empty string (double quotes): 125
Integer overflow: 114
Tuple ellipsis: 148
Union type: 103, 125, 137
Any type: 22
Optional type: 12, 90, 128, 136, 162
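
The list above can be turned into a small helper that finds the matching files and builds one test.py command per problem. A sketch under stated assumptions: the helper names are hypothetical, the file naming follows the HumanEval_53_add.py pattern from the example, and humaneval_to_dart is the translator module you are writing.

```python
from pathlib import Path

# Problem numbers copied from the list above, keyed by the feature they exercise.
SPOT_CHECKS = {
    "array": [0, 1],
    "dictionaries": [95, 111],
    "nested array": [22, 87, 115, 129],
    "None": [136, 162],
    "empty list": [29],
    "newlines in strings": [51],
    "single-quoted string": [51],
    "double-quoted string": [51],
    "empty string (single quotes)": [51],
    "empty string (double quotes)": [125],
    "integer overflow": [114],
    "tuple ellipsis": [148],
    "union type": [103, 125, 137],
    "Any type": [22],
    "Optional type": [12, 90, 128, 136, 162],
}


def spot_check_numbers() -> list[int]:
    """Return the distinct problem numbers in ascending order."""
    return sorted({n for nums in SPOT_CHECKS.values() for n in nums})


def spot_check_files(originals: Path) -> list[Path]:
    """Find HumanEval_<n>_*.py files for every spot-check problem."""
    files: list[Path] = []
    for n in spot_check_numbers():
        # The underscore after the number keeps 1 from matching 12, 111, etc.
        files.extend(sorted(originals.glob(f"HumanEval_{n}_*.py")))
    return files


def spot_check_commands(originals: Path, translator: str = "humaneval_to_dart") -> list[str]:
    """Build one test.py command line per spot-check problem."""
    return [f"test.py {translator} {path}" for path in spot_check_files(originals)]
```

Calling `spot_check_commands(Path("../datasets/originals"))` from the dataset_builder directory should give one command line per problem, which you can then run and inspect by hand.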

The final part is writing a little execution script. I would adapt the TypeScript execution script as well:

https://github.com/nuprl/MultiPL-E/blob/main/evaluation/src/eval_ts.py
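
For orientation, the core of such a script is running the generated program with a timeout and classifying the outcome. This is a rough sketch, not the contents of eval_ts.py: the function names, the result keys, and the status labels are illustrative, and it assumes the Dart SDK is on PATH so that `dart run <file>` executes the program. Check eval_ts.py for the interface the harness actually expects.

```python
import subprocess
from pathlib import Path


def run_with_timeout(cmd: list[str], timeout_s: float = 15.0) -> dict:
    """Run a command and report how it finished (labels are illustrative)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return {"status": "Timeout", "exit_code": -1, "stdout": "", "stderr": ""}
    status = "OK" if proc.returncode == 0 else "Exception"
    return {
        "status": status,
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def eval_dart(path: Path) -> dict:
    """Run one generated Dart program (prompt + completion + tests)."""
    # Assumption: `dart` is installed and on PATH.
    return run_with_timeout(["dart", "run", str(path)])
```

A real script would probably also want to tell compile errors apart from failed assertions, which this sketch lumps together under a non-zero exit code.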

devoncarew commented 3 months ago

Thanks for the response! I'll look into the above and ping if I hit issues or make progress.

arjunguha commented 1 month ago

Some results that seem reasonable to me:

Dataset,Pass@k,Estimate,NumProblems,MinCompletions,MaxCompletions
dart_prompts-deepseekcoder_v2lite_base-0.2-reworded,1,0.25,43,50,50
dart_prompts-starcoder2_15b-0.2-reworded,1,0.35,157,50,50
dart_prompts-starcoderbase-0.2-reworded,1,0.18,157,50,50
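
For anyone reading the Estimate column: a sketch of the commonly used unbiased pass@k estimator, assuming that is what these numbers report (the function names here are my own). With n completions per problem of which c pass, pass@k is 1 - C(n-c, k) / C(n, k), averaged over problems; for k = 1 it reduces to the fraction of passing completions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n completions, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k sample contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem where 10 of 50 completions pass contributes 10/50 to pass@1.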

I've also posted them here:

https://huggingface.co/spaces/nuprl/MultiPL-E

Next up is a release, which includes adding Dart to https://github.com/bigcode-project/bigcode-evaluation-harness/

I'll get to that within a week or so.

Thanks for the PR!

nuprl/MultiPL-E #152: Add support for the Dart language