Thanks for your interest. We'd love to include Dart. Based on what I know of Dart, I suggest adapting the TypeScript translator:
https://github.com/nuprl/MultiPL-E/blob/main/dataset_builder/humaneval_to_ts.py
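Roughly, the translator's job is to produce a Dart prompt, a Dart test suite, and a list of stop tokens. Below is only a sketch of the shape such a translator might take; the method names are guesses, and `DART_TYPE`, `_assertDeepEquals`, and the `(name, type)` argument representation are invented for illustration. Copy the real interface from humaneval_to_ts.py.

```python
# humaneval_to_dart.py -- a rough sketch only, not the actual MultiPL-E
# interface. Copy the real class/method names and signatures from
# dataset_builder/humaneval_to_ts.py; this just illustrates the three jobs
# the translator has: build the Dart prompt, build the Dart test suite, and
# declare stop tokens.

# Hypothetical mapping from Python annotations to Dart types; a real
# translator must also handle List[...], Dict[...], Tuple[...], Optional[...].
DART_TYPE = {"int": "int", "float": "double", "bool": "bool", "str": "String"}


class Translator:
    # (3) stop tokens: where to truncate the model's completion (illustrative).
    stop = ["\n}"]

    def file_ext(self):
        return "dart"

    def translate_type(self, py_type: str) -> str:
        return DART_TYPE.get(py_type, "dynamic")

    # (1) the prompt: a doc comment plus a Dart signature with an open body.
    # `args` is assumed here to be (name, python_type) pairs.
    def translate_prompt(self, name, args, returns, description):
        doc = "".join(f"/// {line}\n" for line in description.strip().split("\n"))
        params = ", ".join(f"{self.translate_type(t)} {n}" for n, t in args)
        return f"{doc}{self.translate_type(returns)} {name}({params}) {{\n"

    # (2) the test suite: the builder composes it from small helpers that turn
    # each Python test case into Dart code.
    def test_suite_prefix_lines(self, entry_point):
        # A real version would also emit a deep-equality helper here, since
        # Dart's == is not deep for Lists and Maps.
        return ["", "void main() {", f"  final candidate = {entry_point};", ""]

    def test_suite_suffix_lines(self):
        return ["}", ""]

    def deep_equality(self, left: str, right: str) -> str:
        # _assertDeepEquals is a hypothetical helper emitted in the prefix.
        return f"  _assertDeepEquals({left}, {right});"

    def gen_call(self, func: str, args) -> str:
        return f"{func}({', '.join(args)})"

    def gen_literal(self, c) -> str:
        # Python and Dart literals differ for bools and None.
        if c is True:
            return "true"
        if c is False:
            return "false"
        if c is None:
            return "null"
        return repr(c)
```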
Once you've written humaneval_to_dart.py, the simplest way to spot-check it is with test.py in the same directory. Here is an example on the simplest problem:

test.py humaneval_to_ts ../datasets/originals/HumanEval_53_add.py
You'll get three outputs, separated by ****: (1) the Dart prompt, (2) the Dart test suite, and (3) the list of stop tokens.
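Once humaneval_to_dart.py exists, the same spot check should work by analogy:

test.py humaneval_to_dart ../datasets/originals/HumanEval_53_add.py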
Before trying to benchmark, it is best to spot-check on a diverse set of problems that exercise different parts of the translator; we have a standard set of HumanEval problems that we use for this.
The final part is writing a little execution script. I would adapt the TypeScript execution script as well:
https://github.com/nuprl/MultiPL-E/blob/main/evaluation/src/eval_ts.py
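In outline, that script just has to run the generated Dart file and map the result to a pass/fail status. Here is a standalone sketch under some assumptions: it does not reuse the shared helpers that the real scripts in evaluation/src use, the status names and the "Error:" heuristic are illustrative, and the `dart` binary is assumed to be on PATH.

```python
# eval_dart.py -- standalone sketch, not the actual MultiPL-E eval script;
# the real scripts under evaluation/src share common helpers (see eval_ts.py).
# It runs a generated Dart program and reports whether its tests passed.
import subprocess
import sys
from pathlib import Path


def eval_script(path: Path) -> dict:
    try:
        result = subprocess.run(
            # Assumes the Dart SDK is on PATH; depending on how the test
            # suite signals failure, a flag such as --enable-asserts may
            # also be needed.
            ["dart", str(path)],
            capture_output=True,
            text=True,
            timeout=15,  # arbitrary per-problem timeout
        )
    except subprocess.TimeoutExpired:
        return {"status": "Timeout", "exit_code": None, "stdout": "", "stderr": ""}

    if result.returncode == 0:
        status = "OK"  # compiled and every test case passed
    elif "Error:" in result.stderr:
        status = "SyntaxError"  # crude heuristic for compile errors
    else:
        status = "Exception"  # compiled, but a test failed or threw
    return {
        "status": status,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


if __name__ == "__main__":
    print(eval_script(Path(sys.argv[1])))
```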
Thanks for the response! I'll look into the above and ping if I hit issues or make progress.
Some results that seem reasonable to me:
| Dataset | Pass@k | Estimate | NumProblems | MinCompletions | MaxCompletions |
|---|---|---|---|---|---|
| dart_prompts-deepseekcoder_v2lite_base-0.2-reworded | 1 | 0.25 | 43 | 50 | 50 |
| dart_prompts-starcoder2_15b-0.2-reworded | 1 | 0.35 | 157 | 50 | 50 |
| dart_prompts-starcoderbase-0.2-reworded | 1 | 0.18 | 157 | 50 | 50 |
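For reference, Estimate is the usual unbiased pass@k estimator from the Codex/HumanEval paper, averaged over NumProblems. A minimal sketch of the per-problem calculation (not the actual pass_k script):

```python
# Per-problem unbiased pass@k estimate: n completions, c of them correct.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 50 completions per problem and k=1 this reduces to c/n, e.g. a problem
# where 9 of 50 completions pass contributes 0.18 to the average.
print(pass_at_k(50, 9, 1))  # 0.18
```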
I've also posted them here:
https://huggingface.co/spaces/nuprl/MultiPL-E
Next up is a release, including adding it to https://github.com/bigcode-project/bigcode-evaluation-harness/
I'll get to that within a week or so.
Thanks for the PR!
Hi, I'm curious what the process is for adding support for a new language to the translator / benchmarks (specifically, support for Dart / dart.dev). I did find https://github.com/nuprl/MultiPL-E/blob/main/dataset_builder/README.md, but was wondering if there is more detailed information and/or a few good representative PRs.