nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

autochatmodel.py: a way to ignore prompts? #156

Closed: arjunguha closed this issue 1 month ago

arjunguha commented 2 months ago

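@cassanof: I'm using autochatmodel.py to evaluate gpt-4o-mini. The array of completions that it builds already includes the prompt, but our evaluation code concatenates the prompt and each completion, so the prompt ends up appearing twice in the program that gets executed. The concatenation happens here: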

https://github.com/nuprl/MultiPL-E/blob/main/evaluation/src/main.py#L34

Should I add a flag to the evaluation container so that it does not concatenate the prompt and completion? Or does this feature already exist in some other form? (E.g., is there a post-processing script?)

I can write this myself -- it will take 5 minutes or less.
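For illustration, here is a minimal sketch of the two options raised above: stripping a leading copy of the prompt from each completion as a post-processing step, or skipping the concatenation behind a flag. It is not MultiPL-E's actual code. `strip_prompt`, `build_program`, `postprocess`, and the `concat_prompt` flag are hypothetical names, and the record shape (a dict with `"prompt"` and `"completions"` keys) is an assumption.

```python
from typing import Dict, List


def strip_prompt(prompt: str, completion: str) -> str:
    """Remove a leading copy of `prompt` from `completion`, if present.

    Chat models asked to return a whole program may echo the prompt,
    so the completion can already start with the prompt text.
    """
    if completion.startswith(prompt):
        return completion[len(prompt):]
    return completion


def build_program(prompt: str, completion: str, concat_prompt: bool = True) -> str:
    """Assemble the program that the evaluator would run.

    `concat_prompt` is a hypothetical flag: when False, the completion is
    treated as a complete program and used as-is.
    """
    return prompt + completion if concat_prompt else completion


def postprocess(problem: Dict) -> Dict:
    """Return a copy of one problem record with the prompt stripped from
    every completion (assumes hypothetical "prompt"/"completions" keys)."""
    cleaned: List[str] = [
        strip_prompt(problem["prompt"], c) for c in problem["completions"]
    ]
    return {**problem, "completions": cleaned}


if __name__ == "__main__":
    problem = {
        "prompt": "def add(x, y):\n",
        "completions": ["def add(x, y):\n    return x + y\n"],
    }
    fixed = postprocess(problem)
    # Option 1: post-process, then concatenate as usual.
    print(build_program(problem["prompt"], fixed["completions"][0]))
    # Option 2: leave completions alone and skip the concatenation.
    print(build_program(problem["prompt"], problem["completions"][0], concat_prompt=False))
```

Both calls print the same single copy of `add`. The prefix check is deliberately strict: if the model reformats whitespace or comments in the echoed prompt, `startswith` fails and the completion is returned unchanged, whereas the flag approach does not depend on the echo being exact.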

cassanof commented 2 months ago

Yeaaaaaaaaaaaaaah, my solution is nasty. Check it out here: https://github.com/nuprl/MultiPL-E/blob/c31a48ccb8f55042c611d46c068da77dce442024/autochatmodel.py#L217

arjunguha commented 2 months ago

lol ok