@cassanof: I'm using autochatmodel.py to evaluate gpt-4o-mini. Each completion in the array it builds already includes the prompt, but our evaluation code concatenates the prompt and the completion:
https://github.com/nuprl/MultiPL-E/blob/main/evaluation/src/main.py#L34
Should I add a flag to the evaluation container so that it does not concatenate prompt + completion? Or does this feature already exist in some other form? (E.g., is there a post-processing script?)
I can write this myself -- it will take 5 minutes or less.
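For concreteness, here is a minimal sketch of the flag I have in mind. The helper and the `completion_includes_prompt` flag are hypothetical, and I'm assuming the problem JSON exposes `prompt` and `tests` keys; the actual concatenation lives in evaluation/src/main.py linked above:

```python
# Hypothetical sketch of the proposed flag -- not the actual MultiPL-E code.
# Assumes the problem record has "prompt" and "tests" fields; the
# `completion_includes_prompt` flag and this helper are illustrative only.
def build_program(problem: dict, completion: str,
                  completion_includes_prompt: bool = False) -> str:
    """Assemble the program to execute for a single completion."""
    if completion_includes_prompt:
        # Chat-model output already contains the prompt, so don't prepend it.
        return completion + problem["tests"]
    # Base-model output continues from the prompt, so prepend it as today.
    return problem["prompt"] + completion + problem["tests"]
```

The flag could default to False so existing base-model runs are unaffected.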