nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Generated code truncated at end of string #128

Closed tedvuminhhuy closed 9 months ago

tedvuminhhuy commented 9 months ago

I used the Code Llama - Instruct 13B model to generate code for the MultiPL-E TypeScript benchmark.

In some test cases, e.g. HumanEval_109_move_one_ball, the generated code does not end correctly, like this:

console.log(move_one_ball([3, 4, 5, 1, 2]));
console.log(move_one_ball([3, 5, 4, 1,

This causes a syntax error. I don't know whether it comes from the Code Llama model or from the MultiPL-E tool.

Can you please double-check? Other test cases, like HumanEval_112_reverse_delete, have the same error.

cc @arjunguha
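One quick way to tell whether completions like these were truncated (rather than malformed for some other reason) is a bracket-balance check: a completion that stopped at the token limit typically leaves brackets unclosed. The helper below is a hypothetical sketch, not part of MultiPL-E, and it deliberately ignores strings and comments:

```python
def looks_truncated(code: str) -> bool:
    """Heuristic: flag completions whose brackets never close,
    as happens when generation stops at the token limit.
    Note: does not account for brackets inside string literals."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack[-1] != pairs[ch]:
                return True  # mismatched close: also broken
            stack.pop()
    return bool(stack)  # leftover opens => likely cut off

print(looks_truncated("console.log(move_one_ball([3, 5, 4, 1,"))
# -> True
```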

arjunguha commented 9 months ago

I'm going to guess that this is an issue with the default maximum sequence length. We've hardcoded 512 new tokens, which should be adequate for a completion model:

https://github.com/nuprl/MultiPL-E/blob/efd3d5ad1e2cd421b80ef6e7baacc3ad9bc03d7b/multipl_e/completions.py#L13

But if you're using a chattier instruct model, you probably want to increase it.
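To illustrate: once the new-token budget is exhausted, decoding simply stops, even mid-expression. The sketch below fakes this with whitespace "tokens" (a crude stand-in; real tokenizers split text differently) and reproduces a cut-off just like the one reported above:

```python
def generate_with_cap(tokens, max_new_tokens):
    # Stand-in for decoding with a hard cap: emit tokens until
    # the budget runs out, then stop, even mid-expression.
    return " ".join(tokens[:max_new_tokens])

# Whitespace-split "tokens" standing in for model tokens.
full = "console.log(move_one_ball([3, 4, 5, 1, 2]));".split(" ")

print(generate_with_cap(full, 4))
# -> console.log(move_one_ball([3, 4, 5, 1,
```

An instruct model spends part of the same budget on prose (explanations, markdown fences) before and around the code, so the code itself is more likely to hit the cap than with a plain completion model.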

I think the bigcode-evaluation-harness lets you set the max new tokens from the CLI.

tedvuminhhuy commented 9 months ago

Thank you, @arjunguha!

Do you happen to know what max_tokens value the Code Llama authors used while benchmarking? I'm trying to compare my model with Code Llama, but their paper doesn't say anything about that.