openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0
7.29k stars, 372 forks

What is ddboolq in the evaluation? We cannot find the "ddboolq" task in lm-evaluation-harness. #47

Closed by chi2liu 1 year ago

chi2liu commented 1 year ago

We cannot find a "ddboolq" task in lm-evaluation-harness.

Only the boolq task appears in the task list. We ran boolq on open-llama-3b, and the result differs from the one reported for ddboolq.

[image]

So what is ddboolq in the evaluation?

young-geng commented 1 year ago

Sorry, that was a typo; the task is boolq. We did our evaluation in JAX, so there could be slight differences due to numerical precision. Also, please note that to evaluate our model correctly in lm-eval-harness, you need to change the lm-eval-harness code so that it does not use the Hugging Face auto-converted fast tokenizer, as that tokenizer sometimes produces incorrect tokens. See this issue for more details.
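For anyone who wants to check the tokenizer point locally, here is a minimal sketch, not the maintainers' evaluation code. It assumes the `transformers` library is installed and that the checkpoint is published under the Hub id `openlm-research/open_llama_3b` (an assumption; substitute your own path). `find_tokenizer_mismatches` and `load_openllama_tokenizers` are hypothetical helper names introduced only for illustration. The first compares two encoders on sample texts, and the second loads the slow and fast tokenizers via `AutoTokenizer.from_pretrained` with `use_fast=False` and `use_fast=True`.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Encoder = Callable[[str], Sequence[int]]


def find_tokenizer_mismatches(
    encode_a: Encoder, encode_b: Encoder, texts: Iterable[str]
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, ids_a, ids_b) for every text the two encoders tokenize differently."""
    mismatches = []
    for text in texts:
        ids_a = list(encode_a(text))
        ids_b = list(encode_b(text))
        if ids_a != ids_b:
            mismatches.append((text, ids_a, ids_b))
    return mismatches


def load_openllama_tokenizers(model_path: str = "openlm-research/open_llama_3b"):
    """Load the slow and fast tokenizers for a checkpoint.

    The default model_path is an assumption; point it at your own checkpoint.
    """
    # Imported lazily so the comparison helper works without transformers installed.
    from transformers import AutoTokenizer

    slow = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    fast = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    return slow, fast


if __name__ == "__main__":
    slow, fast = load_openllama_tokenizers()
    samples = ["Is the sky blue?", "  leading spaces", "line one\nline two"]
    for text, slow_ids, fast_ids in find_tokenizer_mismatches(
        slow.encode, fast.encode, samples
    ):
        print(repr(text), slow_ids, fast_ids)
```

If any mismatches are printed, loading the tokenizer with `use_fast=False` wherever the harness constructs it is one way to apply the change described above. Where that happens depends on the lm-eval-harness version you are running.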