XinnuoXu opened this issue 1 year ago:

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| anli_r1 | 0 | acc | 0.3330 | ± | 0.0149 |
| anli_r2 | 0 | acc | 0.3320 | ± | 0.0149 |
| anli_r3 | 0 | acc | 0.3367 | ± | 0.0136 |
| arc_challenge | 0 | acc | 0.2099 | ± | 0.0119 |
| | | acc_norm | 0.2705 | ± | 0.0130 |
| arc_easy | 0 | acc | 0.2542 | ± | 0.0089 |
| | | acc_norm | 0.2517 | ± | 0.0089 |
| hellaswag | 0 | acc | 0.2621 | ± | 0.0044 |
| | | acc_norm | 0.2741 | ± | 0.0045 |
| openbookqa | 0 | acc | 0.1800 | ± | 0.0172 |
| | | acc_norm | 0.2500 | ± | 0.0194 |
| piqa | 0 | acc | 0.5147 | ± | 0.0117 |
| | | acc_norm | 0.5011 | ± | 0.0117 |
| record | 0 | f1 | 0.2017 | ± | 0.0040 |
| | | em | 0.1964 | ± | 0.0040 |
| rte | 0 | acc | 0.4946 | ± | 0.0301 |
| truthfulqa_mc | 1 | mc1 | 0.2375 | ± | 0.0149 |
| | | mc2 | 0.4767 | ± | 0.0169 |
| wic | 0 | acc | 0.5000 | ± | 0.0198 |
| winogrande | 0 | acc | 0.5099 | ± | 0.0140 |

It seems that the anli_* and truthfulqa_mc results are similar to the reported ones, but the rest are about 20% worse. Are the results reported in this repo for hellaswag and ARC zero-shot (few-shot = 0) or not?
Everything reported here is zero-shot. Did you turn off the fast tokenizer when evaluating? A bug in the latest release of the transformers library causes the auto-converted fast tokenizer to output different tokens than the original tokenizer. Therefore, when evaluating OpenLLaMA, you need to turn off the fast tokenizer.
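One way to see whether this bug affects a given setup is to tokenize a few sample prompts with both tokenizers and compare the token ids. The sketch below assumes only that each tokenizer object has an `encode(text)` method returning a list of ints (Hugging Face tokenizers do); the helper name `find_tokenization_mismatches` is hypothetical and not part of transformers or the evaluation harness.

```python
from typing import Iterable, List, Protocol, Tuple


class Tokenizer(Protocol):
    """Anything with an encode(text) -> list[int] method, e.g. a Hugging Face tokenizer."""

    def encode(self, text: str) -> List[int]: ...


def find_tokenization_mismatches(
    slow: Tokenizer, fast: Tokenizer, texts: Iterable[str]
) -> List[Tuple[str, List[int], List[int]]]:
    """Hypothetical helper: return (text, slow_ids, fast_ids) for each text
    on which the two tokenizers produce different token ids."""
    mismatches = []
    for text in texts:
        slow_ids = slow.encode(text)
        fast_ids = fast.encode(text)
        if slow_ids != fast_ids:  # any difference means the model sees different inputs
            mismatches.append((text, slow_ids, fast_ids))
    return mismatches
```

With transformers installed, `slow` could come from `AutoTokenizer.from_pretrained(name, use_fast=False)` and `fast` from `AutoTokenizer.from_pretrained(name)`, where `name` is a placeholder for the checkpoint being evaluated. An empty result on a handful of prompts does not prove the tokenizers are equivalent; it only means no difference was found on those inputs.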
Is that bug still there? I thought I read somewhere that it got fixed.
@buzzCraft It has been fixed in the main branch of transformers, but there hasn't been a release with that fix yet.
@young-geng OK. Since we are on the bleeding edge of the LLM field, I usually go with the dev branch.
I also want to thank you and the team for the amazing work you have done. ❤️