chi2liu opened 1 year ago
Relative scores compared to llama-7b:
There's a clear performance hit on the multi-shot tasks compared to llama-7b.
This is likely an issue with the auto-converted fast tokenizer. I've created an issue here
@young-geng looks like the issue in that repo was fixed last week. I'm assuming this could be retried now? (@chi2liu)
@c0bra There has not yet been a new release of huggingface/transformers since the fix was merged: https://github.com/transformers/releases is still showing the old version at https://github.com/huggingface/transformers/releases. I assume we still need to wait for it.
The existing entries for OpenLLaMa on the leaderboard also disappeared around a week ago. Maybe there is a connection: the leaderboard maintainers may have removed the results because they learned of the bug and are now waiting for the next release of huggingface/transformers... That's just my guess, though.
@codesoap Yeah, I've contacted the leaderboard maintainers to request a re-evaluation, and the model should be in the queue right now.
open-llama-7b-open-instruct is pending evaluation on the open_llm_leaderboard. They confirmed that they fine-tuned with use_fast=False.
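For anyone who wants to reproduce this locally, the workaround discussed above is just to force the slow (SentencePiece-based) tokenizer when loading the model. A minimal sketch, assuming the openlm-research/open_llama_7b checkpoint on the Hub (the exact model id is an assumption here):

```python
from transformers import AutoTokenizer

# Force the slow SentencePiece tokenizer instead of the auto-converted
# fast tokenizer, which is affected by the bug discussed in this thread.
tokenizer = AutoTokenizer.from_pretrained(
    "openlm-research/open_llama_7b",  # model id assumed, adjust as needed
    use_fast=False,
)

# Sanity check: confirm the slow implementation is actually in use.
print(tokenizer.is_fast)  # should be False
```

Until the fixed transformers release is out, passing use_fast=False at every load (fine-tuning and evaluation alike) should avoid the tokenization mismatch.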
The OpenLLaMa 3B result is not pending. Is there a reason for that?
The open_llm_leaderboard has updated the results for open-llama-3b and open-llama-7b.
These results are still much worse than llama-7b and do not match expectations. Is this because of the fast tokenizer issue mentioned above?