chi2liu opened 1 year ago
Relative scores compared to llama-7b:
There's a clear performance hit on the multi-shot tasks compared to llama-7b.
This is likely an issue with the auto-converted fast tokenizer. I've created an issue here
@young-geng looks like the issue in that repo was fixed last week. I'm assuming this could be retried now? (@chi2liu)
@c0bra There has not yet been a new release of huggingface/transformers since the fix was merged: https://github.com/transformers/releases is still showing the old version at https://github.com/huggingface/transformers/releases. I assume we still need to wait for it.
The existing entries for OpenLLaMa on the leaderboard also disappeared around a week ago. Maybe there is a connection: the leaderboard maintainers may have removed the results because they learned of the bug and are now waiting for the next release of huggingface/transformers... That's just my guess, though.
@codesoap Yeah, I've contacted the leaderboard maintainers to request a re-evaluation, and the model should be in the queue right now.
open-llama-7b-open-instruct is pending evaluation on the open_llm_leaderboard. They confirmed that they fine-tuned with use_fast=False.
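For anyone who wants to reproduce this locally, the workaround discussed above is just to force the slow (SentencePiece-based) tokenizer when loading the model. A minimal sketch, assuming the openlm-research/open_llama_7b checkpoint on the Hub (the exact model id is an assumption here):

```python
from transformers import AutoTokenizer

# Force the slow SentencePiece tokenizer instead of the auto-converted
# fast tokenizer, which is affected by the bug discussed in this thread.
tokenizer = AutoTokenizer.from_pretrained(
    "openlm-research/open_llama_7b",  # model id assumed, adjust as needed
    use_fast=False,
)

# Sanity check: confirm the slow implementation is actually in use.
print(tokenizer.is_fast)  # should be False
```

Until the fixed transformers release is out, passing use_fast=False at every load (fine-tuning and evaluation alike) should avoid the tokenization mismatch.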
The OpenLLaMa 3B result is not pending. Is there a reason for that?
The open_llm_leaderboard has updated the results for open-llama-3b and open-llama-7b.
These results are still much worse than llama-7b and do not match expectations. Is this because of the fast tokenizer issue mentioned above?