Closed lucasjinreal closed 1 year ago
Sometimes it does happen on individual tasks that a smaller model outperforms a larger one slightly, especially when the model sizes are close. We've also observed this before in our previous paper.
Then how can we properly evaluate the real performance across different parameter counts? On the ANLI tasks, the 3B model outperforms all the 7B models. Does vocabulary size also matter?
To reliably evaluate these models, we generally look at many tasks in aggregation instead of just individual ones.
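A minimal sketch of what that aggregation might look like: compute an unweighted (macro) average of per-task accuracies and compare models on that, rather than on any single benchmark. The task names and numbers below are made up for illustration, not results from the paper.

```python
# Hypothetical per-task accuracies for two model sizes (illustrative
# numbers only, not real results).
scores = {
    "3b": {"anli_r1": 0.36, "hellaswag": 0.58, "arc_easy": 0.62, "boolq": 0.71},
    "7b": {"anli_r1": 0.34, "hellaswag": 0.63, "arc_easy": 0.68, "boolq": 0.74},
}

def macro_average(task_scores):
    """Unweighted mean accuracy across tasks (macro-average)."""
    return sum(task_scores.values()) / len(task_scores)

for model, task_scores in scores.items():
    print(f"{model}: macro-avg accuracy = {macro_average(task_scores):.4f}")
```

With these made-up numbers, the 7B model loses to the 3B on `anli_r1` but still comes out ahead in aggregate, which is why a single task can be misleading.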
From the table, it seems the 7B accuracy is not higher than the 3B's. Why?