openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

Any plans to train for 30b model #11

Open · mtc2013 opened this issue 1 year ago

mtc2013 commented 1 year ago

Are there any plans to train a 30B replica of LLaMA, or is the 7B enough for your comparison purposes?

forhaoliu commented 1 year ago

We are definitely interested in replicating the 30B model, but there are no concrete plans yet, since we are currently focused on completing the 7B model training.

lksysML commented 1 year ago

How much has training the 7B model cost you so far?

maddes8cht commented 1 year ago

In the original LLaMA, there were the sizes 7B, 13B, 30B, and 65B.

As we all know, there is a really BIG gap in file sizes between 13B and 30B, and again up to the 65B model. For many of us, the best possible model to run on one's own hardware is determined by how large a model fits into that hardware. I would LOVE to see a 50-layer model, which would probably land around 25B parameters, and a 70-layer model at around 50B parameters.
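To make those layer-count guesses concrete, here is a rough back-of-the-envelope estimator for LLaMA-style parameter counts (SwiGLU MLP with intermediate size ≈ 8/3·d rounded up to a multiple of 256, untied 32k-token embeddings). The hidden sizes paired with 50 and 70 layers below are my own illustrative picks, not configurations announced by the OpenLLaMA authors:

```python
# Rough parameter-count estimator for a LLaMA-style decoder-only transformer.
# The 50- and 70-layer configs are hypothetical; only the first four match
# the published LLaMA sizes.

def llama_params(n_layers: int, d_model: int, vocab_size: int = 32000) -> int:
    """Approximate parameter count for a LLaMA-style model."""
    d_ff = 256 * ((int(8 * d_model / 3) + 255) // 256)  # SwiGLU intermediate size
    attn = 4 * d_model * d_model                         # q, k, v and output projections
    mlp = 3 * d_model * d_ff                             # gate, up and down projections
    norms = 2 * d_model                                  # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * vocab_size * d_model                # input embedding + untied output head
    return n_layers * per_layer + embeddings + d_model   # plus final RMSNorm

# Published LLaMA configs, then two hypothetical intermediate sizes (guesses).
for layers, d in [(32, 4096), (40, 5120), (60, 6656), (80, 8192), (50, 6400), (70, 7680)]:
    print(f"{layers} layers, d_model={d}: ~{llama_params(layers, d) / 1e9:.1f}B params")
```

With these (assumed) hidden sizes, 50 layers at d_model=6400 comes out around 25B parameters and 70 layers at d_model=7680 around 50B, which is roughly the range between the existing 13B and 65B checkpoints.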

Maybe after training the 7B version, it would be nice not to focus on copying exactly the same model sizes, but to try different ones?

redbrain commented 1 year ago

Any update on whether training for larger models will eventually happen? Perhaps the TPU Research Cloud could be a free source of compute for the training process, and SlimPajama (a deduplicated, cleaned version of RedPajama) could be used in its place to further accelerate training.

qizzzh commented 1 year ago

+1 on the 33B LLaMA. It performs much better than the 13B one.