Any plans to train for 30b model

mtc2013 commented 1 year ago

Are there any plans to train a 30b replica of Llama or is the 7b enough to meet your purposes of comparison?

forhaoliu commented 1 year ago

We are definitely interested in replicating 30B model but there are no concrete plans yet since currently we are focused on completing 7B model training.

lksysML commented 1 year ago

How much did it cost you guys so far training the 7B model?

maddes8cht commented 1 year ago

In the original Llama, there where the sizes

7B with 32 Layers
13B with 40 Layers
30B with 60 Layers
65B with 80 Layers

As we all know, there is a really BIG gap in filesizes between 13B and 30B, and again up to the 65B model. For many of us, the best possible model to run on one own's hardware is determined by how large a model fits into the hardware. I would LOVE to see a 50 Layers Model, which would possibly be around 25B params, and a 70 Layers Model, being around50B params.

Maybe after training the 7B version, it may be nice not to focus on copying exactly the same model sizes, but different ones?

redbrain commented 1 year ago

Any update on whether training for larger models will eventually happen? Perhaps the TPU Research Cloud could be a free source of compute for the training process, and SlimPajama could be used in place of RedPajama to further accelerate the training.

qizzzh commented 1 year ago

+1 on the 33b llama. It performs much better than the 13b one.

openlm-research / open_llama

Any plans to train for 30b model #11