mtc2013 opened 1 year ago
We are definitely interested in replicating 30B model but there are no concrete plans yet since currently we are focused on completing 7B model training.
How much did it cost you guys so far training the 7B model?
In the original LLaMA, the available sizes were 7B, 13B, 33B, and 65B.
As we all know, there is a really BIG gap in file sizes between 13B and 30B, and again up to the 65B model. For many of us, the best possible model to run on one's own hardware is determined by how large a model fits into it. I would LOVE to see a 50-layer model, which would probably be around 25B params, and a 70-layer model, at around 50B params.
Maybe after training the 7B version, it would be worth training different model sizes rather than copying the original ones exactly?
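For a rough sense of how layer count maps to parameter count, here is a back-of-the-envelope sketch for a LLaMA-style decoder. The per-layer cost is dominated by the attention projections (~4·d²) plus the SwiGLU MLP (~8·d²), i.e. roughly 12·d² per layer; the `d_model` widths below are illustrative assumptions, not proposed configs.

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int = 32000) -> int:
    """Approximate parameter count of a LLaMA-style transformer.

    Per layer: ~4*d^2 for the Q/K/V/O attention projections plus
    ~3*d*ffn for the SwiGLU MLP, where ffn ~= (8/3)*d, i.e. roughly
    12*d^2 per layer. Input embedding and output head add
    2*vocab_size*d. RMSNorm parameters are negligible.
    """
    per_layer = 12 * d_model ** 2
    embeddings = 2 * vocab_size * d_model
    return n_layers * per_layer + embeddings


# Sanity check against a published LLaMA config:
# 65B has 80 layers at d_model = 8192.
print(f"{estimate_params(80, 8192) / 1e9:.1f}B")   # ~64.9B

# A hypothetical 50-layer model reusing the 13B width (d_model = 6656):
print(f"{estimate_params(50, 6656) / 1e9:.1f}B")   # ~27.0B
```

This is only a sketch: exact counts depend on the FFN rounding, vocabulary size, and whether embeddings are tied, but it shows a 50-layer model would indeed land between the 13B and 30B sizes.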
Any update on whether training for larger models will eventually happen? Perhaps the TPU Research Cloud could be a free source of compute for the training process, and SlimPajama could be used in place of RedPajama to further accelerate the training.
+1 on the 33B LLaMA. It performs much better than the 13B one.
Are there any plans to train a 30B replica of LLaMA, or is the 7B enough to meet your purposes of comparison?