xuanqing94 opened 1 year ago
We are indeed planning to further train the model. We are still figuring out the right data mixture for that.
I'm following this thread very closely too, but specifically for the Portuguese language. Following the paper, I've been using the 400BT checkpoint + IFT, and it already works very well.
Those figures suggest a 7B model trained on 1T tokens matches a 13B model trained on 0.5T tokens. Extrapolating that ratio, at some point, perhaps around 3T tokens, the smaller model might reach the same performance.
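One way to sanity-check that comparison: under the common ~6ND approximation for transformer training compute (N = parameters, D = tokens), those two runs cost almost the same number of FLOPs, which is consistent with them landing at similar performance. A quick back-of-envelope sketch (the 6ND rule of thumb is a standard approximation, not a number from this thread):

```python
# Rough training-compute estimate: FLOPs ~= 6 * N * D for a dense
# transformer, where N = parameter count and D = training tokens.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6 * n_params * n_tokens

flops_7b = train_flops(7e9, 1.0e12)     # 7B model on 1T tokens
flops_13b = train_flops(13e9, 0.5e12)   # 13B model on 0.5T tokens

print(f"7B @ 1T tokens:    {flops_7b:.2e} FLOPs")   # ~4.2e22
print(f"13B @ 0.5T tokens: {flops_13b:.2e} FLOPs")  # ~3.9e22
print(f"compute ratio:     {flops_7b / flops_13b:.2f}")  # ~1.08
```

So the two training runs are within about 8% of each other in compute, which makes the observed performance parity unsurprising.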
I'm sure most folks here have read the Llama 2 paper, but I wanted to include this figure as further evidence that training up to 2T tokens is beneficial even for smaller models:
I appreciate all of the hard work that folks here are doing to create truly free and open high-quality models!
Thanks for generously releasing the checkpoint. According to the LLaMA paper, downstream task performance keeps improving even at 1T tokens (figure below). I am wondering if you have the budget and dataset for continuing the training process with more tokens? People will be excited to see a small but strong OpenLLaMA-7B model, not just a replica of the official LLaMA.
If this is not in your plans, could you please release the optimizer states and hyperparameters so others can keep training? Thank you!
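For context on why the optimizer states matter and not just the weights: adaptive optimizers like AdamW keep per-parameter moment estimates, and restarting without them resets that memory and can destabilize continued training. A minimal PyTorch-style sketch of what a resumable checkpoint would contain (the tiny model, file name, and hyperparameter values here are placeholders, not the project's actual setup):

```python
import torch
from torch import nn

# Placeholder model standing in for the real 7B network.
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One step so the optimizer accumulates state (exp_avg, exp_avg_sq).
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()

# Save everything needed to continue training seamlessly:
# weights, optimizer moments, and the training hyperparameters.
ckpt = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "hyperparams": {"lr": 3e-4, "weight_decay": 0.01},
}
torch.save(ckpt, "checkpoint.pt")

# Later, elsewhere: restore both weights and optimizer moments,
# so continued training picks up exactly where it left off.
resumed = torch.load("checkpoint.pt")
model.load_state_dict(resumed["model"])
optimizer.load_state_dict(resumed["optimizer"])
```

Releasing a checkpoint in this form (rather than weights alone) is what would let others continue the run without an optimizer warm-up transient.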