xuanqing94 opened 1 year ago
We are indeed planning to further train the model. We are still figuring out the right data mixture for that.
I'm following this thread very closely too, but specifically for the Portuguese language. Following the paper, I've been using the 400BT checkpoint + IFT, and it already works very well.
Those figures suggest a 7B model trained on 1T tokens matches a 13B model trained on 0.5T tokens. Extrapolating that ratio, at some point, perhaps around 3T tokens, the smaller model might reach the same performance.
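One way to sanity-check that comparison: under the common ~6ND approximation for transformer training compute (N = parameters, D = tokens), those two runs cost almost the same number of FLOPs, which is consistent with them landing at similar performance. A quick back-of-envelope sketch (the 6ND rule of thumb is a standard approximation, not a number from this thread):

```python
# Rough training-compute estimate: FLOPs ~= 6 * N * D for a dense
# transformer, where N = parameter count and D = training tokens.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6 * n_params * n_tokens

flops_7b = train_flops(7e9, 1.0e12)     # 7B model on 1T tokens
flops_13b = train_flops(13e9, 0.5e12)   # 13B model on 0.5T tokens

print(f"7B @ 1T tokens:    {flops_7b:.2e} FLOPs")   # ~4.2e22
print(f"13B @ 0.5T tokens: {flops_13b:.2e} FLOPs")  # ~3.9e22
print(f"compute ratio:     {flops_7b / flops_13b:.2f}")  # ~1.08
```

So the two training runs are within about 8% of each other in compute, which makes the observed performance parity unsurprising.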
I'm sure most folks here have read the Llama 2 paper, but I wanted to include this figure as further evidence that training up to 2T tokens is beneficial even for smaller models:
I appreciate all of the hard work that folks here are doing to create truly free and open high-quality models!
Thanks for generously releasing the checkpoint. According to the LLaMA paper, downstream task performance keeps improving even at 1T tokens (figure below). I am wondering if you have the budget and dataset for continuing the training process with more tokens? People will be excited to see a small but strong OpenLLaMA-7B model, not just a replica of the official LLaMA.
If this is not in your plans, could you please release the optimizer states and hyperparameters so others can keep training? Thank you!
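For context on why the optimizer states matter and not just the weights: adaptive optimizers like AdamW keep per-parameter moment estimates, and restarting without them resets that memory and can destabilize continued training. A minimal PyTorch-style sketch of what a resumable checkpoint would contain (the tiny model, file name, and hyperparameter values here are placeholders, not the project's actual setup):

```python
import torch
from torch import nn

# Placeholder model standing in for the real 7B network.
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One step so the optimizer accumulates state (exp_avg, exp_avg_sq).
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()

# Save everything needed to continue training seamlessly:
# weights, optimizer moments, and the training hyperparameters.
ckpt = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "hyperparams": {"lr": 3e-4, "weight_decay": 0.01},
}
torch.save(ckpt, "checkpoint.pt")

# Later, elsewhere: restore both weights and optimizer moments,
# so continued training picks up exactly where it left off.
resumed = torch.load("checkpoint.pt")
model.load_state_dict(resumed["model"])
optimizer.load_state_dict(resumed["optimizer"])
```

Releasing a checkpoint in this form (rather than weights alone) is what would let others continue the run without an optimizer warm-up transient.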