Closed coder543 closed 6 months ago
I'm equally curious!
Hi, thanks for your interest. You can try the 3T version of TinyLlama via https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/tree/main, but as you can see, the model may have already saturated before 3T. We are conducting some analysis and will aim to share advice on checkpoint selection and other lessons learned in due course.
Will you try a larger parameter count? And will there be a Chinese version of the model later?
Maybe you could try the Mamba, RWKV, or StripedHyena architectures?
If we have the compute.
Hey, sorry if I'm just too excited to see the final checkpoint of TinyLlama, but is the 3T checkpoint ready? The timeline in the README indicates it was supposed to be finished yesterday, I think.
Also, do you know if there will be a "Chat"-tuned model for the 3T checkpoint?
Thanks for all the hard work! This has been a really cool project.