jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

3T checkpoint? #115

Closed coder543 closed 6 months ago

coder543 commented 6 months ago

Hey, sorry if I'm just too excited to see the final checkpoint of TinyLlama, but is the 3T checkpoint ready? The timeline on the README indicates it was supposed to be finished yesterday, I think.

Also, do you know if there will be a "Chat"-tuned model for the 3T checkpoint?

Thanks for all the hard work! This has been a really cool project.

win10ogod commented 6 months ago

I'm equally curious!

ChaosCodes commented 6 months ago

Hi, thanks for your interest. You can try the 3T version of TinyLlama via https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/tree/main, but as you can see, the model may have already saturated before 3T. We are conducting some analysis and will aim to share some advice on checkpoint selection and other learnings in due course.
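For anyone wanting to try the checkpoint linked above, a minimal loading sketch with the Hugging Face `transformers` library might look like the following (the repo id comes from the link above; the prompt and generation settings are illustrative assumptions, not project recommendations, and the download is several gigabytes):

```python
# Sketch: load the 3T TinyLlama checkpoint and generate a short completion.
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

# Downloads the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Arbitrary example prompt; any text works.
inputs = tokenizer("The TinyLlama project is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this is the base (non-chat) intermediate checkpoint, so it does plain text completion rather than instruction following.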

win10ogod commented 6 months ago

Hi, thanks for your interest. You can try the 3T version of TinyLlama via https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/tree/main, but as you can see, the model may have already saturated before 3T. We are conducting some analysis and will aim to share some advice on checkpoint selection and other learnings in due course.

Will you try with larger parameters? Will there be a Chinese version of the model later?

win10ogod commented 6 months ago

Hi, thanks for your interest. You can try the 3T version of TinyLlama via https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/tree/main, but as you can see, the model may have already saturated before 3T. We are conducting some analysis and will aim to share some advice on checkpoint selection and other learnings in due course.

Maybe you can try the Mamba, RWKV, or StripedHyena architectures?

jzhang38 commented 6 months ago

Maybe you can try the Mamba, RWKV, or StripedHyena architectures?

If we have the compute.