jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

When will the 1T checkpoint be ready? #49

Closed xiaoyunwu closed 11 months ago

ChaosCodes commented 11 months ago

Thank you for your inquiry regarding the 1T checkpoint. You can already explore and test our existing checkpoints (which include the 1T checkpoint) on Hugging Face via the following link: tinyLlama-intermediate-checkpoints. We are currently training our TinyLlama chat model, which will be available in the near future. We will also create a dedicated repo for the TinyLlama-1.1B 1T-token checkpoint later.
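
For anyone who wants to try the intermediate weights right away, a minimal sketch with `transformers` (the repo id below is a placeholder; use the exact name from the intermediate-checkpoints page):

```python
# Minimal sketch: load an intermediate TinyLlama checkpoint from Hugging Face.
# The repo id below is a placeholder -- check the intermediate-checkpoints page
# for the exact repo name and revision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```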

xiaoyunwu commented 11 months ago

What is the commonsense score at 1T tokens?

jzhang38 commented 11 months ago

| Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
|---|---|---|---|---|---|---|---|---|---|
| Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
| TinyLlama-1.1B-intermediate-step-50K-104b | 103B | 43.50 | 29.80 | 53.28 | 24.32 | 44.91 | 59.66 | 67.30 | 46.11 |
| TinyLlama-1.1B-intermediate-step-240k-503b | 503B | 49.56 | 31.40 | 55.80 | 26.54 | 48.32 | 56.91 | 69.42 | 48.28 |
| TinyLlama-1.1B-Chat-v0.1 | 503B | 53.81 | 32.20 | 55.01 | 28.67 | 49.62 | 58.04 | 69.64 | 49.57 |
| TinyLlama-1.1B-intermediate-step-480k-1007B | 1007B | 52.54 | 33.40 | 55.96 | 27.82 | 52.36 | 59.54 | 69.91 | 50.22 |

I will update the repo later
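
For reference, scores in this style are usually produced with EleutherAI's lm-evaluation-harness. A minimal sketch of re-running one row (assuming the lm-eval >= 0.4 Python API; the model id and task list here are assumptions, not confirmed by this thread):

```python
# Minimal sketch: commonsense evaluation with EleutherAI's lm-evaluation-harness.
# Assumes lm-eval >= 0.4 (pip install lm-eval); the model id is a placeholder.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T,dtype=bfloat16",
    tasks=["hellaswag", "openbookqa", "winogrande", "arc_challenge",
           "arc_easy", "boolq", "piqa"],
    num_fewshot=0,
    batch_size=8,
)

# Print per-task metrics (accuracy, normalized accuracy, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```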

VatsaDev commented 11 months ago

Wow, so Chinchilla scaling needs help! Other than that though, every benchmark went up, congrats!

Also, could you remove the chat model row from the table? It makes it look like benchmarks went down, until you realize it's the chat model.

jzhang38 commented 11 months ago

> It makes it look like benchmarks went down, until you realize it's the chat model.

@VatsaDev Sure. Thanks for the advice.

xiaoyunwu commented 11 months ago

@jzhang38 I am curious why there was no instruction-tuning phase before going directly to a chat model. Is supporting chat in video games the primary drive for you to work on this? With a small model, I think instruction tuning is more important, since the amount of world knowledge it can model is limited anyway.
By the way, instead of wasting compute on large models, I think this project is what we really need: figuring out what we can get by throwing more compute at small models. If we can get RAG to work well with small models, it will have huge implications. And I think it is very possible.
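
As a toy illustration of the retrieve-then-generate idea with a small model (all model ids and the corpus below are placeholders for illustration, not anything from this repo):

```python
# Toy RAG sketch: retrieve the most relevant passage, then let a small model
# answer from it. Model ids and corpus are illustrative placeholders only.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

corpus = [
    "TinyLlama is a 1.1B-parameter Llama-architecture model pretrained on up to 3T tokens.",
    "Chinchilla scaling laws suggest roughly 20 tokens per parameter for compute-optimal training.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # small retriever
doc_emb = embedder.encode(corpus, convert_to_tensor=True)

question = "How many parameters does TinyLlama have?"
q_emb = embedder.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(q_emb, doc_emb).argmax())       # index of closest passage

generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v0.1")  # placeholder id
prompt = f"Context: {corpus[best]}\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=32)[0]["generated_text"])
```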

VatsaDev commented 11 months ago

@jzhang38 Thanks for the change, it clearly shows the progress now. @xiaoyunwu We talk about RAG/Toolformer in #10.