jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

When will the 1T checkpoint be ready? #49

Closed xiaoyunwu closed 11 months ago

ChaosCodes commented 11 months ago

Thank you for your inquiry regarding the 1T checkpoint. You can already explore and test our existing checkpoints (which include the 1T checkpoint) on Hugging Face via the following link: tinyLlama-intermediate-checkpoints. We are currently training our TinyLlama chat model, which will be available in the near future. We will also create a dedicated repo for the TinyLlama-1.1B 1T-token checkpoint later.
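
For anyone who wants to try the intermediate weights right away, a minimal sketch with `transformers` (the repo id below is a placeholder; use the exact name from the intermediate-checkpoints page):

```python
# Minimal sketch: load an intermediate TinyLlama checkpoint from Hugging Face.
# The repo id below is a placeholder -- check the intermediate-checkpoints page
# for the exact repo name and revision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```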

xiaoyunwu commented 11 months ago

What is the commonsense score at 1T tokens?

jzhang38 commented 11 months ago

| Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
|---|---|---|---|---|---|---|---|---|---|
| Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
| TinyLlama-1.1B-intermediate-step-50K-104b | 103B | 43.50 | 29.80 | 53.28 | 24.32 | 44.91 | 59.66 | 67.30 | 46.11 |
| TinyLlama-1.1B-intermediate-step-240k-503b | 503B | 49.56 | 31.40 | 55.80 | 26.54 | 48.32 | 56.91 | 69.42 | 48.28 |
| TinyLlama-1.1B-Chat-v0.1 | 503B | 53.81 | 32.20 | 55.01 | 28.67 | 49.62 | 58.04 | 69.64 | 49.57 |
| TinyLlama-1.1B-intermediate-step-480k-1007B | 1007B | 52.54 | 33.40 | 55.96 | 27.82 | 52.36 | 59.54 | 69.91 | 50.22 |

I will update the repo later
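
For reference, scores in this style are usually produced with EleutherAI's lm-evaluation-harness. A minimal sketch of re-running one row (assuming the lm-eval >= 0.4 Python API; the model id and task list here are assumptions, not confirmed by this thread):

```python
# Minimal sketch: commonsense evaluation with EleutherAI's lm-evaluation-harness.
# Assumes lm-eval >= 0.4 (pip install lm-eval); the model id is a placeholder.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T,dtype=bfloat16",
    tasks=["hellaswag", "openbookqa", "winogrande", "arc_challenge",
           "arc_easy", "boolq", "piqa"],
    num_fewshot=0,
    batch_size=8,
)

# Print per-task metrics (accuracy, normalized accuracy, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```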

VatsaDev commented 11 months ago

Wow, so Chinchilla scaling needs help! Other than that though, every benchmark went up, congrats!

Also, could you remove the chat model row from the table? It makes it look like benchmarks went down, until you realize it's the chat model.

jzhang38 commented 11 months ago

> It makes it look like benchmarks went down, until you realize it's the chat model.

@VatsaDev Sure. Thanks for the advice.

xiaoyunwu commented 11 months ago

@jzhang38 I am curious why there was no instruction-tuning phase before going directly to a chat model. Is supporting chat in video games the primary drive for you to work on this? With a small model, I think instruction tuning is more important, since the amount of world knowledge it can model is limited anyway.
By the way, instead of wasting compute on large models, I think this project is what we really need: figuring out what we can get by throwing more compute at small models. If we can get RAG to work well with small models, it will have huge implications. And I think it is very possible.
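
As a toy illustration of the retrieve-then-generate idea with a small model (all model ids and the corpus below are placeholders for illustration, not anything from this repo):

```python
# Toy RAG sketch: retrieve the most relevant passage, then let a small model
# answer from it. Model ids and corpus are illustrative placeholders only.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

corpus = [
    "TinyLlama is a 1.1B-parameter Llama-architecture model pretrained on up to 3T tokens.",
    "Chinchilla scaling laws suggest roughly 20 tokens per parameter for compute-optimal training.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # small retriever
doc_emb = embedder.encode(corpus, convert_to_tensor=True)

question = "How many parameters does TinyLlama have?"
q_emb = embedder.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(q_emb, doc_emb).argmax())       # index of closest passage

generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v0.1")  # placeholder id
prompt = f"Context: {corpus[best]}\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=32)[0]["generated_text"])
```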

VatsaDev commented 11 months ago

@jzhang38 Thanks for the change, it clearly shows the progress now. @xiaoyunwu We talk about RAG/Toolformer in #10.