jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

More intermediate checkpoints in < 240k steps #176

Open MaveriQ opened 2 months ago

MaveriQ commented 2 months ago

Hi!

Thank you for the great repo and the models. I want to pretrain the model with a new tokenizer, but since 16 A100 GPUs are hard to come by, I was wondering if you could release further checkpoints in the <500B tokens-seen range. As shown in the OLMo paper (see figure below), performance on some tasks increases roughly linearly with the number of tokens seen by the model, so training for even fewer steps can still give a useful picture of a new setup.

[Figure 1 from the OLMo paper]
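For context, here is a minimal sketch of what I have in mind once more intermediate checkpoints are available: load one of the published checkpoints with Hugging Face `transformers`, swap in a new tokenizer, and continue pretraining from there. The repo id below is only an example of the existing intermediate-checkpoint naming and may not match the exact names on the Hub.

```python
# Sketch only: load an intermediate TinyLlama checkpoint and adapt it to a new tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example/assumed repo id -- check the TinyLlama Hub page for the checkpoints that actually exist.
repo_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T"

model = AutoModelForCausalLM.from_pretrained(repo_id)
orig_tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Hypothetical new tokenizer trained on my own corpus (path is illustrative).
new_tokenizer = AutoTokenizer.from_pretrained("path/to/my-new-tokenizer")

# Resize the embedding matrix to the new vocabulary before continuing pretraining.
model.resize_token_embeddings(len(new_tokenizer))
```

Starting from an earlier (<500B tokens) checkpoint would make this kind of experiment feasible on far less compute than the full 16-GPU setup.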

Thanks!