jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

More intermediate checkpoints in < 240k steps #176

Open MaveriQ opened 2 months ago

MaveriQ commented 2 months ago

Hi!

Thank you for the great repo and the models. I want to pretrain the model with a new tokenizer, but since 16 A100 GPUs are hard to come by, I was wondering if you could release further checkpoints in the <500B tokens-seen range. As shown in the OLMo paper (see figure below), performance on some tasks increases roughly linearly with the number of tokens seen by the model, so training for even fewer steps can still give a useful picture of a new setup.

[Figure 1 from the OLMo paper]
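For context, here is a minimal sketch of what I have in mind once more intermediate checkpoints are available: load one of the published checkpoints with Hugging Face `transformers`, swap in a new tokenizer, and continue pretraining from there. The repo id below is only an example of the existing intermediate-checkpoint naming and may not match the exact names on the Hub.

```python
# Sketch only: load an intermediate TinyLlama checkpoint and adapt it to a new tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example/assumed repo id -- check the TinyLlama Hub page for the checkpoints that actually exist.
repo_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T"

model = AutoModelForCausalLM.from_pretrained(repo_id)
orig_tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Hypothetical new tokenizer trained on my own corpus (path is illustrative).
new_tokenizer = AutoTokenizer.from_pretrained("path/to/my-new-tokenizer")

# Resize the embedding matrix to the new vocabulary before continuing pretraining.
model.resize_token_embeddings(len(new_tokenizer))
```

Starting from an earlier (<500B tokens) checkpoint would make this kind of experiment feasible on far less compute than the full 16-GPU setup.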

Thanks!