jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

revise dataloader for continue training #162

Closed: peiji1981 closed this 7 months ago

peiji1981 commented 7 months ago

Revise the dataloader to support continued (resumed) pretraining.

peiji1981 commented 7 months ago

@jzhang38 Hi, when I resume pretraining TinyLlama, I found that this revision helps reduce the time it takes for a resumed pretraining run to be ready to continue.
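For context, the time saved when resuming typically comes from not replaying the already-consumed batches one by one just to fast-forward the dataloader. Below is a minimal sketch of that idea, assuming a packed, memory-mapped token file; the `PackedTokenDataset` class and `iter_from` helper are hypothetical illustrations, not the project's actual dataloader API or this PR's exact change.

```python
# Minimal sketch (not TinyLlama's actual PackedDataset): seek straight to the
# resume point instead of consuming and discarding already-seen batches.
import numpy as np


class PackedTokenDataset:
    """Hypothetical packed dataset: a flat memory-mapped array of token ids,
    served as fixed-length blocks."""

    def __init__(self, path: str, block_size: int):
        self.tokens = np.memmap(path, dtype=np.uint16, mode="r")
        self.block_size = block_size

    def __len__(self) -> int:
        return len(self.tokens) // self.block_size

    def __getitem__(self, idx: int) -> np.ndarray:
        start = idx * self.block_size
        return np.asarray(self.tokens[start : start + self.block_size])


def iter_from(dataset: PackedTokenDataset, start_block: int):
    """Resume by indexing directly to `start_block`, which is O(1), rather than
    iterating through and discarding the first `start_block` batches."""
    for idx in range(start_block, len(dataset)):
        yield dataset[idx]


# Hypothetical usage: skip the blocks consumed before the checkpoint.
# consumed_blocks = global_step * micro_batch_size * gradient_accumulation_steps
# train_iter = iter_from(
#     PackedTokenDataset("data/train.bin", block_size=2048 + 1),
#     start_block=consumed_blocks,
# )
```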