huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Making lr schedule more flexible #59

Closed thomwolf closed 7 months ago

thomwolf commented 7 months ago

Small, backward-compatible changes to make LR scheduling more flexible and enable training with MiniCPM-style schedules.
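For context, MiniCPM popularized a warmup-stable-decay (WSD) learning-rate schedule: linear warmup, a long constant phase at peak LR, then a short decay to a minimum LR. Below is a minimal sketch of such a schedule; the function name, parameter names, and the linear decay shape are illustrative assumptions, not nanotron's actual implementation.

```python
def wsd_lr(step, max_lr=1e-3, min_lr=1e-4,
           warmup_steps=100, stable_steps=800, decay_steps=100):
    """Illustrative WSD schedule: return the LR at a given optimizer step.

    Hypothetical helper for exposition; not nanotron's API.
    """
    if step < warmup_steps:
        # Linear warmup from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate.
        return max_lr
    # Decay phase: linear ramp from max_lr down to min_lr, then hold min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr - (max_lr - min_lr) * progress
```

To use this with PyTorch's `torch.optim.lr_scheduler.LambdaLR`, divide the returned value by the optimizer's base LR so the lambda yields a multiplier rather than an absolute rate.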