huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0
Add param group weight decay
#139
Closed by 3outeille 5 months ago
3outeille commented 5 months ago
Weight decay can now be applied per parameter group
Make it compatible with MuTransfer
Add support for the SGD optimizer, which makes it easier to test parameter groups
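
To illustrate the idea of per-group weight decay, here is a minimal pure-Python sketch. It is not nanotron's implementation: `Param`, `build_param_groups`, `sgd_step`, and the rule of skipping decay for names containing "bias" or "norm" are all hypothetical and chosen only for illustration. The group layout (a list of dicts with `params` and `weight_decay` keys) mirrors the convention used by PyTorch optimizers.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Toy scalar parameter standing in for a tensor (hypothetical type)."""
    name: str
    value: float
    grad: float = 0.0


def build_param_groups(params, weight_decay, no_decay_keywords=("bias", "norm")):
    """Split params into a decayed group and a non-decayed group.

    The keyword rule is an assumption for illustration, not taken from the PR.
    """
    decay, no_decay = [], []
    for p in params:
        if any(keyword in p.name for keyword in no_decay_keywords):
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


def sgd_step(param_groups, lr):
    """Plain SGD where each group supplies its own weight decay coefficient."""
    for group in param_groups:
        wd = group["weight_decay"]
        for p in group["params"]:
            # Weight decay is added to the gradient before the update.
            p.value -= lr * (p.grad + wd * p.value)


params = [
    Param("layer.weight", value=1.0, grad=0.0),
    Param("layer.bias", value=1.0, grad=0.0),
]
groups = build_param_groups(params, weight_decay=0.1)
sgd_step(groups, lr=0.5)
```

With zero gradients, one SGD step changes only the parameters in the decayed group, so the effect of each group's setting can be checked directly. This is one plausible reason plain SGD is convenient for testing parameter groups, since it has no momentum or adaptive state to account for.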