huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0
[Refactor] DistributedOptimizer and FP32GradAccum
#20
Open
NouamaneTazi opened 8 months ago
NouamaneTazi commented 8 months ago
DistributedOptimizer and FP32GradAccum need to be refactored