huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.14k stars 107 forks

Fix converter #130

Closed AleHD closed 5 months ago

AleHD commented 5 months ago

Accidentally opened this PR. It is not ready yet; the code will be pushed to #125 when ready. Please disregard this PR.