huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

How does it compare with Megatron-DeepSpeed? #36

Closed: allanj closed this issue 8 months ago

allanj commented 8 months ago
  1. I'm wondering about the relationship with Megatron-DeepSpeed.
  2. Are they the same thing? If not, which one is faster?

NouamaneTazi commented 8 months ago

The plan is to keep the codebase as minimal as possible, with a more explicit and accessible design for users, while delivering performance at least on par with, or faster than, Megatron-DeepSpeed.