How does Megatron work with the pipeline module?

microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

https://www.deepspeed.ai/

Apache License 2.0


How does Megatron work with the pipeline module? #444

Open gongwei-130 opened 3 years ago

gongwei-130 commented 3 years ago

The tutorial says: "DeepSpeed’s training engine provides hybrid data and pipeline parallelism and can be further combined with model parallelism such as Megatron-LM." Is there an example or tutorial showing how to combine them? Thanks.

ShadenSmith commented 3 years ago

Hi @gongwei-130, we are working towards the release of a full example and tutorial for pipelined Megatron-LM. In short, you need to flatten the model to a stack of layers and to override the underlying model/data parallel mpu code in Megatron to use DeepSpeed's 3D topology.
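To make the two steps above concrete, here is a minimal, library-free sketch. It is not DeepSpeed's or Megatron's actual code: every function name is hypothetical, and the axis order (pipe, data, model) with row-major rank numbering is an assumption made only for illustration. It shows (1) a model flattened into a plain list of layers and split into contiguous pipeline stages, and (2) how ranks in a 3D grid can be grouped into the model-, data-, and pipeline-parallel groups that an mpu-style object would need to expose.

```python
from itertools import product


def flatten_layers(embedding, blocks, head):
    """Flatten a nested model description into one ordered stack of layers."""
    return [embedding, *blocks, head]


def partition_uniform(layers, num_stages):
    """Split the layer stack into contiguous stages of near-equal size."""
    base, extra = divmod(len(layers), num_stages)
    stages, start = [], 0
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage < extra else 0)
        stages.append(layers[start:start + size])
        start += size
    return stages


def build_grid(num_pp, num_dp, num_mp):
    """Map each global rank to (pipe, data, model) coordinates.

    Assumption: row-major order, so the model axis varies fastest.
    """
    coords = product(range(num_pp), range(num_dp), range(num_mp))
    return {rank: coord for rank, coord in enumerate(coords)}


def groups_along(grid, axis):
    """Return rank groups whose members differ only along `axis`.

    axis 0 = pipeline groups, 1 = data-parallel groups, 2 = model-parallel groups.
    """
    groups = {}
    for rank, coord in grid.items():
        # Ranks sharing every coordinate except `axis` belong together.
        key = coord[:axis] + coord[axis + 1:]
        groups.setdefault(key, []).append(rank)
    return sorted(groups.values())


layers = flatten_layers("embed", [f"block{i}" for i in range(4)], "head")
stages = partition_uniform(layers, num_stages=2)
grid = build_grid(num_pp=2, num_dp=2, num_mp=2)
model_groups = groups_along(grid, axis=2)
```

In a real integration, the stage contents would be actual layer modules and the groups would back real communication groups. The sketch only shows the bookkeeping involved in replacing Megatron's own model/data parallel grouping with one derived from a single 3D layout.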

gongwei-130 commented 3 years ago

Thanks for the reply. Is there an ETA for the release?