gongwei-130 opened this issue 3 years ago
Hi @gongwei-130, we are working towards releasing a full example and tutorial for pipelined Megatron-LM. In short, you need to flatten the model into a stack of layers and override the underlying model/data-parallel `mpu` code in Megatron so that it uses DeepSpeed's 3D topology.
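To illustrate what "flatten the model to a stack of layers" buys you, here is a minimal, framework-free sketch. It assumes the model is just an embedding, a list of transformer blocks and an output head (plain Python callables stand in for `torch.nn.Module`s), and splits the flattened list into contiguous pipeline stages with a simple balanced split. The names `flatten_model`, `partition_uniform` and `run_stage` are hypothetical and are not DeepSpeed's API or its exact partitioning algorithm.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a sub-module; in practice this would be a torch.nn.Module.
Layer = Callable[[float], float]


def flatten_model(embed: Layer, blocks: Sequence[Layer], head: Layer) -> List[Layer]:
    """Flatten a nested model (embedding, blocks, head) into one ordered stack of layers."""
    return [embed, *blocks, head]


def partition_uniform(num_layers: int, num_stages: int) -> List[int]:
    """Return boundaries so that stage s owns layers [parts[s], parts[s + 1])."""
    base, extra = divmod(num_layers, num_stages)
    parts = [0]
    for stage in range(num_stages):
        # The first `extra` stages each take one additional layer.
        parts.append(parts[-1] + base + (1 if stage < extra else 0))
    return parts


def run_stage(layers: Sequence[Layer], parts: List[int], stage: int, x: float) -> float:
    """Apply only the layers owned by one pipeline stage to the incoming activation."""
    for layer in layers[parts[stage]:parts[stage + 1]]:
        x = layer(x)
    return x


embed = lambda x: x + 1.0
blocks = [lambda x: x * 2.0 for _ in range(4)]
head = lambda x: x - 3.0

layers = flatten_model(embed, blocks, head)  # 6 layers in total
parts = partition_uniform(len(layers), num_stages=4)

x = 1.0
for stage in range(4):  # each stage would live on a different GPU; here they run in sequence
    x = run_stage(layers, parts, stage, x)

print(parts)  # [0, 2, 4, 5, 6]
print(x)      # 29.0
```

Because every stage only needs a contiguous slice of one flat list, the pipeline engine can place slices on different devices and pass activations between them; a nested model with arbitrary control flow does not split this cleanly.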
Thanks for the reply. Any ETA for the release?
The tutorial says: "DeepSpeed's training engine provides hybrid data and pipeline parallelism and can be further combined with model parallelism such as Megatron-LM." Is there an example or tutorial showing how to combine them? Thanks.
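As a rough picture of what "3D topology" means when data, pipeline and Megatron-style model parallelism are combined, the sketch below lays out all ranks on a pipe × data × model grid and derives the communication groups for each axis. It is a hypothetical, self-contained illustration (the `Topology3D` class is not part of DeepSpeed); DeepSpeed provides its own topology classes, for example `PipeModelDataParallelTopology` in `deepspeed.runtime.pipe.topology`, whose exact constructor and rank ordering should be checked against its documentation.

```python
from itertools import product
from typing import Dict, List, Tuple

AXES = ("pipe", "data", "model")  # in this sketch the model index varies fastest


class Topology3D:
    """Hypothetical pipe x data x model process grid (illustration only)."""

    def __init__(self, num_pp: int, num_dp: int, num_mp: int) -> None:
        self.dims = (num_pp, num_dp, num_mp)
        # Row-major enumeration: global rank r -> (pipe, data, model) coordinate.
        self.coords: List[Tuple[int, int, int]] = list(
            product(*(range(d) for d in self.dims))
        )

    @property
    def world_size(self) -> int:
        return len(self.coords)

    def groups(self, axis: str) -> List[List[int]]:
        """Ranks whose coordinates differ only along `axis` form one group."""
        idx = AXES.index(axis)
        buckets: Dict[Tuple[int, ...], List[int]] = {}
        for rank, coord in enumerate(self.coords):
            key = coord[:idx] + coord[idx + 1:]  # drop the axis we group along
            buckets.setdefault(key, []).append(rank)
        return list(buckets.values())


topo = Topology3D(num_pp=2, num_dp=2, num_mp=2)
print(topo.world_size)       # 8
print(topo.groups("model"))  # [[0, 1], [2, 3], [4, 5], [6, 7]] -> tensor-slice all-reduce
print(topo.groups("data"))   # [[0, 2], [1, 3], [4, 6], [5, 7]] -> gradient all-reduce
print(topo.groups("pipe"))   # [[0, 4], [1, 5], [2, 6], [3, 7]] -> stage-to-stage send/recv
```

Making the model axis vary fastest keeps each model-parallel group on consecutive ranks, which usually means the same node; that matters because Megatron's tensor slicing communicates on every layer. Overriding Megatron's `mpu` then amounts to having its "model parallel group" and "data parallel group" queries return groups built from a grid like this instead of Megatron's own 2D layout.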