Hi all. After walking through the examples, my understanding is that tutel currently supports data / tensor parallelism for its MoE layer module. Is that correct? If so, what should I do if I want the entire model's training to also support pipeline parallelism?
Alternatively, can tutel be used together with Megatron or DeepSpeed? If so, I could configure the hybrid parallelism following those frameworks' configuration conventions.
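For concreteness, here is a rough, untested sketch of what I have in mind: treating tutel's `moe_layer` as an ordinary `nn.Module` placed on one stage of a DeepSpeed `PipelineModule`. The dimensions (`model_dim`, `hidden_size`, `num_local_experts`) and the two-stage split are placeholders I made up for illustration; my question is essentially whether tutel's all-to-all communication would interact correctly with DeepSpeed's pipeline process groups in a setup like this.

```python
# Rough, untested sketch: embed a tutel MoE layer as one stage of a
# DeepSpeed pipeline. All sizes below are arbitrary placeholders.
import torch
from deepspeed.pipe import PipelineModule, LayerSpec
from tutel import moe as tutel_moe

class MoEBlock(torch.nn.Module):
    def __init__(self, model_dim=1024, hidden_size=4096, num_local_experts=2):
        super().__init__()
        # Tutel MoE layer with a top-2 gate and FFN experts, as in the
        # helloworld example; assumed to behave like a plain nn.Module here.
        self.moe = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': 2},
            model_dim=model_dim,
            experts={'type': 'ffn',
                     'count_per_node': num_local_experts,
                     'hidden_size_per_expert': hidden_size,
                     'activation_fn': torch.nn.functional.relu},
        )

    def forward(self, x):
        return self.moe(x)

# Two pipeline stages: a dense layer on stage 0, the MoE block on stage 1.
layers = [
    LayerSpec(torch.nn.Linear, 1024, 1024),
    LayerSpec(MoEBlock),
]
model = PipelineModule(layers=layers, num_stages=2)
# engine, _, _, _ = deepspeed.initialize(model=model, config='ds_config.json', ...)
```

Is this kind of composition expected to work, or does tutel need extra coordination (e.g. its own process groups) when the ranks are also split into pipeline stages?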