[Open] chongli-uw opened this issue 2 months ago
This is an early example.
@merrymercy Hi, has any progress been made on this issue? The example you provided previously used a plain MLP rather than FusedMoE. How can we enable Expert Parallelism with the current Mixtral/DeepSeek-v2 models now that they use FusedMoE? Do you have an updated example?
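For context on what is being asked: expert parallelism shards whole experts across GPUs (each rank owns a subset of experts and tokens are exchanged between ranks), rather than slicing every expert's weights the way tensor parallelism does. Below is a minimal, hand-written PyTorch sketch of that layout; the `ExpertParallelMoE` module, its per-rank expert list, and the final all-reduce are assumptions for illustration only, not SGLang's FusedMoE interface or the eventual implementation.

```python
# Illustrative sketch of expert parallelism (EP), NOT SGLang's FusedMoE API.
# Assumes torch.distributed is already initialized with one process per GPU.
import torch
import torch.nn as nn
import torch.distributed as dist


class ExpertParallelMoE(nn.Module):
    """Each EP rank materializes only num_experts // ep_size expert MLPs."""

    def __init__(self, hidden: int, ffn: int, num_experts: int, top_k: int):
        super().__init__()
        ep_size, ep_rank = dist.get_world_size(), dist.get_rank()
        assert num_experts % ep_size == 0
        self.experts_per_rank = num_experts // ep_size
        self.first_expert = ep_rank * self.experts_per_rank  # first global expert id on this rank
        self.top_k = top_k
        self.gate = nn.Linear(hidden, num_experts, bias=False)  # router, replicated on every rank
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(self.experts_per_rank)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, hidden]; routing is computed identically on every rank.
        weights, expert_ids = torch.topk(self.gate(x).softmax(dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # Mixtral-style renormalization
        out = torch.zeros_like(x)
        # Run only the experts stored on this rank. A real EP kernel would instead
        # all-to-all the tokens so each rank receives exactly the tokens its experts need.
        for local_id, expert in enumerate(self.experts):
            global_id = self.first_expert + local_id
            token_idx, slot_idx = (expert_ids == global_id).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot_idx, None] * expert(x[token_idx])
        # Sum the partial outputs so every token sees all of its top-k experts.
        dist.all_reduce(out)
        return out
```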
related #1970
@merrymercy I see that this issue is mainly related to TP and DP. I noticed that the SGLang Q4 roadmap #1487 mentioned supporting this feature.
@liangzelang DP has already been merged (only for DeepSeek right now): #1970. EP will be supported soon. cc @ispobock
@zhyncs Is there any plan to support MoE-EP? I have implemented MoE-EP myself.
@xiaobochen123 We are going to implement it with a DP + EP approach for throughput gains. Currently, DP attention is implemented; before we start on EP, some updates to the MoE codebase need to be done.
I am curious what kind of MoE-EP you implemented and which codebase you used. How large are the performance gains compared to TP?
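For context on the TP vs. EP comparison, the structural difference can be sketched with rough, assumed numbers (Mixtral-8x7B-like shapes; none of this is measured data from any implementation discussed above):

```python
# Back-of-the-envelope comparison of TP vs. EP sharding of an MoE layer.
# Shapes are assumed Mixtral-8x7B-like values, used purely for illustration.
hidden, ffn, num_experts, world_size = 4096, 14336, 8, 8

per_expert = 3 * hidden * ffn  # gate/up/down projections of one expert MLP

# Tensor parallelism: every rank holds a 1/world_size slice of EVERY expert,
# so each forward pass runs many narrow GEMMs per rank.
tp_params_per_rank = num_experts * per_expert // world_size

# Expert parallelism: every rank holds whole experts (num_experts // world_size of them);
# GEMMs stay full-sized, but tokens must be exchanged between ranks (all-to-all).
ep_params_per_rank = (num_experts // world_size) * per_expert

# In this balanced case the per-rank MoE memory is identical; what changes is
# the GEMM shape (kernel efficiency) and the communication pattern.
print(f"TP per-rank MoE params: {tp_params_per_rank / 1e9:.2f}B, expert GEMM: {hidden} x {ffn // world_size}")
print(f"EP per-rank MoE params: {ep_params_per_rank / 1e9:.2f}B, expert GEMM: {hidden} x {ffn}")
```

The trade-off (fewer, larger GEMMs versus added all-to-all traffic) is the usual reason EP is paired with DP for throughput, as described above.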
Motivation
Hi team, first of all, thanks so much for such a great project. I am wondering if there is a plan to support Expert Parallelism for MoE models?
Related resources
https://nvidia.github.io/TensorRT-LLM/advanced/expert-parallelism.html