Does SPMD support expert parallelism?

pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)

https://pytorch.org/xla


Does SPMD support expert parallelism? #7049

Open mars1248 opened 1 month ago

mars1248 commented 1 month ago

Does torch_xla SPMD support expert parallelism? For an MoE model, how should the computation be expressed in XLA?

❓ Questions and Help

JackCaoG commented 1 month ago

We are actively working on an MoE distributed training example. Maybe @alanwaketan can share more details.

alanwaketan commented 1 month ago

Yea, will let you know once we have more information.

mars1248 commented 1 month ago

> Yea, will let you know once we have more information.

@alanwaketan Can you tell me a little about your thinking? I want to express the experts in parallel in SPMD, and then add custom calls to solve the routing problem for variable-length token groups.
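
For context on the routing problem mentioned above: a commonly used alternative to variable-length routing under a static-shape compiler such as XLA is capacity-based dispatch, where every expert receives a fixed number of slots, unused slots are padded, and overflow tokens are dropped. The sketch below is plain Python, not torch_xla code, and it is not something the PyTorch/XLA maintainers have confirmed in this thread. The names `route_top1`, `apply_experts`, `capacity`, and `PAD` are hypothetical and exist only for illustration.

```python
# Hypothetical sketch of fixed-capacity top-1 MoE routing in plain Python.
# None of these names are torch_xla APIs; they only illustrate the idea of
# turning variable-length per-expert token groups into fixed-shape buffers.

PAD = -1  # marks an unused slot in an expert's buffer


def route_top1(expert_ids, num_experts, capacity):
    """Assign tokens to fixed-size per-expert slots.

    expert_ids[t] is the expert the router picked for token t.
    Returns (slots, dropped): slots[e] always has exactly `capacity`
    entries (token indices, or PAD where empty), and `dropped` lists
    the tokens that overflowed their expert's capacity.
    """
    slots = [[PAD] * capacity for _ in range(num_experts)]
    fill = [0] * num_experts  # number of slots already used per expert
    dropped = []
    for token, expert in enumerate(expert_ids):
        if fill[expert] < capacity:
            slots[expert][fill[expert]] = token
            fill[expert] += 1
        else:
            dropped.append(token)
    return slots, dropped


def apply_experts(tokens, slots, expert_fns):
    """Run each expert over its padded buffer and scatter results back.

    Dropped tokens are passed through unchanged (an assumption made here
    for simplicity).
    """
    out = list(tokens)
    for expert, buf in enumerate(slots):
        for token in buf:
            if token != PAD:
                out[token] = expert_fns[expert](tokens[token])
    return out


# Toy example: 6 scalar "tokens", 2 experts, 3 slots per expert.
expert_ids = [0, 1, 0, 0, 1, 0]
tokens = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expert_fns = [lambda x: x * 10, lambda x: x + 100]

slots, dropped = route_top1(expert_ids, num_experts=2, capacity=3)
result = apply_experts(tokens, slots, expert_fns)
```

Because `slots` always has shape `[num_experts, capacity]`, the expert dimension is a natural candidate for sharding across devices. The cost is padding waste and dropped tokens when the router is unbalanced, which is presumably what a custom call for true variable-length routing would avoid.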