shawntan / scattermoe

Triton-based implementation of Sparse Mixture of Experts.
Apache License 2.0

Tensor Parallelism #1

Open timmytwoteeth opened 3 months ago

timmytwoteeth commented 3 months ago

Hello,

Thank you for the great work.

I was wondering if scattermoe supports tensor parallelism?

Thank you!

shawntan commented 3 months ago

We're thinking about it! I'll keep this issue open as a reminder.

timmytwoteeth commented 3 months ago

Appreciate the update.

yikangshen commented 3 months ago

If you need model parallelism for training, we suggest using pipeline parallelism for now. It works very well for MoE models because they usually have a narrow hidden state relative to their number of parameters.
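
A back-of-the-envelope sketch of that argument, using hypothetical configuration numbers (not taken from scattermoe): the activations that cross a pipeline-stage boundary scale with the hidden size, while the expert weights that would otherwise need to be sharded scale with the number of experts times the feed-forward width, so the per-layer parameter footprint dwarfs the per-boundary communication.

```python
# Rough comparison of what pipeline parallelism communicates (activations at a
# stage boundary) versus what a single MoE layer stores (expert weights).
# All numbers below are illustrative assumptions, not scattermoe defaults.

hidden_size = 4096        # hidden-state width
ffn_size = 14336          # expert feed-forward width
num_experts = 8
batch_size = 4
seq_len = 2048
bytes_per_elem = 2        # bf16

# Activations crossing a pipeline-stage boundary per micro-batch:
# shape (batch, seq, hidden)
activation_bytes = batch_size * seq_len * hidden_size * bytes_per_elem

# Parameters of one MoE feed-forward layer (assuming up + down projections
# per expert), which tensor parallelism would have to shard instead:
param_bytes = num_experts * 2 * hidden_size * ffn_size * bytes_per_elem

print(f"activations per boundary: {activation_bytes / 2**20:.1f} MiB")
print(f"MoE layer parameters:     {param_bytes / 2**20:.1f} MiB")
print(f"ratio: {param_bytes / activation_bytes:.0f}x")
```

With these assumed sizes the layer holds roughly 28x more parameter bytes than the activations that have to move between pipeline stages, which is why splitting MoE models by layer is a reasonable stopgap until tensor parallelism is supported.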