Open timmytwoteeth opened 3 months ago

Hello,

Thank you for the great work. I was wondering whether ScatterMoE supports tensor parallelism?

Thank you!

We're thinking about it! I'll keep this issue open as a reminder.

If you need model parallelism for training, we suggest using pipeline parallelism for now. It works very well for MoE models because they usually have a narrow hidden state compared to their number of parameters, so the activations passed between pipeline stages are small relative to the expert weights.

Appreciate the update.
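For readers unfamiliar with the suggestion above: pipeline parallelism splits the model into sequential stages and streams microbatches through them, so only the (narrow) hidden state crosses stage boundaries. The sketch below is a minimal single-process illustration of that scheduling idea, not ScatterMoE's API; the stage functions and names are hypothetical toy examples.

```python
# Minimal single-process sketch of pipeline parallelism (GPipe-style
# microbatching). Only the hidden state crosses stage boundaries, which
# for MoE models is small relative to the expert weights. All names here
# are illustrative, not part of ScatterMoE.

from typing import Callable, List

Vector = List[float]

def stage0(x: Vector) -> Vector:
    # First pipeline stage: e.g. embedding + early layers (toy affine op).
    return [2.0 * v + 1.0 for v in x]

def stage1(h: Vector) -> Vector:
    # Second pipeline stage: e.g. MoE layers + output head (toy affine op).
    return [0.5 * v - 3.0 for v in h]

def run_pipeline(stages: List[Callable[[Vector], Vector]],
                 batch: List[Vector],
                 n_micro: int) -> List[Vector]:
    """Split `batch` into `n_micro` microbatches and push each through
    every stage in order. On real hardware each stage lives on its own
    device and the stages overlap their work on different microbatches;
    here we just run them sequentially to show the data flow."""
    size = (len(batch) + n_micro - 1) // n_micro
    out: List[Vector] = []
    for i in range(0, len(batch), size):
        micro = batch[i:i + size]
        for stage in stages:
            micro = [stage(x) for x in micro]
        out.extend(micro)
    return out

batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
piped = run_pipeline([stage0, stage1], batch, n_micro=2)

# Microbatching must not change the result of a full-batch forward pass.
full = [stage1(stage0(x)) for x in batch]
assert piped == full
```

The key property checked at the end is that splitting the batch into microbatches leaves the output unchanged; pipelining only changes *when* each stage runs, not *what* it computes.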