microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License

Question about multi-gate with reference to multi-task learning #70

Open Tokkiu opened 2 years ago

Tokkiu commented 2 years ago

Thanks for your contribution and excellent work on Tutel! I am wondering whether I can use Tutel to implement multiple gates above the experts, as in the following picture. (screenshot of the multi-gate architecture attached)

Currently, I can't find a similar solution in the example files.

ghostplant commented 2 years ago

Do you mean something like this?

self._layer = tutel.moe.moe_layer(gate1_type={..}, gate2_type={..}, ..)

output1 = self._layer(data, use_gate1)
output2 = self._layer(data, use_gate2)

Tokkiu commented 2 years ago

@ghostplant Yes! And how do I specify 'use_gate1' and 'use_gate2'?

ghostplant commented 2 years ago

We are going to merge this: https://github.com/microsoft/tutel/pull/71/files. It lets you create a new MoE layer by specifying a list of the original gating types, and when forwarding the MoE layer you can call self._moe_layer(data, gate_index=...) to choose which gate to use.
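For illustration, here is a minimal sketch of how the multi-gate interface described above could be used once that PR is merged. The list-style gate_type and the gate_index argument come from the linked pull request; the concrete values of model_dim, experts, and the input shape are assumptions that follow Tutel's standard moe_layer usage:

# A minimal sketch (assumed values), combining Tutel's standard moe_layer
# arguments with the multi-gate feature from the linked PR.
import torch
from tutel import moe as tutel_moe

model_dim = 1024         # assumption: hidden size of the incoming tokens
num_local_experts = 2    # assumption: number of experts hosted per device

moe = tutel_moe.moe_layer(
    # a list of gating configs creates one gate per entry; here gate 0 uses
    # top-1 routing and gate 1 uses top-2 routing
    gate_type=[{'type': 'top', 'k': 1}, {'type': 'top', 'k': 2}],
    model_dim=model_dim,
    experts={'type': 'ffn',
             'count_per_node': num_local_experts,
             'hidden_size_per_expert': 4 * model_dim},
)

x = torch.randn(4, 512, model_dim)   # (batch, sequence, model_dim)
output1 = moe(x, gate_index=0)       # route tokens through the first gate
output2 = moe(x, gate_index=1)       # route tokens through the second gate

Both forward passes share the same expert parameters; only the routing decisions differ between the two gates, which matches the multi-gate multi-task setup asked about above.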

Tokkiu commented 2 years ago

> We are going to merge this: https://github.com/microsoft/tutel/pull/71/files. It lets you create a new MoE layer by specifying a list of the original gating types, and when forwarding the MoE layer you can call self._moe_layer(data, gate_index=...) to choose which gate to use.

@ghostplant Nice work! Looking forward to your new feature.

ghostplant commented 2 years ago

It's done. Feel free to share any feedback. Thanks!