shawntan / scattermoe

Triton-based implementation of Sparse Mixture of Experts.
Apache License 2.0

Different number of experts for each token #18

Open Cy-47 opened 1 day ago

Cy-47 commented 1 day ago

Hello, is routing each token to a different number of experts (while also routing a different number of tokens to each expert) supported? Thank you!
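To make the question concrete, the sketch below shows one way such routing could look. It is plain Python and does not use scattermoe's API. The threshold rule and the names `route_variable_k` and `tokens_per_expert` are assumptions made up for illustration, not anything taken from the library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_variable_k(router_logits, threshold=0.3):
    """Hypothetical routing rule (an assumption, not scattermoe's behaviour):
    send each token to every expert whose router probability is at least
    `threshold`, and always to its top-1 expert so no token is dropped."""
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        top1 = max(range(len(probs)), key=probs.__getitem__)
        chosen = [e for e, p in enumerate(probs) if p >= threshold or e == top1]
        assignments.append(chosen)
    return assignments


def tokens_per_expert(assignments, num_experts):
    """Count how many tokens each expert receives."""
    counts = [0] * num_experts
    for chosen in assignments:
        for e in chosen:
            counts[e] += 1
    return counts


# Three tokens, four experts; the logits are made-up example values.
router_logits = [
    [4.0, 0.0, 0.0, 0.0],  # confident in expert 0
    [1.0, 1.0, 0.0, 0.0],  # split between experts 0 and 1
    [0.0, 2.0, 2.0, 2.0],  # split between experts 1, 2 and 3
]

assignments = route_variable_k(router_logits, threshold=0.3)
counts = tokens_per_expert(assignments, num_experts=4)

print(assignments)  # [[0], [0, 1], [1, 2, 3]]
print(counts)       # [2, 2, 1, 1]
```

Here the three tokens use 1, 2 and 3 experts respectively, and the experts receive 2, 2, 1 and 1 tokens, so neither dimension is uniform. This is the case the question asks about.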