shawntan / scattermoe

Triton-based implementation of Sparse Mixture of Experts.
Apache License 2.0

Mixtral inference example #2

Closed · casper-hansen closed this issue 3 months ago

casper-hansen commented 4 months ago

Hi, I was reading the paper and it looks nice. Do you have any examples of using this with Mixtral? Perhaps you could share some of the benchmarking code from the paper?

yikangshen commented 3 months ago

Hi, you can find our Mixtral implementation in the examples folder: https://github.com/shawntan/scattermoe/tree/main/examples/mixtral

casper-hansen commented 3 months ago

Hi @yikangshen, thanks for your example. It seems it cannot run inference with the weights released by Mistral. How do we use ScatterMoE with models that have pretrained weights, e.g. for finetuning or running inference?

shawntan commented 3 months ago

I've written a conversion script that overrides `_load_pretrained_model` and converts the safetensors files, merging the per-expert weights into the ScatterMoE format: https://github.com/shawntan/scattermoe/blob/main/examples/mixtral/modeling_mixtral.py#L878 Try it and see if it works for you.

EDIT: My bad. It's still buggy. Working on it.
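
For readers following along, here is a minimal sketch of the merging idea only, not the script linked above. It assumes the HF Mixtral key naming (`block_sparse_moe.experts.{i}.w{1,2,3}.weight`) and stacks each projection's per-expert matrices into one tensor; the target key names (`w1_stacked` etc.) are made up for illustration and will differ from ScatterMoE's real module names.

```python
import re
from collections import defaultdict

import torch
from safetensors.torch import load_file, save_file

# Hypothetical key pattern based on HF Mixtral naming.
EXPERT_KEY = re.compile(r"(.*block_sparse_moe)\.experts\.(\d+)\.(w[123])\.weight")

def merge_expert_weights(state_dict, num_experts=8):
    """Stack per-expert w1/w2/w3 matrices into one tensor per projection."""
    merged, buckets = {}, defaultdict(dict)
    for key, tensor in state_dict.items():
        m = EXPERT_KEY.fullmatch(key)
        if m is None:
            merged[key] = tensor  # gate, attention, norms, etc. pass through
            continue
        prefix, idx, proj = m.group(1), int(m.group(2)), m.group(3)
        buckets[(prefix, proj)][idx] = tensor
    for (prefix, proj), experts in buckets.items():
        # Illustrative target name; ScatterMoE's actual parameter names differ.
        merged[f"{prefix}.{proj}_stacked.weight"] = torch.stack(
            [experts[i] for i in range(num_experts)], dim=0
        )
    return merged

# Usage on a single shard (file names are examples; this assumes all experts
# of a layer live in the same shard, which real checkpoints may not guarantee):
# shard = load_file("model-00001-of-00019.safetensors")
# save_file(merge_expert_weights(shard), "merged-00001.safetensors")
```
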

casper-hansen commented 3 months ago

Thanks @shawntan. When I tried to get it working, I ran into memory usage roughly doubling compared to the standard implementation. Let me know when you find a fix for it.

shawntan commented 3 months ago

Try out this merge script and let me know how it works for you: https://github.com/shawntan/scattermoe/commit/0526612bd53f3fb4dc3418eaa2876c89efe5e4e2
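
As a generic check for the memory concern above (nothing here is specific to ScatterMoE; how the model and inputs are loaded is left to whichever entry point you use), one can compare peak GPU memory of the converted model against the stock HF Mixtral model on the same prompt:

```python
import torch

def report_peak_gpu_memory(model, input_ids):
    """Run one forward pass and print peak allocated GPU memory."""
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        model(input_ids)
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"peak GPU memory: {peak_gib:.2f} GiB")

# e.g. report_peak_gpu_memory(model, input_ids.to("cuda"))
# Running this for both the converted and the original model makes the
# doubling (or its absence) easy to see.
```
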