casper-hansen closed this issue 7 months ago
Hi, you can find our mixtral implementation here in the example folder: https://github.com/shawntan/scattermoe/tree/main/examples/mixtral
Hi @yikangshen, thanks for your example. It seems this example cannot run inference directly with the pretrained weights released by Mistral. How do we use ScatterMoE with models that have pretrained weights, e.g. for finetuning or running inference?
I've written a conversion script that overrides _load_pretrained_model and converts the safetensors files, merging the per-expert weights into the ScatterMoE format.
https://github.com/shawntan/scattermoe/blob/main/examples/mixtral/modeling_mixtral.py#L878
Try it and see if it works for you.
EDIT: My bad. It's still buggy. Working on it.
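For anyone following along, here is a rough, untested sketch of the kind of merge step being described. This is a hypothetical helper, not the actual script: the key names assume the standard Hugging Face Mixtral checkpoint layout, and the target parameter names would need to match whatever ScatterMoE's grouped expert module actually expects.

```python
import torch

def merge_expert_weights(state_dict, layer_idx, num_experts):
    """Hypothetical helper: stack Mixtral's per-expert w1/w2/w3 weights
    into one 3-D tensor per projection, roughly the grouped layout a
    ScatterMoE-style expert module expects."""
    # Standard HF Mixtral key prefix for the MoE experts in this layer.
    prefix = f"model.layers.{layer_idx}.block_sparse_moe.experts"
    merged = {}
    for proj in ("w1", "w2", "w3"):
        # Collect each expert's (out_features, in_features) weight matrix ...
        per_expert = [
            state_dict.pop(f"{prefix}.{e}.{proj}.weight")
            for e in range(num_experts)
        ]
        # ... and stack into (num_experts, out_features, in_features).
        merged[f"{prefix}.{proj}.weight"] = torch.stack(per_expert, dim=0)
    return merged
```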
Thanks @shawntan. When I tried to get it working, memory usage roughly doubled compared to the standard implementation. Let me know when you find a fix for it.
Try out this merge script and let me know how it works for you: https://github.com/shawntan/scattermoe/commit/0526612bd53f3fb4dc3418eaa2876c89efe5e4e2
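If it helps, intended usage would presumably look something like the sketch below. This assumes the example's modeling_mixtral.py exposes a MixtralForCausalLM subclass whose overridden _load_pretrained_model performs the expert merge while loading; the import path and checkpoint name are illustrative.

```python
import torch
# Hypothetical import path; point this at the example's modeling_mixtral.py.
from modeling_mixtral import MixtralForCausalLM

# The overridden _load_pretrained_model (see above) is expected to merge
# the per-expert weights into the ScatterMoE layout during loading.
model = MixtralForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",  # illustrative checkpoint name
    torch_dtype=torch.bfloat16,
)
```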
Hi, I was reading the paper and it looks nice. Do you have any examples of using this with Mixtral? Perhaps you can share some of the benchmarking code from the paper?