I'm looking at the engine and would like to run inference with the Mixtral model using expert parallelism. DeepSpeed itself seems to have some support, but I saw a post saying I should use MII if I want to run inference with Mixtral.
Does anyone know?
If not, does anyone know of an alternative system?