I'm looking at the engine and would like to run inference with the Mixtral model using expert parallelism. DeepSpeed itself seems to have some support, but I saw a post saying I should use MII if I want to run inference with Mixtral.
Does anyone know?
If not, does anyone know of an alternative system?