Add DynamicMoE support for Mixtral
Open kwisniewski98 opened 5 days ago
@libinta this commit is required for the next OH release
Please consider not using this API with w1 and w3 combined. In the future, the dynamic MoE may also support training, and combining the weights would change the weight ordering when saving checkpoints.
The dynamic MoE also supports passing w1, w2 and w3 separately; it is not necessary to align with vLLM Mixtral.
final_hidden_states = torch.ops.hpu.mixture_of_experts(
    hidden_states=hidden_states,
    expert_routing_table=selected_experts,
    router_weights=routing_weights,
    w1=w1_list,
    w2=w3_list,
    w3=w2_list,
    permuted_weights=True,
    activation=act_fn,
    experts_min=0,
    experts_max=self.num_experts - 1,
)
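For illustration only, a minimal sketch of how the separate per-expert weight lists could be built and passed to the op. It assumes each expert exposes w1/w2/w3 Linear layers as in the Hugging Face Mixtral block; the helper name dynamic_moe_forward and its parameters are hypothetical, not the exact code in this PR, and running it requires an HPU-enabled PyTorch build.

import torch

def dynamic_moe_forward(hidden_states, routing_weights, selected_experts,
                        experts, num_experts, act_fn):
    # Keep one weight tensor per expert; leaving w1 and w3 separate avoids
    # changing the checkpoint weight ordering if training support is added later.
    # (Assumed expert layout: HF Mixtral, where w1=gate_proj, w2=down_proj, w3=up_proj.)
    w1_list = [e.w1.weight for e in experts]
    w2_list = [e.w2.weight for e in experts]
    w3_list = [e.w3.weight for e in experts]

    return torch.ops.hpu.mixture_of_experts(
        hidden_states=hidden_states,
        expert_routing_table=selected_experts,
        router_weights=routing_weights,
        w1=w1_list,
        w2=w3_list,  # argument mapping kept as in the snippet above
        w3=w2_list,
        permuted_weights=True,
        activation=act_fn,
        experts_min=0,
        experts_max=num_experts - 1,
    )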
What's the difference between this PR and #1518?
This is a duplicate, sorry, my mistake.
Done, I've used the op that you suggested.
LGTM, thanks