laekov / fastmoe

A fast MoE impl for PyTorch
https://fastmoe.ai
Apache License 2.0

Inconsistent evaluation result when cloning expert parameters from the original FFN #179

Closed Heihaierr closed 9 months ago

Heihaierr commented 10 months ago

Hi. I’m trying to continue training my model with FMoETransformerMLP. Intuitively, if I clone the parameters from the original FFN layer into the FMoETransformerMLP experts, the evaluation result should be the same. But I notice an inconsistent result even with a single expert.

For a MoE FFN expert, the parameters `experts.0.htoh4.weight`, `experts.0.htoh4.bias`, `experts.0.h4toh.weight`, and `experts.0.h4toh.bias` are copied, and since `_Expert` sets the shape to a `[1, d_model, d_hidden]` tensor, I use `v.clone().unsqueeze(0)` to expand my original FFN parameters, as sketched below.
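For reference, a minimal sketch of the copy-and-check I am running. The `ffn` module and its layer indices are placeholders for my actual FFN, `top_k=1` and the GELU activation are assumptions made for the sketch, and `experts[0]` follows the checkpoint keys above:

```python
import torch
from torch import nn
from fmoe import FMoETransformerMLP

d_model, d_hidden = 512, 2048

# Stand-in for the original dense FFN (my real layer has the same structure).
ffn = nn.Sequential(
    nn.Linear(d_model, d_hidden),
    nn.GELU(),
    nn.Linear(d_hidden, d_model),
).cuda().eval()

# Single expert; with top_k=1 the softmaxed gate weight is exactly 1.0, so the
# MoE output should reduce to the plain expert output.
moe = FMoETransformerMLP(
    num_expert=1,
    d_model=d_model,
    d_hidden=d_hidden,
    activation=nn.GELU(),
    top_k=1,
).cuda().eval()

# experts[0] matches the checkpoint keys above; each expert parameter carries
# a leading num_expert dimension, hence unsqueeze(0) on every dense tensor.
with torch.no_grad():
    moe.experts[0].htoh4.weight.copy_(ffn[0].weight.unsqueeze(0))
    moe.experts[0].htoh4.bias.copy_(ffn[0].bias.unsqueeze(0))
    moe.experts[0].h4toh.weight.copy_(ffn[2].weight.unsqueeze(0))
    moe.experts[0].h4toh.bias.copy_(ffn[2].bias.unsqueeze(0))

# Expectation: identical outputs up to floating-point noise.
x = torch.randn(4, 16, d_model, device="cuda")
print(torch.allclose(ffn(x), moe(x), atol=1e-5))
```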

Is there anything I have missed?

Heihaierr commented 9 months ago

There was a bug when copying the parameters. I've closed the issue.