Hi. I’m trying to continue training my model with `FMoETransformerMLP`. Intuitively, if I clone the parameters from the original FFN layer into the `FMoETransformerMLP` experts, the evaluation result should be the same. But I notice inconsistent results, even with a single expert.
For a `moe_ff` expert, the parameters `experts.0.htoh4.weight`, `experts.0.htoh4.bias`, `experts.0.h4toh.weight`, and `experts.0.h4toh.bias` are copied. Since `_Expert` sets the shape to a `[1, d_model, d_hidden]` tensor, I use `v.clone().unsqueeze(0)` to expand my original FFN parameters.
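For reference, here is a minimal sketch of the copy I am doing. The sizes, the dense `ffn` stand-in, and the `top_k=1` argument are placeholders/assumptions on my side, not taken from my actual training code; I am also assuming `FMoELinear` stores its weight in the same `[out_features, in_features]` orientation as `nn.Linear`, just with a leading expert dimension.

```python
import torch
from fmoe.transformer import FMoETransformerMLP

d_model, d_hidden = 768, 3072  # placeholder sizes, not from my real model

# Stand-in for the original dense FFN (Linear -> GELU -> Linear).
ffn = torch.nn.Sequential(
    torch.nn.Linear(d_model, d_hidden),  # htoh4
    torch.nn.GELU(),
    torch.nn.Linear(d_hidden, d_model),  # h4toh
)

# top_k=1 so the single expert is selected exactly once per token.
moe = FMoETransformerMLP(num_expert=1, d_model=d_model, d_hidden=d_hidden,
                         top_k=1)

# Copy the dense weights into expert 0, adding the leading expert
# dimension with unsqueeze(0). Assumption: the weight orientation matches
# nn.Linear; load_state_dict will raise if the shapes disagree.
sd = moe.state_dict()
sd["experts.0.htoh4.weight"] = ffn[0].weight.clone().unsqueeze(0)
sd["experts.0.htoh4.bias"] = ffn[0].bias.clone().unsqueeze(0)
sd["experts.0.h4toh.weight"] = ffn[2].weight.clone().unsqueeze(0)
sd["experts.0.h4toh.bias"] = ffn[2].bias.clone().unsqueeze(0)
moe.load_state_dict(sd)

# With one expert every token is routed to expert 0, so I expected the
# two modules to agree, but they do not for me.
ffn.eval()
moe.eval()
x = torch.randn(8, d_model)  # a batch of token embeddings
print(torch.allclose(ffn(x), moe(x), atol=1e-6))
```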
Was there anything I have missed?