Open toyaji opened 2 years ago
Your pre-trained ckpt and the model's params have different parameter keys for some layers.
Pretrained state-dict has the following keys:
ga.gamma ga.conv.weight ga.conv.bias da.gamma
But the current model has different names for those layers:
csa.gamma csa.conv.weight csa.conv.bias la.gamma
So the `load_state_dict` call in the model class does not load them properly; it just ignores those layers.
Is it OK for the reproducibility of your model?
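For what it's worth, a key-remapping workaround seems possible if the renamed modules are otherwise identical. The sketch below is a hypothetical minimal reproduction (the module shapes are invented, not taken from the actual repo): it builds a checkpoint with the old `ga`/`da` prefixes, rewrites the keys to the new `csa`/`la` names, and then loads with `strict=True` so nothing is silently dropped.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the attention module; real shapes may differ.
class Attn(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))
        self.conv = nn.Conv2d(4, 4, 1)

class NewModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.csa = Attn()                             # was `ga` in the ckpt
        self.la = nn.Module()
        self.la.gamma = nn.Parameter(torch.zeros(1))  # was `da.gamma`

# Simulate the pretrained checkpoint, which uses the old key names.
old = nn.Module()
old.ga = Attn()
old.da = nn.Module()
old.da.gamma = nn.Parameter(torch.ones(1))
ckpt = old.state_dict()  # keys: ga.gamma, ga.conv.weight, ga.conv.bias, da.gamma

# Rewrite the old prefixes to the new module names before loading.
rename = {"ga.": "csa.", "da.": "la."}
remapped = {}
for k, v in ckpt.items():
    for old_p, new_p in rename.items():
        if k.startswith(old_p):
            k = new_p + k[len(old_p):]
            break
    remapped[k] = v

model = NewModel()
# strict=True raises on any mismatch, so a silent partial load cannot happen.
result = model.load_state_dict(remapped, strict=True)
print(result.missing_keys, result.unexpected_keys)  # both empty lists
```

With `strict=True` (the default), mismatched keys raise a `RuntimeError` instead of being ignored, which makes this kind of naming drift visible immediately.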