Open toyaji opened 2 years ago
Your pre-trained ckpt and the model's params have different parameter keys for some layers.
Pretrained state-dict has the following keys:
ga.gamma ga.conv.weight ga.conv.bias da.gamma
But the current model has different names for those layers:
csa.gamma csa.conv.weight csa.conv.bias la.gamma
So the `load_state_dict` call in the model class does not load them properly; it just ignores those layers.
Is it OK for the reproducibility of your model?
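For what it's worth, a key-remapping workaround seems possible if the renamed modules are otherwise identical. The sketch below is a hypothetical minimal reproduction (the module shapes are invented, not taken from the actual repo): it builds a checkpoint with the old `ga`/`da` prefixes, rewrites the keys to the new `csa`/`la` names, and then loads with `strict=True` so nothing is silently dropped.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the attention module; real shapes may differ.
class Attn(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))
        self.conv = nn.Conv2d(4, 4, 1)

class NewModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.csa = Attn()                             # was `ga` in the ckpt
        self.la = nn.Module()
        self.la.gamma = nn.Parameter(torch.zeros(1))  # was `da.gamma`

# Simulate the pretrained checkpoint, which uses the old key names.
old = nn.Module()
old.ga = Attn()
old.da = nn.Module()
old.da.gamma = nn.Parameter(torch.ones(1))
ckpt = old.state_dict()  # keys: ga.gamma, ga.conv.weight, ga.conv.bias, da.gamma

# Rewrite the old prefixes to the new module names before loading.
rename = {"ga.": "csa.", "da.": "la."}
remapped = {}
for k, v in ckpt.items():
    for old_p, new_p in rename.items():
        if k.startswith(old_p):
            k = new_p + k[len(old_p):]
            break
    remapped[k] = v

model = NewModel()
# strict=True raises on any mismatch, so a silent partial load cannot happen.
result = model.load_state_dict(remapped, strict=True)
print(result.missing_keys, result.unexpected_keys)  # both empty lists
```

With `strict=True` (the default), mismatched keys raise a `RuntimeError` instead of being ignored, which makes this kind of naming drift visible immediately.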