Open syx11237744 opened 1 month ago
Hey! Sorry, we might need a bit more context, e.g. which checkpoint are you converting?
The conversion script seems to say that the model you are trying to convert does not have that layer. I'm also not sure `transformers` is involved, as I can't see it in the traceback.
I faced this problem recently. It is caused by the latest Megatron renaming `layernorm` to `norm` in the checkpoint keys. After I fixed that, the code ran successfully:
```python
if op_name.endswith("norm"):
    ln_name = "attention.ln" if op_name.startswith("input") else "ln"
    output_state_dict[layer_name + "." + ln_name + "." + weight_or_bias] = val
```

```python
output_state_dict["bert.encoder.ln.weight"] = transformer["final_norm.weight"]
output_state_dict["bert.encoder.ln.bias"] = transformer["final_norm.bias"]

output_state_dict["cls.predictions.transform.LayerNorm.weight"] = lm_head["norm.weight"]
output_state_dict["cls.predictions.transform.LayerNorm.bias"] = lm_head["norm.bias"]
```
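An alternative to patching the script is to normalize the checkpoint keys first. Below is a minimal sketch of a hypothetical helper (`remap_norm_keys` is not part of `transformers` or Megatron) that maps the newer `norm` key components back to the older `layernorm` spelling, so an unmodified conversion script can find them:

```python
def remap_norm_keys(state_dict):
    """Rename key components like 'input_norm' / 'final_norm' to
    'input_layernorm' / 'final_layernorm', leaving keys that already
    use the old 'layernorm' spelling untouched."""
    remapped = {}
    for key, value in state_dict.items():
        parts = []
        for part in key.split("."):
            # Only rewrite components ending in "norm" but not "layernorm".
            if part.endswith("norm") and not part.endswith("layernorm"):
                part = part[: -len("norm")] + "layernorm"
            parts.append(part)
        remapped[".".join(parts)] = value
    return remapped
```

Running this over the transformer and lm-head state dicts before the conversion loop would let the same script handle both naming schemes.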
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
- `transformers` version: 4.40.0
- Python version: 3.10
Who can help?
@ArthurZucker @Narsil @SunMarc
Information

Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
If I use another version of Megatron, I encounter another problem:
Expected behavior
Is this due to the version of Megatron? Which version of Megatron should I use for training so that this conversion script works?