huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

convert_megatron_gpt2_checkpoint.py #32486

Open syx11237744 opened 1 month ago

syx11237744 commented 1 month ago

System Info

transformers 4.40.0, Python 3.10

Who can help?

@ArthurZucker @Narsil @SunMarc

Reproduction

When I use convert_megatron_gpt2_checkpoint.py, I get:
Traceback (most recent call last):
  File "/home/code/max_cp/convert_m_g.py", line 356, in <module>
    main()
  File "/home/code/max_cp/convert_m_g.py", line 316, in main
    output_state_dict = convert_megatron_checkpoint(args, input_state_dict, config)
  File "/home/code/max_cp/convert_m_g.py", line 215, in convert_megatron_checkpoint
    out_name = megatron_to_transformers[op_name]
KeyError: 'input_norm'

If I use another version of Megatron, I encounter a different error:

Traceback (most recent call last):
  File "/home/sunyuanxu/code/max_cp/convert_m_g.py", line 356, in <module>
    main()
  File "/home/sunyuanxu/code/max_cp/convert_m_g.py", line 316, in main
    output_state_dict = convert_megatron_checkpoint(args, input_state_dict, config)
  File "/home/sunyuanxu/code/max_cp/convert_m_g.py", line 215, in convert_megatron_checkpoint
    out_name = megatron_to_transformers[op_name]
KeyError: 'self_attention.layernorm_qkv'

Expected behavior

Is this caused by the Megatron version? Which version of Megatron should I use for training so that this script works?
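
For reference, a minimal sketch for checking which naming scheme a checkpoint actually uses (this assumes the usual Megatron-LM layout, where the weights live under model -> language_model -> encoder, or transformer in older releases; the path is only an example):

import torch

# Example path; point this at your own Megatron checkpoint file.
checkpoint_path = "release/mp_rank_00/model_optim_rng.pt"
state_dict = torch.load(checkpoint_path, map_location="cpu")

lm = state_dict["model"]["language_model"]
encoder = lm["transformer"] if "transformer" in lm else lm["encoder"]

# Older Megatron uses "input_layernorm"; newer versions use
# "input_norm" or fused names like "self_attention.layernorm_qkv".
for key in sorted(encoder.keys())[:20]:
    print(key)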

ArthurZucker commented 1 month ago

Hey! Sorry, we might need a bit more context, e.g. which checkpoint are you converting? The conversion script seems to be saying that the model you are trying to convert does not have that layer. I am also not sure transformers is involved, as I can't see it in the traceback.

XIANYUNYEHE-DEL commented 1 month ago

I faced this problem recently. It is caused by the latest Megatron renaming "layernorm" to "norm" in its parameter names. After patching the conversion script as below, it ran successfully:

# Match both the old "layernorm" and the new "norm" suffix.
if op_name.endswith("norm"):
    ln_name = "attention.ln" if op_name.startswith("input") else "ln"
    output_state_dict[layer_name + "." + ln_name + "." + weight_or_bias] = val

# "final_layernorm" is now "final_norm".
output_state_dict["bert.encoder.ln.weight"] = transformer["final_norm.weight"]
output_state_dict["bert.encoder.ln.bias"] = transformer["final_norm.bias"]

# The LM head layernorm is now just "norm".
output_state_dict["cls.predictions.transform.LayerNorm.weight"] = lm_head["norm.weight"]
output_state_dict["cls.predictions.transform.LayerNorm.bias"] = lm_head["norm.bias"]
github-actions[bot] commented 2 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.