huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error converting from PyTorch to HuggingFace - Mistral / Mixtral #30641

Closed efenocchi closed 4 months ago

efenocchi commented 6 months ago

System Info

A100 40GB, 32GB RAM

Who can help?

@ArthurZucker, @younesbelkada

Information

Tasks

Reproduction

Run the `src/transformers/models/mistral/convert_mistral_weights_to_hf.py` script.
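
For reference, a minimal sketch of how the conversion script is typically invoked, driven from Python via `subprocess`. The `--input_dir`, `--model_size`, and `--output_dir` flag names are my assumptions about the converter's CLI and the paths are placeholders; check the script's `--help` output for the exact arguments.

```python
# Minimal sketch, not a verified invocation: the flag names below are
# assumptions about the converter's CLI and the paths are placeholders.
import subprocess

subprocess.run(
    [
        "python",
        "src/transformers/models/mistral/convert_mistral_weights_to_hf.py",
        "--input_dir", "/path/to/mistral_weights",  # dir with the original checkpoint + tokenizer
        "--model_size", "7B",                       # assumed to map into NUM_SHARDS
        "--output_dir", "/path/to/hf_model",        # where the HF-format model is written
    ],
    check=True,
)
```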

Expected behavior

Hi @ArthurZucker, @younesbelkada,

I fine-tuned Mistral with Torchtune and was trying to convert the weights from PyTorch to the HF format, but I hit an error and noticed a couple of other things in the script that could cause further errors (file `src/transformers/models/mistral/convert_mistral_weights_to_hf.py`).

    [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
    [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)

NUM_SHARDS defaults to 1 for the 7B model, and that is the only entry defined (`NUM_SHARDS = {"7B": 1}`). I think it would be correct to default to loading two checkpoints, which the user can change to 3 when using the instruct version.
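
For context, a rough sketch (under my assumptions, not the script's actual code) of what `NUM_SHARDS` controls: the original-release checkpoints are split into `consolidated.XX.pth` files, and the converter reads one state dict per shard before remapping the keys.

```python
# Illustrative only: shows what a shard count of 1 vs. 2 would mean for
# loading. The consolidated.XX.pth naming follows the original Mistral
# release convention; the per-tensor merge logic is intentionally omitted.
from pathlib import Path
import torch

NUM_SHARDS = {"7B": 1}  # the single entry the issue refers to

def load_shards(input_dir: str, model_size: str = "7B"):
    """Load one state dict per shard; merging them is a separate step."""
    shards = []
    for i in range(NUM_SHARDS[model_size]):
        shard_path = Path(input_dir) / f"consolidated.{i:02d}.pth"
        shards.append(torch.load(shard_path, map_location="cpu"))
    return shards
```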

I think the same issues are present in the mixtral folder.

I'm sorry, but without adequate GPUs available I struggle to debug and fix these problems myself, and working through a notebook is quite frustrating.

Thank you for your time and efforts.

Emanuele

ArthurZucker commented 6 months ago

There is probably a misunderstanding: post_attention_layernorm is the name used in the HF format: https://github.com/NTHU-ML-2023-team19/transformers/blob/1b61f800fc0e56ffee8428dd65e451279526b534/src/transformers/models/mistral/modeling_mistral.py#L683
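
To make the naming point concrete, here is the rough correspondence I believe is being referred to; the original-format key names on the left are assumptions based on the reference Mistral/Llama checkpoints, not copied from the conversion script.

```python
# Assumed mapping between original-checkpoint norm keys and the HF
# modeling_mistral names; "{i}" stands for the layer index.
ORIGINAL_TO_HF_NORM_KEYS = {
    "layers.{i}.attention_norm.weight": "model.layers.{i}.input_layernorm.weight",
    "layers.{i}.ffn_norm.weight": "model.layers.{i}.post_attention_layernorm.weight",
}
```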

ArthurZucker commented 6 months ago

Can you share a traceback of your issue? I don't see any problem here, and the script has been around for quite a while. It was not designed for post-Torchtune checkpoints, though.
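
Since the script expects the original-release layout rather than a Torchtune checkpoint, one quick diagnostic is to dump the fine-tuned checkpoint's state-dict keys and compare them with what the converter assumes. A minimal sketch; the file name is a placeholder.

```python
# Diagnostic sketch: list the keys (and shapes) in a fine-tuned
# checkpoint to see how they differ from the layout the converter
# expects. "finetuned_checkpoint.pt" is a placeholder path.
import torch

state_dict = torch.load("finetuned_checkpoint.pt", map_location="cpu")
for key, value in sorted(state_dict.items()):
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(key, shape)
```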

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.