huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error converting from PyTorch to HuggingFace - Mistral / Mixtral #30641

Closed efenocchi closed 4 months ago

efenocchi commented 6 months ago

System Info

A100 40GB, 32GB RAM

Who can help?

@ArthurZucker, @younesbelkada

Information

Tasks

Reproduction

Run the `src/transformers/models/mistral/convert_mistral_weights_to_hf.py` script.
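
For reference, a minimal sketch of how the conversion script is typically invoked, driven from Python via `subprocess`. The `--input_dir`, `--model_size`, and `--output_dir` flag names are my assumptions about the converter's CLI and the paths are placeholders; check the script's `--help` output for the exact arguments.

```python
# Minimal sketch, not a verified invocation: the flag names below are
# assumptions about the converter's CLI and the paths are placeholders.
import subprocess

subprocess.run(
    [
        "python",
        "src/transformers/models/mistral/convert_mistral_weights_to_hf.py",
        "--input_dir", "/path/to/mistral_weights",  # dir with the original checkpoint + tokenizer
        "--model_size", "7B",                       # assumed to map into NUM_SHARDS
        "--output_dir", "/path/to/hf_model",        # where the HF-format model is written
    ],
    check=True,
)
```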

Expected behavior

Hi @ArthurZucker, @younesbelkada,

I fine-tuned Mistral with Torchtune and was trying to convert the weights from PyTorch to the HF format, but I hit an error and noticed a couple of other things in the script that could cause further errors (file `src/transformers/models/mistral/convert_mistral_weights_to_hf.py`).

    [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
    [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)

NUM_SHARDS defaults to 1 for the 7B model, and that is the only entry defined (`NUM_SHARDS = {"7B": 1}`). I think it would be correct to default to loading two checkpoints, which the user can change to 3 when using the instruct version.
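
For context, a rough sketch (under my assumptions, not the script's actual code) of what `NUM_SHARDS` controls: the original-release checkpoints are split into `consolidated.XX.pth` files, and the converter reads one state dict per shard before remapping the keys.

```python
# Illustrative only: shows what a shard count of 1 vs. 2 would mean for
# loading. The consolidated.XX.pth naming follows the original Mistral
# release convention; the per-tensor merge logic is intentionally omitted.
from pathlib import Path
import torch

NUM_SHARDS = {"7B": 1}  # the single entry the issue refers to

def load_shards(input_dir: str, model_size: str = "7B"):
    """Load one state dict per shard; merging them is a separate step."""
    shards = []
    for i in range(NUM_SHARDS[model_size]):
        shard_path = Path(input_dir) / f"consolidated.{i:02d}.pth"
        shards.append(torch.load(shard_path, map_location="cpu"))
    return shards
```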

I think the same issues are present in the mixtral folder.

I'm sorry, but without adequate GPUs available I struggle to debug and fix these problems myself, and working through a notebook is quite frustrating.

Thank you for your time and efforts.

Emanuele

ArthurZucker commented 6 months ago

There is probably a misunderstanding: post_attention_layernorm is the name used in the HF format: https://github.com/NTHU-ML-2023-team19/transformers/blob/1b61f800fc0e56ffee8428dd65e451279526b534/src/transformers/models/mistral/modeling_mistral.py#L683
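
To make the naming point concrete, here is the rough correspondence I believe is being referred to; the original-format key names on the left are assumptions based on the reference Mistral/Llama checkpoints, not copied from the conversion script.

```python
# Assumed mapping between original-checkpoint norm keys and the HF
# modeling_mistral names; "{i}" stands for the layer index.
ORIGINAL_TO_HF_NORM_KEYS = {
    "layers.{i}.attention_norm.weight": "model.layers.{i}.input_layernorm.weight",
    "layers.{i}.ffn_norm.weight": "model.layers.{i}.post_attention_layernorm.weight",
}
```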

ArthurZucker commented 6 months ago

Can you share a traceback of your issue? I don't see any problem here, and the script has been around for quite a while. It was not designed for post-Torchtune checkpoints, though.
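
Since the script expects the original-release layout rather than a Torchtune checkpoint, one quick diagnostic is to dump the fine-tuned checkpoint's state-dict keys and compare them with what the converter assumes. A minimal sketch; the file name is a placeholder.

```python
# Diagnostic sketch: list the keys (and shapes) in a fine-tuned
# checkpoint to see how they differ from the layout the converter
# expects. "finetuned_checkpoint.pt" is a placeholder path.
import torch

state_dict = torch.load("finetuned_checkpoint.pt", map_location="cpu")
for key, value in sorted(state_dict.items()):
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(key, shape)
```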

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.