Closed: efenocchi closed this 4 months ago
There is probably a misunderstanding: post_attention_layernorm is the name used in the HF format: https://github.com/NTHU-ML-2023-team19/transformers/blob/1b61f800fc0e56ffee8428dd65e451279526b534/src/transformers/models/mistral/modeling_mistral.py#L683
Can you share a traceback of your issue? I don't see any problem here, and the script has been around for quite a while. It was not designed for post-Torchtune conversion, though.
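For reference, here is a minimal way to confirm the HF-side naming (just a sketch; the tiny config values are arbitrary and only there to avoid downloading real weights):

```python
from transformers import MistralConfig, MistralForCausalLM

# Tiny, arbitrary config so instantiation is cheap and no weights are needed.
config = MistralConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
)
model = MistralForCausalLM(config)

# In the HF format, each decoder layer exposes `post_attention_layernorm`,
# matching the modeling_mistral.py linked above.
print(model.model.layers[0].post_attention_layernorm)
```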
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
A100 40GB GPU, 32GB RAM
Who can help?
@ArthurZucker, @younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Run the transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py file.
Expected behavior
Hi @ArthurZucker, @younesbelkada,
I fine-tuned Mistral with Torchtune and tried to convert the weights from the PyTorch format to HF, but I got an error and noticed a couple of other things that could cause further errors (file: src/transformers/models/mistral/convert_mistral_weights_to_hf.py).
The first problem is on line 137: the key post_attention_norm.weight doesn't exist in the fine-tuned model_*.pt file (nor in the original one). I think it should be changed to post_attention_layernorm.weight.
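As a workaround, the keys can be inspected and remapped before running the script. A rough sketch (the checkpoint filename and the exact key names here are my assumptions; adjust them to the actual Torchtune output):

```python
import torch

# Assumption: one of the Torchtune output shards; adjust the path as needed.
state_dict = torch.load("model_0.pt", map_location="cpu")

# List the norm-related keys actually present, to see which naming is used.
for key in sorted(k for k in state_dict if "norm" in k):
    print(key)

# If the conversion script expects the other name, remap before saving:
remapped = {
    key.replace("post_attention_layernorm", "post_attention_norm"): tensor
    for key, tensor in state_dict.items()
}
torch.save(remapped, "model_0_remapped.pt")
```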
I also saw that in the configuration_mistral.py file two old repositories are suggested which are no longer available (here instead of here).
NUM_SHARDS defaults to 1 for the 7B model, which is the only entry available (NUM_SHARDS = {"7B": 1}). I think it would be correct to load two checkpoints by default and let the user change this to three when using the instruct version, as sketched below. I think the same errors are present in the mixtral folder.
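Something along these lines is what I have in mind (only a sketch; the parameter name and the per-shard file naming are my assumptions, not what the script actually uses):

```python
NUM_SHARDS = {"7B": 1}

def shard_paths(input_dir, model_size="7B", num_shards=None):
    """Build the list of checkpoint shard files to load.

    Falls back to the NUM_SHARDS table, but lets the caller override it
    (e.g. 2 by default, 3 for the instruct version).
    """
    n = num_shards if num_shards is not None else NUM_SHARDS[model_size]
    # Assumed naming scheme; adjust to what the fine-tuning run produced
    # (e.g. model_0.pt, model_1.pt, ...).
    return [f"{input_dir}/model_{i}.pt" for i in range(n)]

print(shard_paths("/path/to/checkpoints", num_shards=2))
```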
I'm sorry, but without adequate GPUs available I struggle to debug and fix these problems myself, and working through a notebook is quite frustrating.
Thank you for your time and efforts.
Emanuele