huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

convert_llama_weights_to_hf.py - problem converting llama 3.2 1B #33760

Closed · raulod closed this issue 1 week ago

raulod commented 1 month ago

System Info

latest transformers, Python 3.11.5

Who can help?

No response


Reproduction

We are using convert_llama_weights_to_hf.py to convert the original Llama-3.2-1B to HF, but the resulting files are not complete. We successfully used the same script with the 3.1 models.

Below is the command line. I passed --model_size 8B, since 1B is not supported by convert_llama_weights_to_hf.py:

python .venv/lib/python3.11/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir Meta-Llama-3.2-1B/ --model_size 8B --output_dir hf --llama_version 3.1

It does convert; however, the resulting HF output appears to be incomplete. When the generated HF model is further converted to GGUF, the resulting model fails to load with the error "missing merges.txt".
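For reference, the GGUF step is roughly the following (a sketch assuming llama.cpp's convert_hf_to_gguf.py; the output file name is just an example):

python llama.cpp/convert_hf_to_gguf.py hf --outfile llama-3.2-1b.gguf --outtype f16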

We then downloaded the same model from the Hugging Face website and noticed that it has the same safetensors file size but different JSON files. For example, below are the HF files produced by convert_llama_weights_to_hf.py:

-rw-r--r--  1 root root        689 Sep 27 15:21 config.json
-rw-r--r--  1 root root        121 Sep 27 15:21 generation_config.json
-rw-r--r--  1 root root 2471645608 Sep 27 15:21 model.safetensors
-rw-r--r--  1 root root         73 Sep 27 15:21 special_tokens_map.json
-rw-r--r--  1 root root      50526 Sep 27 15:21 tokenizer_config.json
-rw-r--r--  1 root root   17208712 Sep 27 15:21 tokenizer.json

Below are the files downloaded from the Hugging Face website:

-rw-r--r--  1 root root        843 Sep 27 14:25 config.json
-rw-r--r--  1 root root        186 Sep 27 14:25 generation_config.json
drwxr-xr-x  9 root root        174 Sep 27 14:59 .git
-rw-r--r--  1 root root       1519 Sep 27 14:25 .gitattributes
-rw-r--r--  1 root root       7712 Sep 27 14:25 LICENSE.txt
-rw-r--r--  1 root root 2471645608 Sep 27 14:59 model.safetensors
drwxr-xr-x  2 root root         75 Sep 27 14:59 original
-rw-r--r--  1 root root      35371 Sep 27 14:25 README.md
-rw-r--r--  1 root root        301 Sep 27 14:25 special_tokens_map.json
-rw-r--r--  1 root root      50500 Sep 27 14:25 tokenizer_config.json
-rw-r--r--  1 root root    9085657 Sep 27 14:25 tokenizer.json
-rw-r--r--  1 root root       6021 Sep 27 14:25 USE_POLICY.md
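A quick way to compare the two copies (a minimal sketch; the directory names are just where we placed each one) is to diff the top-level keys of the two config.json files:

```python
import json

def top_level_keys(path):
    with open(path) as f:
        return set(json.load(f))

converted = top_level_keys("hf/config.json")             # produced by the script
downloaded = top_level_keys("Llama-3.2-1B/config.json")  # downloaded from the Hub

print("only in converted: ", sorted(converted - downloaded))
print("only in downloaded:", sorted(downloaded - converted))
```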

Expected behavior

The generated files and the files downloaded from Hugging Face should have been the same.

Mukunda-Gogoi commented 1 month ago

I am facing this problem as well, and I am blocked because of it.

XxSamaxX commented 1 month ago

Hi there,

I've been doing some digging on my own, trying to convert the Vision-Instruct 11B model to later perform quantization to 4 bits. However, I've encountered an issue in the following part of the code:

```python
state_dict = {
    f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][
        f"layers.{layer_i}.attention_norm.weight"
    ].clone(),
    f"model.layers.{layer_i}.post_attention_layernorm.weight": loaded[0][
        f"layers.{layer_i}.ffn_norm.weight"
    ].clone(),
}
```

The problem seems to arise because the structure of the loaded object doesn’t match what the script is expecting. I’ve attached an image to illustrate this mismatch:

[Screenshot from 2024-09-29 20-03-20]
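The same mismatch can be reproduced without the screenshot by listing the checkpoint's key names (a rough sketch; the path is just where I unpacked the original weights, and I only filter for the norm keys that the snippet above indexes):

```python
import torch

# Load the original consolidated checkpoint on CPU and print the norm-related
# keys, so they can be compared against the names the conversion script
# expects (e.g. "layers.{i}.attention_norm.weight").
loaded = torch.load(
    "Llama-3.2-11B-Vision-Instruct/original/consolidated.pth",
    map_location="cpu",
)
for key in sorted(loaded):
    if "norm" in key:
        print(key)
```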

This is beyond my current knowledge, but I believe the solution might involve retrieving the weights from specific key layers, or possibly from all of them? I'm not entirely sure, but I hope this information is helpful to the developers.

I’m eagerly awaiting an updated version of the script that supports LLaMA 3.2. Thanks for all your hard work!

LysandreJik commented 1 month ago

cc @ArthurZucker, who created the script

ArthurZucker commented 1 month ago

Hey! A few things: we used that same script to convert the Llama 3.2 checkpoints that you can see on the Hub. #33778 will fix potential tokenization issues.

GGUF conversion was also skipped, AFAIK (the serialization format changed in transformers 4.45, which bumped tokenizers to 0.20).
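If you want to check which format a given converted tokenizer uses, here is a quick sketch (assuming the merges entries moved from single "a b" strings to ["a", "b"] pairs with tokenizers 0.20; the path is the output dir from the report above):

```python
import json

with open("hf/tokenizer.json") as f:
    tok = json.load(f)

first_merge = tok["model"]["merges"][0]
# Pre-0.20 tokenizers serialize each merge as one string, e.g. "Ġ t";
# 0.20+ serializes it as a two-element list, e.g. ["Ġ", "t"].
print("new pair format" if isinstance(first_merge, list) else "legacy string format")
```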

raulod commented 1 month ago

> Hey! A few things: we used that same script to convert the Llama 3.2 checkpoints that you can see on the Hub. #33778 will fix potential tokenization issues.
>
> GGUF conversion was also skipped, AFAIK (the serialization format changed in transformers 4.45, which bumped tokenizers to 0.20).

How did HF generate the GGUF? It seems to work, but not when converted by the script.

ArthurZucker commented 1 month ago

@pcuenca can confirm but I don't think we did! 🤗

raulod commented 1 month ago

> @pcuenca can confirm but I don't think we did! 🤗

My mistake, apologies for the confusion. What I meant is that a valid GGUF file can be generated from the safetensors downloaded from HF, as I mentioned in the first post, but not from the output of convert_llama_weights_to_hf.py. So how did HF generate the correct files?

pcuenca commented 1 month ago

Hi @raulod! To convert from the original format to GGUF, I'd recommend you follow this process:

The second step is necessary because transformers 4.45.0 introduced a new serialization format for the tokenizer merges, as Arthur mentioned above. Our conversion to the transformers format was performed before 4.45.0 was out, and hence the checkpoints on the Hub still use the traditional serialization format.
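A minimal sketch of such a pipeline (assuming the conversion is run with a pre-4.45 transformers so the tokenizer keeps the traditional merges serialization, that the script version in use supports the 1B size and the 3.2 version flag, and that llama.cpp's convert_hf_to_gguf.py performs the final step):

pip install "transformers<4.45"
python convert_llama_weights_to_hf.py --input_dir Meta-Llama-3.2-1B/ --model_size 1B --output_dir hf --llama_version 3.2
python llama.cpp/convert_hf_to_gguf.py hf --outfile llama-3.2-1b.gguf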

pcuenca commented 1 month ago

@XxSamaxX to convert the Vision models, you need to use this script instead. Those models cannot be converted with the Llama conversion script because they use a different model architecture.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.