I am facing this problem too, and I am blocked because of it.
Hi there,
I've been doing some digging on my own, trying to convert the Vision-Instruct 11B model so that I can later quantize it to 4 bits. However, I've encountered an issue in the following part of the code:
```python
state_dict = {
    f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][
        f"layers.{layer_i}.attention_norm.weight"
    ].clone(),
    f"model.layers.{layer_i}.post_attention_layernorm.weight": loaded[0][
        f"layers.{layer_i}.ffn_norm.weight"
    ].clone(),
}
```
The problem seems to arise because the structure of the loaded object doesn’t match what the script is expecting. I’ve attached an image to illustrate this mismatch:
This is beyond my current knowledge, but I believe the solution might involve retrieving the weights from specific key layers—or possibly from all layers? I'm not entirely sure, but I hope this information is helpful to the developers.
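A minimal sketch for inspecting what is actually inside the loaded shard (assuming the shard is named consolidated.00.pth, as in the original releases) might look like this; it just prints the parameter names so they can be compared with the keys the conversion script looks up, e.g. `layers.0.attention_norm.weight`:

```python
import torch

# Inspection sketch (assumed shard name): load one original shard and print
# the parameter names for the first layer, so they can be compared with the
# keys the conversion script expects, e.g. "layers.0.attention_norm.weight".
loaded = torch.load("consolidated.00.pth", map_location="cpu")
for name, tensor in sorted(loaded.items()):
    if "layers.0." in name:
        print(name, tuple(tensor.shape))
```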
I’m eagerly awaiting an updated version of the script that supports LLaMA 3.2. Thanks for all your hard work!
cc @ArthurZucker, who created the script
Hey! A few things: we used that script to convert the Llama 3.2 checkpoints that you can see on the Hub. #33778 will fix potential tokenization issues.
Conversion to GGUF was also skipped AFAIK (the serialization format changed in 4.45, when tokenizers was bumped to 0.20).
How did HF generate the GGUF? It seems to work, but not when the model is converted by the script.
@pcuenca can confirm but I don't think we did! 🤗
My mistake, apologies for the confusion. What I meant is that a valid GGUF file can be generated from the safetensors downloaded from HF, as I mentioned in the first post, but not when using convert_llama_weights_to_hf.py. So how did HF generate the correct files?
Hi @raulod! To convert from the original format to gguf, I'd recommend you follow this process:
The second step is necessary because transformers 4.45.0 introduced a new serialization format for the tokenizer merges, as Arthur mentioned above. Our conversion to the transformers format was performed before 4.45.0 was out, and hence the checkpoints on the Hub still use the traditional serialization format.
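A minimal sketch of that re-serialization step, assuming transformers >= 4.45.0 is installed and using an illustrative repo id and output folder, is to load the Hub checkpoint and save it again so that tokenizer.json is rewritten with the new merges format; the re-saved folder can then be passed to llama.cpp's convert_hf_to_gguf.py:

```python
# Re-serialization sketch (assumes transformers >= 4.45.0; the repo id and
# output folder are illustrative). Saving again rewrites tokenizer.json with
# the new merges format introduced alongside tokenizers 0.20.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

tokenizer.save_pretrained("Llama-3.2-1B-resaved")
model.save_pretrained("Llama-3.2-1B-resaved")
```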
@XxSamaxX to convert the Vision models, you need to use this script instead. Those models cannot be converted with the Llama conversion script because they use a different model architecture.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Latest transformers, Python 3.11.5
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
We are using convert_llama_weights_to_hf.py to convert the original Llama-3.2-1B to HF, but the resulting files are not complete. We successfully used the same script with the 3.1 models. Below is the command line. I passed the model size as 8B since 1B is not supported by convert_llama_weights_to_hf.py:
python .venv/lib/python3.11/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir Meta-Llama-3.2-1B/ --model_size 8B --output_dir hf --llama_version 3.1
It does convert; however, the resulting HF files appear to be incomplete. When the generated HF checkpoint is further converted to GGUF, the resulting model fails to load with the error "missing merges.txt".
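One way to narrow this down (a hypothetical check, with placeholder folder names) is to compare the tokenizer files produced by the script against the ones downloaded from the Hub, since "missing merges.txt" points at the tokenizer serialization rather than the weights:

```python
import json
import os

# Hypothetical comparison: the folder names below are placeholders for the
# script output and for the checkpoint downloaded from the Hub.
for folder in ("hf", "Llama-3.2-1B-from-hub"):
    print(folder, sorted(os.listdir(folder)))
    with open(os.path.join(folder, "tokenizer.json")) as f:
        merges = json.load(f)["model"]["merges"]
    # Older transformers versions serialize each merge as a single "a b"
    # string; with tokenizers 0.20 (transformers >= 4.45) merges may be
    # written as ["a", "b"] pairs instead.
    print("first merge entry:", merges[0])
```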
We then downloaded the same model from the Hugging Face website and noticed that it has the same safetensors file size but different JSON files. For example, below are the HF files produced by convert_llama_weights_to_hf.py.
Below are the files downloaded from the Hugging Face website.
Expected behavior
The generated files and the files downloaded from Hugging Face should have been the same.