gguf dequantize failed - Githubissues

PenutChen commented 1 month ago

System Info

transformers==4.42.3
torch==2.3.0

Who can help?

No response

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

The example usage from doc:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

Expected behavior

Running the snippet above produces the following error:

Converting and de-quantizing GGUF tensors...:   0%|                         | 0/201 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data2/Penut/LLM-Backend/hello.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3583, in from_pretrained
    state_dict = load_gguf_checkpoint(gguf_path, return_tensors=True)["tensors"]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 146, in load_gguf_checkpoint
    weights = load_dequant_gguf_tensor(shape=shape, ggml_type=tensor.tensor_type, data=tensor.data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 499, in load_dequant_gguf_tensor
    values = dequantize_q6_k(data)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 284, in dequantize_q6_k
    data_f16 = np.frombuffer(data, dtype=np.float16).reshape(num_blocks, block_size // 2)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 26880000 into shape (152,105)

amyeroberts commented 1 month ago

cc @SunMarc

PenutChen commented 1 month ago

~~The correct workaround is to replace num_blocks in this code with -1, but I'm not sure if this is the correct behavior.~~

# transformers/integrations/ggml.py

def dequantize_q6_k(data):
    block_size = GGML_BLOCK_SIZES["Q6_K"]
    num_blocks = len(data) // block_size

    data_f16 = np.frombuffer(data, dtype=np.float16).reshape(-1, block_size // 2)
    data_u8 = np.frombuffer(data, dtype=np.uint8).reshape(-1, block_size)
    data_i8 = np.frombuffer(data, dtype=np.int8).reshape(-1, block_size)

    scales = data_f16[:, -1].reshape(-1, 1).astype(np.float32)
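The numbers in the traceback can be reproduced with plain numpy. The sketch below is an illustration only, and it rests on two assumptions that the thread does not confirm: that a Q6_K block is 210 bytes (128 + 64 + 16 + 2), and that the newer gguf source hands back the tensor bytes as a 2-D uint8 array of shape (32000, 1680) rather than a flat buffer. Under those assumptions, `len(data)` returns the number of rows instead of the number of bytes, which gives the 152 blocks seen in the error, while `-1` lets numpy infer the real block count.

```python
import numpy as np

BLOCK_SIZE = 210  # assumed bytes per Q6_K block: 128 + 64 + 16 + 2

# Assumed layout for illustration: 32000 rows, each row holding
# 2048 weights = 8 blocks * 210 bytes = 1680 bytes.
ROWS, BYTES_PER_ROW = 32000, 1680
data = np.zeros((ROWS, BYTES_PER_ROW), dtype=np.uint8)

# len() of a 2-D array is its first dimension, not its byte count.
num_blocks_from_len = len(data) // BLOCK_SIZE        # 32000 // 210
num_blocks_from_bytes = data.nbytes // BLOCK_SIZE    # total bytes // 210

# Reinterpret the same bytes as float16, as dequantize_q6_k does.
flat_f16 = np.frombuffer(data, dtype=np.float16)

try:
    flat_f16.reshape(num_blocks_from_len, BLOCK_SIZE // 2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With -1, numpy infers the row count from the buffer size.
inferred_shape = flat_f16.reshape(-1, BLOCK_SIZE // 2).shape

print(num_blocks_from_len, num_blocks_from_bytes)
print(error_message)
print(inferred_shape)
```

This only shows why `-1` avoids the shape mismatch. It does not show that the dequantized values are correct, which matches the doubt expressed in the struck-through comment.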

SunMarc commented 1 month ago

Hey @PenutChen, thanks for opening the issue! I tried your snippet on the main branch of transformers and on v4.42.3, and everything looks fine! I suggest you clear your cache and try again. Also, which version of numpy are you using? Maybe this is an issue with the 2.0 version that was released recently.

PenutChen commented 1 month ago

@SunMarc Thanks for the reply! I upgraded the numpy version to 1.26.4, but I still get the same error. After checking all my dependencies, I found that my gguf was installed from the source of the llama.cpp repo. I changed the version to the PyPI one, and it works!

SunMarc commented 1 month ago

Thanks for investigating! Hopefully, with the next release of gguf, we won't have the issue you experienced.

PenutChen commented 1 month ago

The latest release of the gguf package on PyPI is from Dec 13, 2023, but the gguf source in the llama.cpp repo is still updated frequently, and the two are not fully compatible. For anyone experiencing this issue, try the following command:

pip install gguf==0.6.0 "numpy<2.0" --force-reinstall
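Before or after reinstalling, it can help to confirm which versions are actually present in the environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `major` are made up for this example. Note that a gguf build installed from source may also report a version string, so this check alone does not tell a PyPI install from a source install.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major(version_string):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


for name in ("gguf", "numpy", "transformers"):
    print(name, installed_version(name))

numpy_version = installed_version("numpy")
if numpy_version is not None and major(numpy_version) >= 2:
    # The command above pins numpy below 2.0.
    print("numpy >= 2.0 is installed; the suggested pin is numpy<2.0")
```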

PenutChen commented 1 month ago

Hi @SunMarc, just a reminder that gguf-py has been updated to 0.9.1 recently. There might be some issues with this version. If I find anything new, I will reopen this issue.

SunMarc commented 1 month ago

Hi @PenutChen, thanks for the warning! It looks like we do indeed have failing tests on our side. We get the same error you experienced. I will reopen the issue =)

gelbartm commented 1 month ago

Downgrading to gguf==0.6.0 solved it for me. Thanks to @PenutChen for the hint.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

PenutChen commented 1 week ago

Solved by #32298.

huggingface / transformers

gguf dequantize failed #31725
