Closed by 1PercentSync 7 months ago
This usually happens when there's a version mismatch between ExLlama's C++ extension and the version of ExLlama you're actually using.
The latest release version, which you appear to have installed, is 0.0.13.post2, and Qwen support was added after that (I'm assuming the model you're trying to convert is a Qwen model). You'll have to build from source or wait for the 0.0.14 release, which should be soon. To build from source:
pip uninstall exllamav2
pip install .
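Before rebuilding, it may help to confirm which release is actually installed. A minimal sketch using only the standard library, assuming the distribution name matches the package name `exllamav2`:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return "not installed"

# Anything at or below 0.0.13.post2 predates Qwen support and needs a source build.
print(installed_version("exllamav2"))
```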
I encountered an error when trying to build from source, but I forked the project and used a GitHub Actions workflow to build it. Now it's working again. Thank you for your response.
Environment
Issue Description
During the quantization process of a model (https://huggingface.co/CausalLM/7B) using exllamav2, I encountered a TypeError in the make_q_matrix function.
Steps to Reproduce
from transformers import AutoTokenizer, AutoModelForCausalLM

access_token = "tokenxxx"
tokenizer = AutoTokenizer.from_pretrained("CausalLM/7B", token=access_token)
model = AutoModelForCausalLM.from_pretrained("CausalLM/7B", token=access_token)

save_directory = "D:/Github/7B/hfc"
tokenizer.save_pretrained(save_directory)
model.save_pretrained(save_directory)
python convert.py -i D:\Github\7B\hfc -o D:\Github\7B\exl -cf D:\Github\7B\exlo -b 4.0
 -- Quantizing...
 -- Layer: model.layers.0 (Attention)
 -- Linear: model.layers.0.self_attn.q_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.23 bpw
Traceback (most recent call last):
  File "D:\Portable Program Files\Exllama\convert.py", line 253, in <module>
quant(job, save_job, model)
File "D:\Portable Program Files\Exllama\venv\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Portable Program Files\Exllama\conversion\quantize.py", line 329, in quant
quant_attn(job, module, hidden_states, target_states, quantizers, cache, attn_params, strat)
File "D:\Portable Program Files\Exllama\conversion\quantize.py", line 124, in quant_attn
quant_linear(job, module.q_proj, quantizers["q_proj"], strat["q_proj"])
File "D:\Portable Program Files\Exllama\conversion\quantize.py", line 80, in quant_linear
recons_linear.load(recons_dict)
File "D:\Portable Program Files\Exllama\exllamav2\linear.py", line 55, in load
self.q_handle = ext.make_q_matrix(w, self.temp_dq)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Portable Program Files\Exllama\exllamav2\ext.py", line 210, in make_q_matrix
return ext_c.make_q_matrix(w["q_weight"],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: make_q_matrix(): incompatible function arguments. The following argument types are supported:
Invoked with: tensor([[ -317331462, 1634642985, -1583586337, ...]], device='cuda:0', dtype=torch.int32), tensor([1503, 237, 3489, ...], device='cuda:0', dtype=torch.int16), tensor([1117, 4037, 653, ...], device='cuda:0', dtype=torch.int16), [... further int32 and float16 tensor arguments elided, plus several tensor(..., device='meta', size=(1, 1)) placeholders ...], tensor([0., 0., 0., ..., 0., 0., 0.], device='cuda:0', dtype=torch.float16)
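Since the TypeError comes from the Python side calling the compiled extension with arguments an older build doesn't accept, one quick sanity check is to confirm which copy of exllamav2 the interpreter actually resolves. A hedged sketch using only the standard library; run it in the same venv that convert.py uses:

```python
import importlib.util

# find_spec reports where the package would be imported from, without actually
# importing it (which would otherwise load the C++ extension as a side effect).
spec = importlib.util.find_spec("exllamav2")
if spec is None:
    print("exllamav2 is not importable from this environment")
else:
    # spec.origin distinguishes a site-packages install from a local checkout.
    print("exllamav2 resolves to:", spec.origin)
```

If the path points at a stale site-packages install rather than the freshly built source tree, uninstalling first (as described above) avoids the mismatch.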