turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

TypeError: make_q_matrix(): incompatible function arguments when quantizing Cohere Command R v0.1 #405

Closed · welnaseth closed this 2 months ago

welnaseth commented 2 months ago

After running for quite some time, the quantization fails with the error below (and continues to fail when I try to resume the job):

 -- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: /home/seth/Documents/unquantized_models/c4ai-command-r-v01/
 -- Output: /tmp/exllama2_quant/
 -- Using default calibration dataset
 -- Target bits per weight: 2.5 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- Full model will be compiled to: /home/seth/Documents/models/c4ai-command-r-v01-2.5bpw-h6/
 -- Quantizing...
 -- Layer: model.layers.0 (ParallelDecoder)
 -- Sublayer: model.layers.0.self_attn
 -- Linear: model.layers.0.self_attn.q_proj -> 0.1:3b_64g/0.9:2b_64g s4, 2.17 bpw
Traceback (most recent call last):
  File "/home/seth/exllamav2/convert.py", line 265, in <module>
    quant(job, save_job, model)
  File "/home/seth/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/seth/exllamav2/conversion/quantize.py", line 421, in quant
    quant_parallel_decoder(job, module, hidden_states, target_states, quantizers, attn_params, strat_attn, strat_mlp)
  File "/home/seth/exllamav2/conversion/quantize.py", line 212, in quant_parallel_decoder
    quant_attn(job, module.attn, hidden_states, target_states, quantizers, attn_params, strat_attn)
  File "/home/seth/exllamav2/conversion/quantize.py", line 139, in quant_attn
    quant_linear(job, module.q_proj, quantizers["q_proj"], strat["q_proj"])
  File "/home/seth/exllamav2/conversion/quantize.py", line 91, in quant_linear
    recons_linear.load(recons_dict, device_tensors = False)
  File "/home/seth/exllamav2/exllamav2/linear.py", line 109, in load
    self.q_handle = ext.make_q_matrix(w,
  File "/home/seth/exllamav2/exllamav2/ext.py", line 218, in make_q_matrix
    return ext_c.make_q_matrix(w["q_weight"],
TypeError: make_q_matrix(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: torch.Tensor, arg6: torch.Tensor, arg7: torch.Tensor, arg8: torch.Tensor, arg9: torch.Tensor, arg10: torch.Tensor, arg11: torch.Tensor) -> int

Invoked with: tensor([[  628275563,   628275563,   615635230,  ...,  -772086785,
          1999216129,  -772086273],
        [ 1227665977,  -953904565,   966027833,  ...,  -815961228,
         -1326794355,  1599958774],
        [ 1383706006,  1383703926,  -167191670,  ...,  -168400160,
           245227071,  -161043519],
        ...,
        [ -374686997,  -361825574,  1802095277,  ..., -1431844182,
         -1432438086,  -336867986],
        [-1973053062,  1584045498, -1348550934,  ..., -1392858727,
         -1414616134, -1532527223],
        [-1431655827, -1431656087, -1431654938,  ..., -1431658858,
         -1431656002,  1790619478]], device='cuda:0', dtype=torch.int32), tensor([5103, 2225, 7109,  ..., 5805, 2787, 1243], device='cuda:0',
       dtype=torch.int16), tensor([2942, 1037, 1748,  ..., 6279, 4822, 4804], device='cuda:0',
       dtype=torch.int16), tensor([[ 1954845608,  1413826149,  1412842581,  ...,   926311236,
          1163158582,  1413825876],
        [-2052606021,  1701275782,  1413834070,  ...,  1179149124,
          1450538327,  1700157045],
        [ 1953797783,  1163150692,  1413825621,  ...,   909529907,
          1163158598,  1145390420],
        ...,
        [ 1971754632,  1733715557,  1700226422,  ...,  1162167364,
          1163154501,  1144271955],
        [ 1971754360,  1466328677,  1433761381,  ...,  1163220020,
          1431589957,  1145390420],
        [ 2005374889,  2003203958,  1719035240,  ...,  1179018325,
          1179931718,  1448502613]], device='cuda:0', dtype=torch.int32), tensor([0.0010, 0.0006, 0.0006, 0.0004, 0.0005, 0.0003, 0.0005, 0.0002, 0.0002,
        0.0003, 0.0002, 0.0002, 0.0002, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004,
        0.0006, 0.0004, 0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004, 0.0005,
        0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004,
        0.0005, 0.0004, 0.0004, 0.0005, 0.0006, 0.0004, 0.0005, 0.0004, 0.0005,
        0.0003, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004,
        0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004,
        0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004,
        0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005,
        0.0003, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005,
        0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004,
        0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0011, 0.0004, 0.0004,
        0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004,
        0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004,
        0.0004, 0.0004], device='cuda:0', dtype=torch.float16), tensor([  3,   0,   3,   6,   3,  12,   3,  18,   3,  24,   3,  30,   3,  36,
          3,  42,   3,  48,   3,  54,   3,  60,   3,  66,   3,  72,   2,  78,
          2,  82,   2,  86,   2,  90,   2,  94,   2,  98,   2, 102,   2, 106,
          2, 110,   2, 114,   2, 118,   2, 122,   2, 126,   2, 130,   2, 134,
          2, 138,   2, 142,   2, 146,   2, 150,   2, 154,   2, 158,   2, 162,
          2, 166,   2, 170,   2, 174,   2, 178,   2, 182,   2, 186,   2, 190,
          2, 194,   2, 198,   2, 202,   2, 206,   2, 210,   2, 214,   2, 218,
          2, 222,   2, 226,   2, 230,   2, 234,   2, 238,   2, 242,   2, 246,
          2, 250,   2, 254,   2, 258,   2, 262,   2, 266,   2, 270,   2, 274,
          2, 278,   2, 282,   2, 286,   2, 290,   2, 294,   2, 298,   2, 302,
          2, 306,   2, 310,   2, 314,   2, 318,   2, 322,   2, 326,   2, 330,
          2, 334,   2, 338,   2, 342,   2, 346,   2, 350,   2, 354,   2, 358,
          2, 362,   2, 366,   2, 370,   2, 374,   2, 378,   2, 382,   2, 386,
          2, 390,   2, 394,   2, 398,   2, 402,   2, 406,   2, 410,   2, 414,
          2, 418,   2, 422,   2, 426,   2, 430,   2, 434,   2, 438,   2, 442,
          2, 446,   2, 450,   2, 454,   2, 458,   2, 462,   2, 466,   2, 470,
          2, 474,   2, 478,   2, 482,   2, 486,   2, 490,   2, 494,   2, 498,
          2, 502,   2, 506,   2, 510,   2, 514,   2, 518,   2, 522,   2, 526,
          2, 530,   2, 534], device='cuda:0', dtype=torch.int16), tensor([  0,  64,   0,  ...,   2, 127,   1], device='cuda:0',
       dtype=torch.int16), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), 65536

Any idea what might be causing this? I've run `git pull`, so I should have the latest changes.

welnaseth commented 2 months ago

Of course, right after posting I figured out the issue: I needed to run `pip install .` again after the `git pull`, so the compiled extension matched the updated Python code.
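For anyone hitting the same error: since exllamav2 ships a compiled extension, updating the repository without reinstalling can leave a stale binary whose `make_q_matrix()` signature no longer matches the Python caller. A minimal sketch of the fix (the repo path is just an example; adjust it to wherever you cloned exllamav2):

```shell
# Pull the latest source (path is an example)
cd ~/exllamav2
git pull

# Rebuild and reinstall so the compiled extension
# matches the updated Python code
pip install . --upgrade
```

After reinstalling, resuming the quantization job with `convert.py` should pick up where it left off.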