welnaseth closed this issue 2 months ago
After running for quite some time, the quantization fails with this error (and continues to fail when trying to resume):
-- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: /home/seth/Documents/unquantized_models/c4ai-command-r-v01/
 -- Output: /tmp/exllama2_quant/
 -- Using default calibration dataset
 -- Target bits per weight: 2.5 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- Full model will be compiled to: /home/seth/Documents/models/c4ai-command-r-v01-2.5bpw-h6/
 -- Quantizing...
 -- Layer: model.layers.0 (ParallelDecoder)
 -- Sublayer: model.layers.0.self_attn
 -- Linear: model.layers.0.self_attn.q_proj -> 0.1:3b_64g/0.9:2b_64g s4, 2.17 bpw
Traceback (most recent call last):
  File "/home/seth/exllamav2/convert.py", line 265, in <module>
    quant(job, save_job, model)
  File "/home/seth/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/seth/exllamav2/conversion/quantize.py", line 421, in quant
    quant_parallel_decoder(job, module, hidden_states, target_states, quantizers, attn_params, strat_attn, strat_mlp)
  File "/home/seth/exllamav2/conversion/quantize.py", line 212, in quant_parallel_decoder
    quant_attn(job, module.attn, hidden_states, target_states, quantizers, attn_params, strat_attn)
  File "/home/seth/exllamav2/conversion/quantize.py", line 139, in quant_attn
    quant_linear(job, module.q_proj, quantizers["q_proj"], strat["q_proj"])
  File "/home/seth/exllamav2/conversion/quantize.py", line 91, in quant_linear
    recons_linear.load(recons_dict, device_tensors = False)
  File "/home/seth/exllamav2/exllamav2/linear.py", line 109, in load
    self.q_handle = ext.make_q_matrix(w,
  File "/home/seth/exllamav2/exllamav2/ext.py", line 218, in make_q_matrix
    return ext_c.make_q_matrix(w["q_weight"],
TypeError: make_q_matrix(): incompatible function arguments. The following argument types are supported:
    1.
(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: torch.Tensor, arg6: torch.Tensor, arg7: torch.Tensor, arg8: torch.Tensor, arg9: torch.Tensor, arg10: torch.Tensor, arg11: torch.Tensor) -> int Invoked with: tensor([[ 628275563, 628275563, 615635230, ..., -772086785, 1999216129, -772086273], [ 1227665977, -953904565, 966027833, ..., -815961228, -1326794355, 1599958774], [ 1383706006, 1383703926, -167191670, ..., -168400160, 245227071, -161043519], ..., [ -374686997, -361825574, 1802095277, ..., -1431844182, -1432438086, -336867986], [-1973053062, 1584045498, -1348550934, ..., -1392858727, -1414616134, -1532527223], [-1431655827, -1431656087, -1431654938, ..., -1431658858, -1431656002, 1790619478]], device='cuda:0', dtype=torch.int32), tensor([5103, 2225, 7109, ..., 5805, 2787, 1243], device='cuda:0', dtype=torch.int16), tensor([2942, 1037, 1748, ..., 6279, 4822, 4804], device='cuda:0', dtype=torch.int16), tensor([[ 1954845608, 1413826149, 1412842581, ..., 926311236, 1163158582, 1413825876], [-2052606021, 1701275782, 1413834070, ..., 1179149124, 1450538327, 1700157045], [ 1953797783, 1163150692, 1413825621, ..., 909529907, 1163158598, 1145390420], ..., [ 1971754632, 1733715557, 1700226422, ..., 1162167364, 1163154501, 1144271955], [ 1971754360, 1466328677, 1433761381, ..., 1163220020, 1431589957, 1145390420], [ 2005374889, 2003203958, 1719035240, ..., 1179018325, 1179931718, 1448502613]], device='cuda:0', dtype=torch.int32), tensor([0.0010, 0.0006, 0.0006, 0.0004, 0.0005, 0.0003, 0.0005, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0006, 0.0004, 0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0006, 0.0004, 0.0005, 0.0004, 0.0005, 0.0003, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0003, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0011, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004], device='cuda:0', dtype=torch.float16), tensor([ 3, 0, 3, 6, 3, 12, 3, 18, 3, 24, 3, 30, 3, 36, 3, 42, 3, 48, 3, 54, 3, 60, 3, 66, 3, 72, 2, 78, 2, 82, 2, 86, 2, 90, 2, 94, 2, 98, 2, 102, 2, 106, 2, 110, 2, 114, 2, 118, 2, 122, 2, 126, 2, 130, 2, 134, 2, 138, 2, 142, 2, 146, 2, 150, 2, 154, 2, 158, 2, 162, 2, 166, 2, 170, 2, 174, 2, 178, 2, 182, 2, 186, 2, 190, 2, 194, 2, 198, 2, 202, 2, 206, 2, 210, 2, 214, 2, 218, 2, 222, 2, 226, 2, 230, 2, 234, 2, 238, 2, 242, 2, 246, 2, 250, 2, 254, 2, 258, 2, 262, 2, 266, 2, 270, 2, 274, 2, 278, 2, 282, 2, 286, 2, 290, 2, 294, 2, 298, 2, 302, 2, 306, 2, 310, 2, 314, 2, 318, 2, 322, 2, 326, 2, 330, 2, 334, 2, 338, 2, 342, 2, 346, 2, 350, 2, 354, 2, 358, 2, 362, 2, 366, 2, 370, 2, 374, 2, 378, 2, 382, 2, 386, 2, 390, 2, 394, 2, 398, 2, 402, 2, 406, 2, 410, 2, 414, 2, 418, 2, 422, 2, 426, 2, 430, 2, 434, 2, 438, 2, 442, 2, 446, 2, 450, 2, 454, 2, 458, 2, 462, 2, 466, 2, 470, 2, 474, 2, 478, 2, 482, 2, 486, 2, 490, 2, 494, 2, 498, 2, 502, 2, 506, 2, 510, 2, 514, 2, 
518, 2, 522, 2, 526, 2, 530, 2, 534], device='cuda:0', dtype=torch.int16), tensor([ 0, 64, 0, ..., 2, 127, 1], device='cuda:0', dtype=torch.int16), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), 65536
Any idea what might be causing this? I've run git pull so I should have the latest changes.
Of course, right after posting I figured out the issue: I needed to run pip install . again after the git pull.
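In case anyone else hits this: I assume the installed build of the compiled extension (the ext_c module in the traceback) was still from before the pull, so the updated Python code was calling make_q_matrix with arguments the old binding doesn't accept. Roughly the sequence that resolved it, with paths taken from my setup; the forced reinstall at the end is only a fallback in case pip skips the rebuild:

cd /home/seth/exllamav2
git pull
pip install .
# if a stale build still gets picked up, force a clean reinstall:
pip install . --force-reinstall --no-cache-dir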