turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Quantization error "Warning: Applied additional damping" and "Hessian" #201

Closed · yamosin closed this 7 months ago

yamosin commented 7 months ago
 -- Quantizing...
 -- Layer: model.layers.0 (Attention)
 -- Linear: model.layers.0.self_attn.q_proj -> 0.25:4b_32g/0.75:2b_32g s4, 2.63 bpw
 -- Linear: model.layers.0.self_attn.k_proj -> 0.05:3b_32g/0.95:2b_32g s4, 2.19 bpw
 -- Linear: model.layers.0.self_attn.v_proj -> 0.2:6b_32g/0.8:3b_32g s4, 3.74 bpw
 -- Linear: model.layers.0.self_attn.o_proj -> 0.05:3b_32g/0.95:2b_32g s4, 2.19 bpw
 -- Layer rfn_error: 0.012080
 -- Module quantized, time: 17.02 seconds
 -- Layer: model.layers.0 (MLP)
 -- Linear: model.layers.0.mlp.gate_proj -> 1.0:4b_128g s4, 4.03 bpw
 -- Linear: model.layers.0.mlp.up_proj -> 1.0:4b_128g s4, 4.03 bpw
 -- Linear: model.layers.0.mlp.down_proj -> 0.1:4b_32g/0.9:3b_32g s4, 3.23 bpw
 -- Layer rfn_error: 0.024482
 -- Module quantized, time: 9.09 seconds
 -- Layer: model.layers.1 (Attention)
 -- Linear: model.layers.1.self_attn.q_proj -> 0.05:3b_32g/0.95:2b_32g s4, 2.19 bpw
 -- Linear: model.layers.1.self_attn.k_proj -> 0.05:3b_32g/0.95:2b_32g s4, 2.19 bpw
 -- Linear: model.layers.1.self_attn.v_proj -> 0.2:6b_32g/0.8:3b_32g s4, 3.74 bpw
 -- Linear: model.layers.1.self_attn.o_proj -> 1.0:4b_32g s4, 4.13 bpw
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
 !! Warning: Applied additional damping
Traceback (most recent call last):
  File "D:\exllamav2\conversion\adaptivegptq.py", line 257, in prepare
    hessian_inv = torch.linalg.cholesky(hessian)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\exllamav2\convert.py", line 300, in <module>
    quant(job, save_job, model)
  File "D:\exllamav2\conversion\quantize.py", line 639, in quant
    do_quant(module.o_proj, quantizers["o_proj"], qparams[module.o_proj.key], job)
  File "D:\exllamav2\conversion\quantize.py", line 439, in do_quant
    if not skip_prep: lq.prepare()
                      ^^^^^^^^^^^^
  File "D:\exllamav2\conversion\adaptivegptq.py", line 288, in prepare
    raise ValueError("Hessian is not invertible")
ValueError: Hessian is not invertible

I'm just getting into quantization and couldn't find a similar issue. Any help?
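
For context, the warnings and the final ValueError come from the same code path in adaptivegptq.py: the converter builds a Hessian from the calibration data and needs its Cholesky factor, and when the matrix is not positive-definite (typically because the accumulated calibration statistics are rank-deficient) it adds diagonal damping and retries a few times before giving up. A minimal sketch of that retry pattern, with the helper name, damping ratio, and attempt count assumed for illustration rather than taken from the actual exllamav2 source:

```python
import torch

def cholesky_with_damping(hessian, damp_ratio=0.01, max_attempts=10):
    # Damping proportional to the mean of the Hessian diagonal
    # (the usual GPTQ-style heuristic; the constants here are assumptions).
    damp = damp_ratio * torch.diag(hessian).mean()
    eye = torch.eye(hessian.shape[0], device=hessian.device, dtype=hessian.dtype)
    h = hessian.clone()
    for _ in range(max_attempts):
        try:
            # Succeeds only if the (damped) Hessian is positive-definite.
            return torch.linalg.cholesky(h)
        except torch.linalg.LinAlgError:
            # Not positive-definite: nudge the diagonal up and retry.
            # Each retry is what an "Applied additional damping" warning reports.
            print(" !! Warning: Applied additional damping")
            h = h + damp * eye
    raise ValueError("Hessian is not invertible")
```

So ten warnings followed by the exception means the damping retries were exhausted without ever producing a positive-definite matrix.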

fgdfgfthgr-fox commented 7 months ago

Could you list your device configuration and the model you are trying to quantize?

yamosin commented 7 months ago

> Could you list your device configuration and the model you are trying to quantize?

Well, it looks like a random issue that makes no sense. I don't change anything and run exactly the same command; sometimes it processes for 10 minutes and then hits the same error, and sometimes it works fine. I'm using an E5-2676 v3 on an X99 motherboard, 16 GB DDR4-2133, and 4x 3090. The model is TinyLlama 1.1B.