intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

Fix `opt_125m_woq_gptq_int4_dq_ggml` issue #1965

Closed · Kaihui-intel closed this 3 months ago

Kaihui-intel commented 3 months ago

Type of Change

bug fix

Description

Issue:

```
  File "/home/sdp/miniforge3/envs/pytorch-2.3.0+cpu-3.10-spr/lib/python3.10/site-packages/neural_compressor/torch/algorithms/weight_only/gptq.py", line 813, in fasterquant
    H = torch.linalg.cholesky(H)
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).
```

Solution: increase the dampening percentage that GPTQ adds to the Hessian diagonal, so the matrix passed to `torch.linalg.cholesky` is positive-definite.
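For context, GPTQ stabilizes the Hessian `H` before factorization by adding a fraction of its mean diagonal value back onto the diagonal; a larger fraction shifts every eigenvalue further above zero, which is what avoids the error above. A minimal standalone sketch of that dampening step (illustrative only, not the exact `gptq.py` code; the `percdamp` name follows the common GPTQ convention):

```python
import torch

def dampen_and_factor(H: torch.Tensor, percdamp: float = 0.01) -> torch.Tensor:
    """Dampen H's diagonal, then Cholesky-factor it.

    Adds percdamp * mean(diag(H)) to every diagonal entry. Raising
    percdamp (e.g. 0.01 -> 0.1) gives a near-singular H a larger
    positive shift, so the factorization can succeed.
    """
    damp = percdamp * torch.mean(torch.diag(H))
    idx = torch.arange(H.shape[0], device=H.device)
    H[idx, idx] += damp  # in-place diagonal dampening
    return torch.linalg.cholesky(H)
```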

Expected Behavior & Potential Risk

With a higher dampening percentage, GPTQ quantization of `opt_125m_woq_gptq_int4_dq_ggml` should complete without the `linalg.cholesky` positive-definiteness error. Potential risk: heavier dampening perturbs the quantized weights slightly, so the reference accuracy for this model may need updating (see the comment below); a user-side configuration sketch follows.
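For illustration, this is how a user could raise the dampening percentage when quantizing with the 3.x PyTorch API; the `percdamp` parameter name and the `GPTQConfig`/`prepare`/`convert` usage are assumptions based on that API, not taken from this PR:

```python
import torch
from neural_compressor.torch.quantization import GPTQConfig, convert, prepare

model = torch.nn.Sequential(torch.nn.Linear(64, 64))  # stand-in for opt_125m
# Raise percdamp above the usual 0.01 default to dampen the Hessian harder.
quant_config = GPTQConfig(bits=4, group_size=32, percdamp=0.1)
model = prepare(model, quant_config=quant_config)
model(torch.randn(8, 64))  # calibration pass to collect Hessian statistics
model = convert(model)     # weight-only INT4 quantization via GPTQ
```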

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

chensuyue commented 3 months ago

Will update the reference accuracy.