Hi @okwinds, can you provide the exact sample dataset so we can attempt to reproduce with the Qwen model? The dampening fraction is the right pathway to trace down for issues like these. Did you test whether the model was runnable as-is through HuggingFace before vLLM, and whether it was producing sensible answers? It sounds like quantization went through correctly, and this may have been a different crash that happened in vLLM, unrelated to the Cholesky decomposition.
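For context, a minimal sketch of where `dampening_frac` sits in an llm-compressor GPTQ recipe; the model ID, scheme, sample counts, and output path below are placeholder assumptions, not details from this report:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# GPTQ adds dampening_frac * mean(diag(H)) to the Hessian diagonal before the
# Cholesky factorization; raising it is the usual workaround when the solver
# reports "input is not positive-definite".
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",        # placeholder scheme, not taken from the report
    ignore=["lm_head"],
    dampening_frac=0.1,    # raised from the 0.01 default
)

oneshot(
    model="Qwen/Qwen2-7B-Instruct",  # placeholder model ID
    dataset="ultrachat_200k",        # one of the datasets named in the report
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="./qwen-w4a16-gptq",  # hypothetical output path
)
```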
https://github.com/vllm-project/llm-compressor/issues/109 https://github.com/vllm-project/llm-compressor/issues/142
```
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 17915 is not positive-definite).
```
Adjusting `dampening_frac` allowed me to complete the model quantization, but running vLLM inference then caused WSL Ubuntu 22.04 to crash (no exception information was captured). In the end, my solution was to switch back to the older version 0.1.0, which resolved the issue, although the quantization process is quite slow.
Datasets: belle_resampled_78_k_cn-train, ultrachat_200k, open-platypus, AI-MO_NuminaMath-CoT
Originally posted by @okwinds in https://github.com/vllm-project/llm-compressor/issues/142#issuecomment-2395811942
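To separate a bad quantization from a vLLM runtime problem, one quick check along the lines suggested above is to load the quantized checkpoint through plain HuggingFace transformers and generate a short completion. A minimal sketch, assuming a hypothetical checkpoint path and that the compressed-tensors integration is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "./qwen-w4a16-gptq"  # hypothetical path to the quantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, device_map="auto", torch_dtype="auto"
)

inputs = tokenizer("Briefly explain what GPTQ quantization does.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Sensible output here suggests the quantization itself is fine and the WSL
# crash is a separate vLLM runtime issue, not the Cholesky failure.
```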