carrot-o0o opened this issue 1 month ago
**Describe the bug**
This is a minor issue, but I think the quantization configuration in [examples/quantization_24_sparse_w4a16/2:4_w4a16_group-128_recipe.yaml](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_24_sparse_w4a16/2%3A4_w4a16_group-128_recipe.yaml) should include `ignore: ["lm_head"]`, like below. Otherwise, saving the quantized model raises a `ValueError` from `compressed_tensors`, because `lm_head` does not follow the 2:4 sparse pattern.
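A sketch of the proposed change, with the recipe abridged around it (the stage and modifier names follow the repo's example recipe, but the exact fields there may differ):

```yaml
# Abridged sketch of 2:4_w4a16_group-128_recipe.yaml; the key addition
# is the `ignore` entry so lm_head is skipped during quantization.
quantization_stage:
  quantization_modifiers:
    GPTQModifier:
      ignore: ["lm_head"]   # lm_head is not 2:4 sparse, so skip it
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "group"
            group_size: 128
          targets: ["Linear"]
```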
**Environment**
Commit hashes: f7245c8, 7a0d23294c6cdfe0a17f9f21cb1bc20d9b9e3cd7

**To Reproduce**
I ran `python examples/quantization_24_sparse_w4a16/llama7b_sparse_w4a16.py`, with the model path changed to another Llama model and the recipe path changed to `2:4_w4a16_group-128_recipe.yaml`.
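The only modifications to the example script were those two paths; a minimal sketch of the kind of change, with hypothetical variable names (the script's actual names may differ):

```python
# Hypothetical variable names, not necessarily those used in
# llama7b_sparse_w4a16.py; only the two paths were changed.
model_path = "path/to/another-llama-model"  # swapped-in Llama checkpoint
recipe_path = "examples/quantization_24_sparse_w4a16/2:4_w4a16_group-128_recipe.yaml"
```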
**Errors**
The error without `ignore: ["lm_head"]`:
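For context, 2:4 ("two out of four") sparsity requires every contiguous group of four weights to contain at most two nonzero values, which a dense `lm_head` will essentially never satisfy. A minimal illustrative check (not `compressed_tensors`' actual code; `follows_24_pattern` is a hypothetical helper):

```python
import torch

def follows_24_pattern(weight: torch.Tensor) -> bool:
    """Return True if every group of 4 consecutive values holds <= 2 nonzeros."""
    groups = weight.reshape(-1, 4)          # view the tensor as groups of four
    nonzeros = (groups != 0).sum(dim=1)     # nonzero count per group
    return bool((nonzeros <= 2).all())

# A 2:4-pruned weight passes; a dense projection such as lm_head does not.
pruned = torch.tensor([[1.0, 0.0, 2.0, 0.0],
                       [0.0, 3.0, 0.0, 4.0]])
dense = torch.randn(8, 8)                   # stand-in for a dense lm_head weight
print(follows_24_pattern(pruned))           # True
print(follows_24_pattern(dense))            # False (almost surely)
```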
Thanks @carrot-o0o! We're in the process of revamping our sparsity pathways and I'll ensure this fix gets included and tested.

I ran the example as described with `2:4_w4a16_group-128_recipe.yaml` and did not encounter an error. I believe this was fixed by #80