vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Apache License 2.0

GPTQ Algorithm Cleanup #120

Closed kylesayrs closed 2 months ago

kylesayrs commented 2 months ago

Purpose

  1. Clean up implementation for easier reading (comments, better structure)
  2. Allow the algorithm to be skipped if the layer is not being targeted (see the sketch after this list)
  3. Fix bug where layer is not frozen after QuantizationModifier
  4. Prevent weight observer misuse
  5. Deprecate weight_fake_quant use case
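As a rough illustration of item 2, the skip decision can hinge on whether QuantizationModifier attached a quantization scheme to the module. This is a minimal sketch only; the function names and debug message are placeholders, not the code in this PR.

```python
# Minimal sketch of the "skip untargeted layers" idea (item 2); function names
# and the debug message are placeholders, not the PR's actual implementation.
import logging

import torch

logger = logging.getLogger(__name__)


def should_compress(module: torch.nn.Module) -> bool:
    # QuantizationModifier attaches a `quantization_scheme` to targeted modules;
    # ignored layers (e.g. lm_head) never receive one, so GPTQ can pass over them.
    scheme = getattr(module, "quantization_scheme", None)
    return scheme is not None and getattr(scheme, "weights", None) is not None


def compress_module(name: str, module: torch.nn.Module) -> None:
    if not should_compress(module):
        logger.debug("Skipping unquantized layer %s", name)  # placeholder log text
        return
    # ... run the GPTQ weight update for this module ...
```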

Changes

Testing

Regression tested saving, loading, and vLLM inference with a group-quantized model
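For reference, that round trip roughly follows the project's standard oneshot flow; the model, dataset, scheme, and save directory below are placeholders rather than the exact regression setup.

```python
# Hedged sketch of the save -> load -> vLLM inference round trip; the model,
# dataset, scheme, and save directory are placeholders, not the exact settings
# used for this regression test.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from vllm import LLM

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
SAVE_DIR = "TinyLlama-1.1B-W4A16"

# Apply GPTQ with a group-quantized weight scheme and save the compressed model
oneshot(
    model=MODEL_ID,
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir=SAVE_DIR,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Reload the compressed checkpoint with vLLM and run a quick generation
llm = LLM(model=SAVE_DIR)
print(llm.generate("Compression makes models")[0].outputs[0].text)
```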

kylesayrs commented 2 months ago

@Satrat Can you specify what you're looking for in a skip test?

Satrat commented 2 months ago

> @Satrat Can you specify what you're looking for in a skip test?

You could just initialize a model with some modules skipped (more than just the lm_head) and others quantized, then search the logs for the debug string. Alternatively, testing your getattr_chain helper function directly on the model would be fine too.
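For what it's worth, the second option might look roughly like the sketch below; the import path and the (obj, dotted_path, default) signature of getattr_chain are assumptions on my part, not taken from this PR.

```python
# Rough sketch of testing getattr_chain directly; the import path and the
# (obj, dotted_path, default) signature are assumptions, not the PR's API.
from types import SimpleNamespace

from llmcompressor.utils import getattr_chain  # assumed import path


def test_getattr_chain_distinguishes_skipped_modules():
    # Stand-ins for a quantized linear layer and an ignored lm_head
    quantized = SimpleNamespace(
        quantization_scheme=SimpleNamespace(weights={"num_bits": 4})
    )
    ignored = SimpleNamespace()  # no quantization_scheme attached

    # Targeted module: the dotted chain resolves to its weight quantization args
    assert getattr_chain(quantized, "quantization_scheme.weights", None) is not None

    # Ignored module: the chain breaks, so the default signals "skip this layer"
    assert getattr_chain(ignored, "quantization_scheme.weights", None) is None
```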

kylesayrs commented 2 months ago

Yeah, the failing base test is caused by a bug from the previous release, which I fixed on the main branch. See: https://github.com/neuralmagic/compressed-tensors/blame/4b214e582c8434733efea79239cfadec9358b7fb/src/compressed_tensors/quantization/observers/base.py#L165-L167

kylesayrs commented 2 months ago

Using my local machine and the main branch of compressed_tensors, I confirmed that tests/llmcompressor/modifiers/ and tests/llmcompressor/transformers/compression/ are passing.