neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

Fix GPTQ Aliases #2327

Closed Satrat closed 3 months ago

Satrat commented 3 months ago

https://github.com/neuralmagic/compressed-tensors/pull/81 must be merged first

When specifying a scheme preset, the quantization modifier for GPTQ was not being properly initialized. In the example code below, despite specifying the W4A16 scheme, the quantization config was always empty:

Building quantization modifier with args: {'config_groups': {'config_group_0': QuantizationScheme(targets=['Linear'], weights=None, input_activations=None, output_activations=None)}}

The fix updates the GPTQ modifier initialization to correctly apply the preset scheme. I've also added unit tests to confirm that all variants of the GPTQ recipe function as intended.
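For context, the failure mode is that the preset name is never expanded into concrete quantization arguments, leaving the scheme's weights field as None. After the fix, resolving W4A16 should produce a config group along the lines of the sketch below (the exact QuantizationArgs values are defined in compressed-tensors; the ones shown here are assumptions for illustration):

from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme

# Illustrative sketch, not the actual fix: a W4A16 preset is expected to
# resolve to 4-bit weight quantization with activations left in 16-bit.
# The argument values below are assumptions; the presets themselves live in
# compressed-tensors.
resolved_scheme = QuantizationScheme(
    targets=["Linear"],
    weights=QuantizationArgs(num_bits=4, symmetric=True),  # assumed weight args
    input_activations=None,   # activations are not quantized
    output_activations=None,
)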

Example Code

import torch
from datasets import load_dataset
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot
from sparseml.modifiers.quantization.gptq import GPTQModifier
from transformers import AutoTokenizer

NUM_CALIBRATION_SAMPLES = 16
MAX_SEQ_LEN = 2048
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

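# Load the model in bfloat16 and place it across available devices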
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

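# Quantize all Linear layers with the W4A16 preset scheme (4-bit weights, 16-bit activations)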
gptq = GPTQModifier(
    scheme={"W4A16": ["Linear"]}
)

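# Build a small calibration set and render each conversation with the chat template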
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
ds = ds.map(lambda batch: {"text": tokenizer.apply_chat_template(batch["messages"], tokenize=False)})

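# Apply the GPTQ recipe in one shot using the calibration samples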
oneshot(
    model=model,
    dataset=ds,
    recipe=gptq,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
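
After oneshot completes, the quantized model and tokenizer can be saved with the standard Hugging Face API. A minimal usage sketch; the output directory name below is arbitrary:

# Save the quantized model and tokenizer; the directory name is arbitrary
SAVE_DIR = "TinyLlama-1.1B-Chat-W4A16"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)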