Hi @moonlightian! The issue you are encountering is related to the default mappings used by the `SmoothQuantModifier`. For your case, you'll need to provide custom mappings specific to the BLOOM architecture:
```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

# modify recipe as needed
recipe = """
quant_stage:
    quant_modifiers:
        SmoothQuantModifier:
            smoothing_strength: 0.5
            mappings: [
                [["re:.*query_key_value"], "re:.*input_layernorm"],
                [["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"]
            ]
            ignore: ["model.decoder.final_layer_norm"]
        GPTQModifier:
            dampening_frac: 0.1
            scheme: "W8A8"
            targets: ["Linear"]
            ignore: ["lm_head"]
"""

# remember to substitute your own dataset
oneshot(
    model="bigscience/bloom-560m",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="outdir",
    max_seq_length=64,
    num_calibration_samples=64,
)
```
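
If it helps, one quick way to sanity-check that the regex mappings in the recipe actually match BLOOM's module names is to print them from the Hugging Face checkpoint. A minimal sketch, assuming the standard `transformers` BLOOM implementation:

```python
from transformers import AutoModelForCausalLM

# Load the same checkpoint used above and list its module names so the
# "re:.*query_key_value" / "re:.*dense_h_to_4h" mappings can be checked
# against the actual BLOOM layer names.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
for name, _ in model.named_modules():
    if any(key in name for key in ("query_key_value", "dense_h_to_4h",
                                   "input_layernorm", "post_attention_layernorm")):
        print(name)
```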
It seems that BLOOM is not supported for quantization right now.