intel / auto-round

Advanced Quantization Algorithm for LLMs/VLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
https://arxiv.org/abs/2309.05516
Apache License 2.0

add pile calib, rename quant_block_list to to_quant_block_names #322

Closed WeiweiZhang1 closed 2 weeks ago

wenhuach21 commented 2 weeks ago

Could fp layers also support fuzzy matching?
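A minimal sketch of what fuzzy matching for fp layers could look like; the helper name matches_fp_layer and the substring rule are illustrative assumptions, not the PR's implementation:

# Illustrative only: treat each fp_layers entry as a pattern and match it
# against the full module name, either exactly or as a substring ("fuzzy").
def matches_fp_layer(module_name, fp_layers):
    for pattern in fp_layers:
        if not pattern:
            continue
        if module_name == pattern or pattern in module_name:
            return True
    return False

# e.g. matches_fp_layer("model.layers.0.mlp.down_proj", ["layers.0"]) -> True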

wenhuach21 commented 2 weeks ago
fp_layers = args.fp_layers.split(",")
if bool(fp_layers):
    for n, m in model.named_modules():
        if isinstance(m, torch.nn.Linear) or isinstance(m, transformers.modeling_utils.Conv1D):
            name = n.split('.')[-1]
            if n in fp_layers or name in fp_layers:
                layer_config[n] = {"bits": 16}
                logger.info(
                    f"{n} will not be quantized.")

Why is it coded like this: name = n.split('.')[-1]? How can an exact layer be excluded?
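For context: with name = n.split('.')[-1], an entry such as "down_proj" marks every module whose last name component is down_proj as fp16, so there is no way to keep only one exact layer (e.g. model.layers.0.down_proj) in fp16 without also catching the rest. A minimal sketch of exact-only matching, reusing the variable names from the snippet above; the behavior finally chosen for the PR may differ:

# Illustrative sketch: match only on the full dotted module name, never on the
# last name component, so a single exact layer can be excluded from quantization.
fp_layers = [x.strip() for x in args.fp_layers.split(",") if x.strip()]
for n, m in model.named_modules():
    if isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        if n in fp_layers:  # exact full-name match only
            layer_config[n] = {"bits": 16}
            logger.info(f"{n} will not be quantized.")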