neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0
2.01k stars 140 forks

AutoGPTQ state dict converter #2315

Closed rahul-tuli closed 2 weeks ago

rahul-tuli commented 4 weeks ago

PR Description

This pull request introduces the following enhancements:

  1. BaseConverter for transforming model checkpoints:

    • A new BaseConverter class facilitates the transformation of model checkpoints by applying an ordered set of transformations to a loaded state dict (see the sketch after this list).

  2. ExllamaToCompressedTensorConverter:

    • This new converter transforms an AutoGPTQ Exllama checkpoint into the CompressedTensors format, making it loadable in SparseAutoModel classes.
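
For orientation, here is a minimal, self-contained sketch of the converter pattern, assuming a transformation is simply a function that maps one state dict to another. Apart from the class names BaseConverter and ExllamaToCompressedTensorConverter, every helper name and the key mapping below are illustrative placeholders, not the actual implementation in sparseml.utils.pytorch:

# Minimal sketch of the converter pattern described above. Only BaseConverter and
# ExllamaToCompressedTensorConverter are names from this PR; everything else here
# is illustrative.
from typing import Callable, Dict, Iterable

import torch

StateDict = Dict[str, torch.Tensor]
Transformation = Callable[[StateDict], StateDict]


def rename_exllama_keys(state_dict: StateDict) -> StateDict:
    # Example transformation: remap Exllama-style parameter suffixes to new names
    # (the mapping shown is hypothetical).
    name_map = {".qzeros": ".weight_zero_point", ".scales": ".weight_scale"}
    renamed = {}
    for key, tensor in state_dict.items():
        for old, new in name_map.items():
            if key.endswith(old):
                key = key[: -len(old)] + new
                break
        renamed[key] = tensor
    return renamed


class BaseConverter:
    # Applies an ordered chain of transformations to a checkpoint state dict.
    @classmethod
    def transformations(cls) -> Iterable[Transformation]:
        raise NotImplementedError

    @classmethod
    def transform(cls, state_dict: StateDict) -> StateDict:
        for transformation in cls.transformations():
            state_dict = transformation(state_dict)
        return state_dict


class ExllamaToCompressedTensorConverter(BaseConverter):
    @classmethod
    def transformations(cls) -> Iterable[Transformation]:
        return [rename_exllama_keys]

Keeping each transformation a pure function over the state dict makes the individual conversion steps easy to chain, reorder, and unit-test.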

Test Code

Below is an example of how to use the ExllamaToCompressedTensorConverter:


from sparseml.utils.pytorch import ExllamaToCompressedTensorConverter
from sparseml.transformers import SparseAutoModelForCausalLM


def local_test():
    # Path to an existing AutoGPTQ Exllama checkpoint
    autogptq_model_path: str = "/network/rahul/tinyllama_1b_test_w4a16"

    # Convert the checkpoint to the CompressedTensors format and save it
    new_path = ExllamaToCompressedTensorConverter.convert_from_safetensors(
        autogptq_model_path, save_dir="local/models/compressed_tensor_equi"
    )

    # Verify the converted checkpoint loads into a SparseAutoModel class
    model = SparseAutoModelForCausalLM.from_pretrained(new_path)


local_test()

Output:

python local/investigation.py
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "RegistryMixin"; 
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "SparsityCompressionConfig"; 
  warnings.warn(
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO     Loading file: /network/rahul/tinyllama_1b_test_w4a16/model.safetensors
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO     Applying transformations...
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.transformations INFO     Applying transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS complete
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Applying transformation: TRANSFORM_EXLLAMA_NAMES
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Transformation: TRANSFORM_EXLLAMA_NAMES complete
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/quantize_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/special_tokens_map.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.model to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/recipe.yaml to local/models/compressed_tensor_equi
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Updating quantization config...
2024-06-07 14:34:50 sparseml.transformers.sparsification.sparse_model WARNING  The dtype of the loaded model: torch.float32 is different from from the dtype specified in the model config: torch.float16.To load the model in the format that it was previously saved in, set torch_dtype=`auto` in the SparseAutoModel creation call.
2024-06-07 14:34:50 sparseml.transformers.utils.helpers INFO     Found recipe in the model_path: local/models/compressed_tensor_equi/recipe.yaml
Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.logger.logger INFO     Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.recipe.recipe INFO     Loading recipe from file local/models/compressed_tensor_equi/recipe.yaml
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base WARNING  GPTQ quantization is set to True without an active quantization modifier.
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base INFO     Building quantization modifier with args: {'config_groups': {'group_0': QuantizationScheme(targets=['Linear'], weights=QuantizationArgs(num_bits=4, type=<QuantizationType.INT: 'int'>, symmetric=True, group_size=128, strategy=<QuantizationStrategy.GROUP: 'group'>, block_structure=None, dynamic=False, observer='minmax', observer_kwargs={}), input_activations=None, output_activations=None)}, 'ignore': ['lm_head', 'Embedding']}
manager stage: Model structure initialized
2024-06-07 14:34:50 sparseml.pytorch.model_load.helpers INFO     Applied an unstaged recipe to the model at local/models/compressed_tensor_equi
➜  sparseml git:(autogptq-compressed-tensors) 
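
The transformation log above shows two steps: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS reworks and reshapes the AutoGPTQ weight tensors, and TRANSFORM_EXLLAMA_NAMES renames the Exllama parameters. For orientation only, a minimal sketch of what the first step conceptually involves, assuming the standard AutoGPTQ layout of eight 4-bit values packed into each int32 of qweight; the function name is hypothetical and this is not the PR's implementation:

import torch


def unpack_int4_from_int32(qweight: torch.Tensor) -> torch.Tensor:
    # qweight: packed weight of shape [in_features // 8, out_features], dtype int32,
    # with eight 4-bit values packed into each element (assumed AutoGPTQ layout).
    shifts = torch.arange(0, 32, 4, dtype=torch.int32)                # eight nibble offsets: 0, 4, ..., 28
    unpacked = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF  # [rows, 8, cols]
    return unpacked.reshape(-1, qweight.shape[1])                     # [in_features, out_features]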

Original Checkpoint:

➜  sparseml git:(autogptq-compressed-tensors) tree "/network/rahul/tinyllama_1b_test_w4a16"
/network/rahul/tinyllama_1b_test_w4a16
|-- config.json
|-- model.safetensors
|-- quantize_config.json
|-- recipe.yaml
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer.model
`-- tokenizer_config.json

0 directories, 8 files

New Checkpoint:

➜  sparseml git:(autogptq-compressed-tensors) tree "local/models/compressed_tensor_equi"
local/models/compressed_tensor_equi
|-- config.json
|-- model.safetensors
|-- quantize_config.json
|-- recipe.yaml
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer.model
`-- tokenizer_config.json

0 directories, 8 files
rahul-tuli commented 2 weeks ago

Functionality moved to compressed-tensors: https://github.com/neuralmagic/compressed-tensors/pull/82