This pull request introduces the following enhancements:
- **`BaseConverter` for transforming model checkpoints:** a new `BaseConverter` class that facilitates the transformation of model checkpoints.
- **`ExllamaToCompressedTensorConverter`:** a new converter that transforms an AutoGPTQ Exllama checkpoint into the CompressedTensors format, making it loadable by `SparseAutoModel` classes.
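The general shape of the converter pattern can be sketched as below. This is an illustrative sketch only: the class names, method names, and key renames here are hypothetical and do not mirror the actual sparseml API.

```python
# Illustrative sketch of the checkpoint-converter pattern; names are
# hypothetical and do not reflect the real sparseml implementation.
from typing import Callable, Dict, List

StateDict = Dict[str, object]  # tensor values stood in by plain objects


class BaseConverter:
    """Applies an ordered list of transformations to a checkpoint state dict."""

    @classmethod
    def transformations(cls) -> List[Callable[[StateDict], StateDict]]:
        return []

    @classmethod
    def convert(cls, state_dict: StateDict) -> StateDict:
        for transform in cls.transformations():
            state_dict = transform(state_dict)
        return state_dict


def rename_exllama_keys(state_dict: StateDict) -> StateDict:
    # Hypothetical name remapping: Exllama-style leaf names such as
    # "qweight"/"qzeros" are mapped onto different target names.
    renames = {"qweight": "weight_packed", "qzeros": "weight_zero_point"}
    out: StateDict = {}
    for name, tensor in state_dict.items():
        prefix, _, leaf = name.rpartition(".")
        leaf = renames.get(leaf, leaf)
        out[f"{prefix}.{leaf}" if prefix else leaf] = tensor
    return out


class ExllamaConverterSketch(BaseConverter):
    @classmethod
    def transformations(cls):
        return [rename_exllama_keys]
```

A concrete converter then just supplies its own transformation list, and the base class handles iterating over them.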
Test Code

Below is an example of how to use the `ExllamaToCompressedTensorConverter`:
```python
from sparseml.transformers import SparseAutoModelForCausalLM
from sparseml.utils.pytorch import ExllamaToCompressedTensorConverter


def local_test():
    autogptq_model_path: str = "/network/rahul/tinyllama_1b_test_w4a16"
    new_path = ExllamaToCompressedTensorConverter.convert_from_safetensors(
        autogptq_model_path, save_dir="local/models/compressed_tensor_equi"
    )
    model = SparseAutoModelForCausalLM.from_pretrained(new_path)


local_test()
```
Output:

```
python local/investigation.py
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_name" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "RegistryMixin";
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "SparsityCompressionConfig";
  warnings.warn(
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO Loading file: /network/rahul/tinyllama_1b_test_w4a16/model.safetensors
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO Applying transformations...
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.transformations INFO Applying transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO Transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS complete
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO Applying transformation: TRANSFORM_EXLLAMA_NAMES
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO Transformation: TRANSFORM_EXLLAMA_NAMES complete
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/quantize_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/special_tokens_map.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.model to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Copying file: /network/rahul/tinyllama_1b_test_w4a16/recipe.yaml to local/models/compressed_tensor_equi
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO Updating quantization config...
2024-06-07 14:34:50 sparseml.transformers.sparsification.sparse_model WARNING The dtype of the loaded model: torch.float32 is different from from the dtype specified in the model config: torch.float16.To load the model in the format that it was previously saved in, set torch_dtype=`auto` in the SparseAutoModel creation call.
2024-06-07 14:34:50 sparseml.transformers.utils.helpers INFO Found recipe in the model_path: local/models/compressed_tensor_equi/recipe.yaml
Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.logger.logger INFO Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.recipe.recipe INFO Loading recipe from file local/models/compressed_tensor_equi/recipe.yaml
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base WARNING GPTQ quantization is set to True without an active quantization modifier.
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base INFO Building quantization modifier with args: {'config_groups': {'group_0': QuantizationScheme(targets=['Linear'], weights=QuantizationArgs(num_bits=4, type=<QuantizationType.INT: 'int'>, symmetric=True, group_size=128, strategy=<QuantizationStrategy.GROUP: 'group'>, block_structure=None, dynamic=False, observer='minmax', observer_kwargs={}), input_activations=None, output_activations=None)}, 'ignore': ['lm_head', 'Embedding']}
manager stage: Model structure initialized
2024-06-07 14:34:50 sparseml.pytorch.model_load.helpers INFO Applied an unstaged recipe to the model at local/models/compressed_tensor_equi
```
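The `TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS` step in the log above repacks the AutoGPTQ weight storage, where multiple 4-bit values are packed into each 32-bit word. A minimal pure-Python illustration of that kind of nibble packing (assuming least-significant-nibble-first order; not the actual sparseml implementation):

```python
def unpack_int4(packed_word: int) -> list:
    """Unpack eight unsigned 4-bit values from one 32-bit word.

    Assumes least-significant-nibble-first order; this is a sketch of the
    general idea, not sparseml's actual transformation code.
    """
    return [(packed_word >> (4 * i)) & 0xF for i in range(8)]


def pack_int4(values: list) -> int:
    """Inverse operation: pack eight values in [0, 15] into one 32-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xF) << (4 * i)
    return word
```

In the real converter this unpacking happens tensor-wide (and the tensors are then reshaped), but the per-word bit layout is the core of the format change.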