neuralmagic / sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Apache License 2.0
2.07k stars 148 forks

Check for 2:4 structure when saving `SparseAutoModel` #2317

Closed: dbogunowicz closed this 5 months ago

dbogunowicz commented 5 months ago

Feature request: when a `SparseAutoModel` is saved with `save_pretrained`, explicitly check whether the model follows the 2:4 sparsity structure and record the result in the saved config.

Example:

from sparseml.transformers import SparseAutoModelForCausalLM

model_path = "/network/eldar/models_to_share/llama2_7b_sp24_v1"
model = SparseAutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
model.save_pretrained("./some_funny_test")

Output:

Loading checkpoint shards: 100%|█████████████████████████████| 3/3 [00:03<00:00,  1.18s/it]
2024-06-05 13:11:10 sparseml.transformers.sparsification.compressed_tensors_utils INFO     Inferring a sparsity configuration requires a global sparsity calculation. This can be costly for large models. To skip the calculation of compression statistics set skip_compression_stats=True
Calculating model sparsity: 100%|████████████████████████| 291/291 [00:12<00:00, 23.31it/s]
Checking whether model follows 2:4 sparsity structure: 100%|█| 225/225 [00:20<00:00, 10.88it/s]

cat some_funny_test/config.json 

...
  "compression_config": {
    "sparsity_config": {
      "format": "dense",
      "global_sparsity": 48.05290754567787,
      "registry_requires_subclass": false,
      "sparsity_structure": "2:4"
    }
  },
...
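
For illustration, the property recorded as `"sparsity_structure": "2:4"` can be sketched in plain Python: a weight matrix follows the 2:4 pattern when every contiguous group of four values contains at least two zeros. This is a minimal sketch under stated assumptions, not SparseML's implementation. The helper names `follows_2_4` and `global_sparsity` are hypothetical, the weights are nested lists rather than tensors, and grouping along each row is an assumption about the layout.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def follows_2_4(weights: Matrix) -> bool:
    """Return True if every contiguous group of 4 values in each row
    has at least 2 zeros (hypothetical helper, not SparseML's API)."""
    for row in weights:
        # Assumption: a row that cannot be split into groups of 4
        # is treated as not following the pattern.
        if len(row) % 4 != 0:
            return False
        for start in range(0, len(row), 4):
            group = row[start:start + 4]
            zeros = sum(1 for value in group if value == 0)
            if zeros < 2:
                return False
    return True


def global_sparsity(weights: Matrix) -> float:
    """Return the percentage of exactly-zero entries (hypothetical helper)."""
    total = sum(len(row) for row in weights)
    if total == 0:
        return 0.0
    zeros = sum(1 for row in weights for value in row if value == 0)
    return 100.0 * zeros / total


# Toy weights: each group of 4 keeps at most 2 non-zero values.
pruned = [
    [0.0, 1.5, 0.0, -2.0],
    [0.7, 0.0, 0.0, 0.1],
]
# Toy weights with no zeros at all.
dense = [[0.3, 0.1, 0.2, 0.4]]

print(follows_2_4(pruned))  # True
print(follows_2_4(dense))   # False
```

A matrix that satisfies the pattern exactly is 50% sparse, so a global figure somewhat below 50%, like the one in the config above, would be consistent with some parameters being left dense.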