Feature request: explicitly check whether the SparseAutoModel saved with save_pretrained has a 2:4 sparsity structure.
Example:
from sparseml.transformers import SparseAutoModelForCausalLM
model_path = "/network/eldar/models_to_share/llama2_7b_sp24_v1"
model = SparseAutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
model.save_pretrained("./some_funny_test")
Loading checkpoint shards: 100%|█████████████████████████████| 3/3 [00:03<00:00, 1.18s/it]
2024-06-05 13:11:10 sparseml.transformers.sparsification.compressed_tensors_utils INFO Inferring a sparsity configuration requires a global sparsity calculation. This can be costly for large models. To skip the calculation of compression statistics set skip_compression_stats=True
Calculating model sparsity: 100%|████████████████████████| 291/291 [00:12<00:00, 23.31it/s]
Checking whether model follows 2:4 sparsity structure: 100%|█| 225/225 [00:20<00:00, 10.88it/s]
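For readers unfamiliar with the term, here is a minimal sketch of what a 2:4 check means for a single weight matrix. It is not the sparseml implementation. The helper name `follows_2_4_structure` is hypothetical, and the sketch assumes that groups of four are taken along each row and that a group passes when at least two of its four values are exactly zero.

```python
from typing import Sequence


def follows_2_4_structure(rows: Sequence[Sequence[float]]) -> bool:
    """Return True if every aligned group of 4 values in each row has >= 2 zeros.

    Hypothetical helper for illustration only; plain Python lists stand in
    for a weight tensor.
    """
    for row in rows:
        # Groups are aligned blocks of four, so a row whose length is not
        # a multiple of 4 cannot follow the pattern.
        if len(row) % 4 != 0:
            return False
        for start in range(0, len(row), 4):
            group = row[start:start + 4]
            zeros = sum(1 for value in group if value == 0)
            if zeros < 2:
                return False
    return True


if __name__ == "__main__":
    dense = [[1.0, 2.0, 3.0, 4.0]]
    pruned = [[1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 2.0]]
    print(follows_2_4_structure(dense))
    print(follows_2_4_structure(pruned))
```

A model-level check would presumably apply a test like this to each prunable layer's weight, which is consistent with the per-layer progress bar in the log above.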