2024-05-15 12:15:13 sparseml.transformers.finetune.runner INFO *** One Shot ***
2024-05-15 12:15:14 sparseml.core.recipe.recipe INFO Loading recipe from file tests/sparseml/transformers/compression/recipes/new_quant_full.yaml
/root/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
/root/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
2024-05-15 12:15:14 sparseml.modifiers.quantization_vllm.pytorch INFO Running vLLMQuantizationModifier calibration with 4 samples...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.19it/s]
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.0.block_sparse_moe.experts.1.w1 received less than 30% of calibration batch tokens (212/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.0.block_sparse_moe.experts.1.w2 received less than 30% of calibration batch tokens (212/970 tokens). This may harm the quantization quality.
...
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.10.block_sparse_moe.experts.3.w3 received less than 30% of calibration batch tokens (233/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.11.block_sparse_moe.experts.2.w1 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.11.block_sparse_moe.experts.2.w2 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING The module_name: model.layers.11.block_sparse_moe.experts.2.w3 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
Note: this branch requires this PR to land in compressed-tensors first: https://github.com/neuralmagic/compressed-tensors/pull/46

Example Use:
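A minimal sketch of how this could be invoked through the sparseml.transformers one-shot entrypoint; the model, calibration dataset, and output directory below are illustrative placeholders rather than the PR's actual example, and the recipe path matches the one in the log above.

```python
# Minimal sketch, assuming the sparseml.transformers one-shot entrypoint;
# model, dataset, and output_dir are placeholders for illustration only.
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

model = SparseAutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",  # placeholder MoE model
    device_map="auto",
)

oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration dataset
    recipe="tests/sparseml/transformers/compression/recipes/new_quant_full.yaml",
    output_dir="./one_shot_output",
    num_calibration_samples=4,  # matches the 4 calibration samples in the log above
)
```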