neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Question on quantization size #1429

Closed rajuptvs closed 9 months ago

rajuptvs commented 9 months ago

Hey Team, First of all thank you for all your wonderful work on quantizing models for the community.

I have some questions on quantization using sparseml and sparsezoo. I have been trying to perform Sparse Transfer Learning With a Custom Dataset mainly using yolov8s model as below

!sparseml.ultralytics.train \
  --model "zoo:cv/detection/yolov8-s/pytorch/ultralytics/coco/pruned65-none" \
  --recipe "zoo:cv/detection/yolov8-s/pytorch/ultralytics/voc/pruned65_quant-none" \
  --data /content/datasets/Sphero-Robot-detection-8/data.yaml \
  --recipe_args '{"num_epochs":15, "qat_start_epoch": 10, "observer_freeze_epoch": 12, "bn_freeze_epoch": 12}' \
  --batch 8

My main question: the usual .pt files for yolov8s models before and after training are in the range of 20-30 MB, but with the recipe above, the model that gets saved is in the range of 120-130 MB. I was under the impression that a pruned and quantized model should usually be smaller, in the 6-8 MB range shown in the SparseZoo.
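(For rough context on the numbers involved: a quantization-aware-training checkpoint typically still stores full fp32 weights plus observer/optimizer state, so the .pt file can be larger than the fp32 baseline; the int8 size reduction usually only appears after export, e.g. to ONNX. The sketch below is back-of-the-envelope arithmetic only, assuming an approximate ~11.2M parameter count for YOLOv8s, ignoring file-format overhead.)

```python
# Back-of-the-envelope storage arithmetic for a YOLOv8s-sized model.
# NUM_PARAMS is an approximation (~11.2M parameters), not an exact figure.

def model_bytes(num_params: int, bytes_per_weight: int) -> int:
    """Raw weight storage only; ignores file-format overhead and metadata."""
    return num_params * bytes_per_weight

NUM_PARAMS = 11_200_000

fp32_mb = model_bytes(NUM_PARAMS, 4) / 1e6  # fp32: 4 bytes per weight
int8_mb = model_bytes(NUM_PARAMS, 1) / 1e6  # int8: 1 byte per weight

print(f"fp32 weights: ~{fp32_mb:.0f} MB")  # ~45 MB
print(f"int8 weights: ~{int8_mb:.0f} MB")  # ~11 MB

# A QAT training checkpoint that also carries optimizer moments
# (e.g. two Adam moment tensors per weight) can approach 3x the
# fp32 weight size, which is in the 120-130 MB ballpark observed.
checkpoint_mb = fp32_mb * 3
print(f"QAT checkpoint (weights + 2 optimizer moments): ~{checkpoint_mb:.0f} MB")
```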

Am I doing something wrong or is this usual?

Thank you in advance, Raju

jeanniefinks commented 9 months ago

Greetings @rajuptvs Because this is a duplicate issue of another one you posted in our sparseml repo, I am going to go ahead and close this one out. We have provided a response to that issue here: https://github.com/neuralmagic/sparseml/issues/1854#issuecomment-1832682153

Thank you for your inquiry! Jeannie / Neural Magic