neuralmagic / sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Apache License 2.0

Question on quantization #349

Closed dingqingy closed 1 year ago

dingqingy commented 1 year ago

Hi team,

Thanks for the great repo! I would like to understand how quantization works in sparse-quant models.

The model stub that I tried is: "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none"

The model weight type is still torch.FloatTensor, and when I inspected one layer I found it has almost 5k unique weight values, which is far beyond what int8 can represent.

Could you please explain how to access the quantized (int8) weights for these sparse-quant models?

Thanks again!

bfineran commented 1 year ago

Hi @dingqingy, our torch checkpoints are represented in quantization aware training (QAT) format. In QAT, model weights are still represented as floats; however, the quantization steps to INT8 are emulated. This way we can train the model to adjust for INT8 precision loss while still using float weights for backpropagation.
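To illustrate the idea, here is a minimal, dependency-free sketch of QAT-style "fake quantization" (assuming a symmetric per-tensor int8 scheme; this is a conceptual illustration, not SparseML's actual implementation). The weights remain Python floats after the round-trip, but they can only land on at most 255 distinct grid points:

```python
import random

def fake_quantize(weights, num_bits=8):
    """Quantize to an integer grid, then immediately dequantize back to float.

    Sketch of the quantize-dequantize step emulated during QAT (assumed
    symmetric per-tensor scheme, not SparseML's exact recipe).
    """
    qmax = 2 ** (num_bits - 1) - 1               # 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(10000)]
fq = fake_quantize(weights)

# The tensor is still float-typed, but restricted to the int8 grid:
print(all(isinstance(w, float) for w in fq))     # still floats
print(len(set(fq)))                              # at most 255 distinct values
```

This is why a QAT checkpoint reports `torch.FloatTensor` weights: the checkpoint stores the float weights used for backpropagation, and the many distinct values you counted come from the pre-quantization weights rather than from the emulated int8 grid.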

Our ONNX models from sparsezoo are fully quantized based on these QAT model graphs and stored in their INT8 representation.
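Conceptually, the export step folds the emulated quantization into a true int8 representation: each tensor is stored as integers plus a float scale, and dequantizing recovers the QAT weights up to rounding error. A hedged, simplified sketch (again assuming a symmetric per-tensor scheme, not the exact SparseML/ONNX export logic):

```python
def quantize_to_int8(weights):
    """Convert float weights to a stored int8 representation.

    Sketch only: real exports may use asymmetric schemes, per-channel
    scales, and zero points.
    """
    qmax = 127
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale                      # int8 values + one float scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 storage."""
    return [v * scale for v in q]

w = [0.5, -0.25, 0.125, 1.0]
q, scale = quantize_to_int8(w)
w_approx = dequantize(q, scale)
print(q)         # small integers in [-128, 127]
print(w_approx)  # close to the original floats, within scale/2
```

So to see actual int8 weights, inspect the ONNX model's initializers (for example with Netron or the `onnx` Python package) rather than the torch checkpoint.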

jeanniefinks commented 1 year ago

Hi @dingqingy As it looks like there are no further comments on this thread, I am going to go ahead and close this inquiry out. Feel free to re-open if you want to follow up! Thank you! Jeannie / Neural Magic