Closed dingqingy closed 1 year ago
Hi @dingqingy, our torch checkpoints are stored in quantization-aware training (QAT) format. In QAT, model weights are still represented as floats, but the quantization steps to INT8 are emulated - this way we can train a model to adjust for INT8 precision loss while still using float weights for backpropagation.
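As a rough illustration of what "emulated" means here (a minimal sketch, not SparseML's actual implementation - the `scale`/`zero_point` values and the `fake_quantize` helper are made up for this example): the forward pass rounds the float weights onto the INT8 grid and immediately dequantizes them back to floats, so gradients still flow through float tensors.

```python
import numpy as np

def fake_quantize(w, scale, zero_point, qmin=-128, qmax=127):
    # Quantize onto the INT8 grid, then immediately dequantize back to float.
    # This is the "emulation" step used during QAT forward passes.
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return ((q - zero_point) * scale).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # latent float weights
scale = float(w.max() - w.min()) / 255.0
w_fq = fake_quantize(w, scale, zero_point=0)

# The checkpoint stores w (many unique float values); the forward pass
# sees w_fq, which can take at most 256 distinct values.
print(len(np.unique(w)), len(np.unique(w_fq)))
```

This is why the saved `torch.FloatTensor` weights show thousands of unique values: the checkpoint holds the latent floats, and the collapse to 256 levels only happens inside the (emulated) quantization step.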
Our ONNX models from sparsezoo are fully quantized based on these QAT model graphs and stored in their INT8 representation.
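If you want to recover the INT8 representation from QAT-style float weights yourself, the mapping is just the quantize half of the step above. A hedged sketch (the `scale` and `zero_point` here are illustrative placeholders - in a real exported graph they come from the quantization parameters stored alongside each weight, not from this code):

```python
import numpy as np

def to_int8(w, scale, zero_point):
    # Map float weights onto their INT8 representation using the
    # affine quantization formula: q = round(w / scale) + zero_point.
    return np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(8, 8)).astype(np.float32)  # stand-in QAT weights
w_int8 = to_int8(w, scale=0.05, zero_point=0)
print(w_int8.dtype)  # int8
```

For the sparsezoo ONNX models this step has already been applied, so the INT8 tensors can be read directly from the graph's initializers.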
Hi @dingqingy, as there are no further comments on this thread, I am going to go ahead and close this inquiry out. Feel free to re-open if you want to follow up! Thank you! Jeannie / Neural Magic
Hi team,
Thanks for the great repo! I would like to understand how quantization works in sparse-quant models.
The model stub that I tried is: "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none"
The model weights are still of type torch.FloatTensor, and when I inspected one layer I found almost 5k unique weight values, which is far beyond what INT8 can represent.
Could you please explain how to access the quantized (INT8) weights for these sparse-quant models?
Thanks again!