Closed yayapa closed 3 years ago
Hi,
thanks for your question. Most PyTorch implementations indeed only perform fake quantization, which cannot reduce memory usage in a pure PyTorch setting. To actually reduce the model size, you need more advanced tools such as TensorRT, a platform for model deployment.
Regarding your second question, the weights stored in the state_dict()
are full-precision. They are quantized during the forward pass.
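To illustrate the point above, here is a minimal sketch of what fake quantization typically looks like (the class name `FakeQuantConv2d`, the bit width, and the symmetric quantization scheme are illustrative assumptions, not the repository's actual QuantConv2d): the parameters held in `state_dict()` remain float32, and the rounding happens only inside `forward()`.

```python
import torch
import torch.nn as nn

class FakeQuantConv2d(nn.Conv2d):
    """Illustrative fake-quantized conv (assumed scheme, not the repo's
    actual QuantConv2d): weights are stored in full precision and are
    quantized on the fly inside forward()."""

    def __init__(self, *args, n_bits=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_bits = n_bits

    def forward(self, x):
        # Symmetric uniform quantization, applied per forward pass.
        qmax = 2 ** (self.n_bits - 1) - 1
        scale = (self.weight.detach().abs().max() / qmax).clamp(min=1e-8)
        w_q = torch.clamp((self.weight / scale).round(), -qmax, qmax) * scale
        return self._conv_forward(x, w_q, self.bias)

conv = FakeQuantConv2d(3, 16, 3, n_bits=4)
# The checkpoint still holds float32 weights -- no memory saving on disk.
print(conv.state_dict()["weight"].dtype)  # torch.float32
```

Since the stored tensor is still float32, saving this module's state_dict gives the same file size as the unquantized layer, which is exactly why tools like TensorRT are needed for real compression.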
Thank you for the answer!
Thank you for the great contribution! We are currently experimenting with your implementation of QuantConv2d and trying to integrate it into an object detector, namely RetinaNet. I would therefore like to ask you some questions to validate my assumptions.
Please feel free to point out anything wrong in my assumptions.
Best regards,