gaocegege opened this issue 4 years ago
Post-training quantization for model compression?
Yeah, it's based on TRT (TensorRT).
So this feature is only for Triton server, supporting int8 TRT models? Does it not cover PyTorch or TensorFlow post-training quantization? Or does it use TRT's KLD calibration with some calibration data for models from any framework?
The latter, I think.
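For reference, here is a minimal sketch of what KLD (entropy) calibration looks like with TensorRT's Python API, assuming pycuda and a list of preprocessed numpy batches. The class name and batch-feeding logic here are illustrative, not this project's code:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT's KLD (entropy) calibrator."""

    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batch_iter = iter(batches)   # each batch: numpy array, NCHW
        self.cache_file = cache_file
        first = batches[0]
        self.batch_size = first.shape[0]
        self.device_input = cuda.mem_alloc(first.nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # Copy the next calibration batch to the GPU; None ends calibration.
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a cached scale table so calibration only runs once.
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

It would then be wired into the builder roughly like this (network construction elided):

```python
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(batches)
```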
use TRT's KLD calibration with some calibration data for models from any framework
In the future we will investigate if we can support TVM or other frameworks.
Thanks for your response. I've posted some references about deploying quantized models with TVM below:
- TVM deploy model on CUDA
- TVM deploy TFLite Quantization model
- TVM deploy PyTorch Quantization model
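For anyone landing here, the linked tutorials boil down to roughly this flow. This is a sketch assuming a PyTorch ResNet-18 and a placeholder random calibration dataset; use real preprocessed data in practice:

```python
import numpy as np
import torch
import torchvision
import tvm
from tvm import relay


def calib_dataset():
    # Yield dicts mapping input names to calibration batches.
    # Random data is a placeholder for real preprocessed samples.
    for _ in range(10):
        yield {"input": np.random.rand(1, 3, 224, 224).astype("float32")}


# Import a PyTorch model into Relay.
model = torchvision.models.resnet18(pretrained=True).eval()
scripted = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
mod, params = relay.frontend.from_pytorch(scripted, [("input", (1, 3, 224, 224))])

# Post-training quantization with KL-divergence calibration.
with relay.quantize.qconfig(calibrate_mode="kl_divergence", weight_scale="max"):
    mod = relay.quantize.quantize(mod, params, dataset=calib_dataset())

# Compile the quantized module for CUDA.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")
```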