gaocegege opened this issue 4 years ago
Post-training quantization for model compression?
Yeah, it's based on TRT (TensorRT).
So this feature is only for Triton server, supporting int8 TRT models? Does it not cover PyTorch or TensorFlow post-training quantization? Or does it use TRT's KLD calibration with some calibration data for models from any framework?
The latter, I think.
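For reference, here is a minimal sketch of what KLD (entropy) calibration looks like with TensorRT's Python API, assuming pycuda and a list of preprocessed numpy batches. The class name and batch-feeding logic here are illustrative, not this project's code:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT's KLD (entropy) calibrator."""

    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batch_iter = iter(batches)   # each batch: numpy array, NCHW
        self.cache_file = cache_file
        first = batches[0]
        self.batch_size = first.shape[0]
        self.device_input = cuda.mem_alloc(first.nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # Copy the next calibration batch to the GPU; None ends calibration.
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a cached scale table so calibration only runs once.
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

It would then be wired into the builder roughly like this (network construction elided):

```python
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(batches)
```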
use TRT's KLD calibration with some calibration data for models from any framework
In the future we will investigate if we can support TVM or other frameworks.
Thanks for your response. I've posted some references about deploying quantized models with TVM below:
- TVM deploy model on CUDA
- TVM deploy TFLite Quantization model
- TVM deploy PyTorch Quantization model
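For anyone landing here, the linked tutorials boil down to roughly this flow. This is a sketch assuming a PyTorch ResNet-18 and a placeholder random calibration dataset; use real preprocessed data in practice:

```python
import numpy as np
import torch
import torchvision
import tvm
from tvm import relay


def calib_dataset():
    # Yield dicts mapping input names to calibration batches.
    # Random data is a placeholder for real preprocessed samples.
    for _ in range(10):
        yield {"input": np.random.rand(1, 3, 224, 224).astype("float32")}


# Import a PyTorch model into Relay.
model = torchvision.models.resnet18(pretrained=True).eval()
scripted = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
mod, params = relay.frontend.from_pytorch(scripted, [("input", (1, 3, 224, 224))])

# Post-training quantization with KL-divergence calibration.
with relay.quantize.qconfig(calibrate_mode="kl_divergence", weight_scale="max"):
    mod = relay.quantize.quantize(mod, params, dataset=calib_dataset())

# Compile the quantized module for CUDA.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")
```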