quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

How to export quantized model to onnx for onnxruntime? #1756

Open ai4prod opened 1 year ago

ai4prod commented 1 year ago

Hi,

thanks for AIMET. I would like to know whether I can export a quantized model to ONNX and then run it with onnxruntime. Do I need to register some custom ops in onnxruntime to get the ONNX model working?

If yes, is there any documentation on how to do this?

Thanks

quic-mangal commented 1 year ago

@ai4prod currently we do not export the custom quantization ops to ONNX.

I am curious to know your use case, though, because we are working on supporting quantization for ONNX models. The code is available here, and we will eventually release a pip package as well. If you would like to try it now, you can build the repo yourself, add aimet_onnx to your PYTHONPATH, and try it out.
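A minimal sketch of that PYTHONPATH approach, assuming a local source build of the repo (the path below is a placeholder):

```python
import sys

# Placeholder: point this at wherever your AIMET source build landed.
sys.path.insert(0, "/path/to/aimet/build")

import aimet_onnx  # should now resolve from the source build
```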

FelixSchwarz commented 1 year ago

@quic-mangal Not sure what the use case for @ai4prod is, but I was looking into that option today as well.

Basically, my idea is that we could run inference with an ONNX model without having to install the full AIMET software.

Installing "the real thing" is painful for us due to the outdated/very specific dependencies (see #2484). Our codebase has lots of code to handle our custom datasets and evaluate model performance. Make that compatible with AIMET's dependencies would be a major hassle for us. For example we are using Python 3.10, pytorch 2, onnx 1.13+, onnxruntime 1.14+ etc.

If we could just register "libaimet_onnxrt_ops.so" with onnxruntime to execute the ONNX model with AIMET's custom ops, that would be a major benefit. We would "just" need to run the actual quantization in a Docker container, but could do the inference/model evaluation in our own environment (assuming we have a working onnxruntime setup).
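A minimal sketch of how that registration could look with onnxruntime's Python API (the library name, model path, and input name/shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Placeholder paths: a compiled AIMET custom-op library and an exported model.
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library("libaimet_onnxrt_ops.so")

session = ort.InferenceSession("quantized_model.onnx", sess_options=sess_options)

# Input name/shape are placeholders; query session.get_inputs() for the real ones.
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {"input": dummy_input})
```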

quic-mangal commented 1 year ago

@FelixSchwarz, could you clarify if some input is needed from our side?

Our C++ code for the AIMET custom ops is available in the repo; you could build a .so from it. Also, while quantizing, make sure to place the quantization ops at the correct positions in the graph.
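A rough sketch of inserting such a node with the onnx helper API; the op type and domain here are assumptions and would need to match whatever AIMET's custom-op library actually registers:

```python
import onnx
from onnx import helper

model = onnx.load("model.onnx")  # placeholder path

# Op type and domain are assumptions; match AIMET's registered custom op.
q_node = helper.make_node(
    "QcQuantizeOp",
    inputs=["conv1_out"],
    outputs=["conv1_out_q"],
    domain="aimet",
    name="quantize_conv1_out",
)
model.graph.node.append(q_node)
# A real pass would insert the node at the right topological position and
# rewire consumers of "conv1_out" to read "conv1_out_q" instead.
onnx.save(model, "model_with_quant_ops.onnx")
```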

FelixSchwarz commented 1 year ago

> @FelixSchwarz, could you clarify if some input is needed from our side?

I created #2493 because it might be a bit different from what ai4prod wanted.