pytorch / ao

PyTorch native quantization and sparsity for training and inference

How does this work with ONNX export and quantization? #777

Open ogencoglu opened 2 weeks ago

ogencoglu commented 2 weeks ago

Do quantized models here become quantized models in ONNX after conversion? Can you even convert/export them to ONNX? How about the other way around? Can you export a sparse model to ONNX and quantize it in ONNX afterwards?

msaroufim commented 2 weeks ago

We haven't really experimented much with ONNX so far, though we do support export, and once you export a model you can use an ONNX backend:

  1. Export an AO model: https://github.com/pytorch/ao/tree/main/torchao/quantization#workaround-with-unwrap_tensor_subclass-for-export-aoti-and-torchcompile-pytorch-24-and-before-only
  2. Use the ONNX backend: https://pytorch.org/tutorials/beginner/onnx/export_simple_model_to_onnx_tutorial.html
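For reference, a minimal untested sketch of those two steps is below. It assumes torchao's `quantize_` / `int8_weight_only` API and the `unwrap_tensor_subclass` workaround from the doc linked in step 1; whether the ONNX exporter actually handles the resulting quantized ops is exactly the open question in this issue.

```python
# Untested sketch of the two steps above. Whether the ONNX exporter
# preserves the torchao-quantized ops is the open question here.
import torch
from torchao.quantization import quantize_, int8_weight_only
from torchao.utils import unwrap_tensor_subclass

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
example_input = (torch.randn(1, 64),)

# Step 1: quantize with torchao, then unwrap the tensor subclasses so
# export sees plain tensors (workaround for PyTorch 2.4 and before).
quantize_(model, int8_weight_only())
model = unwrap_tensor_subclass(model)

# Step 2: hand the exported model to the ONNX exporter.
torch.onnx.export(model, example_input, "quantized_model.onnx")
```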

If you wanna work through an example and post your progress here, I'm happy to unblock you! We could add an example to the repo.