Closed: igor-yusupov closed this issue 9 months ago
The code that exports ONNX models is a basic use of `torch.onnx.export`
(example). The PyTorch checkpoints are larger because they include additional state, in particular the state of the optimizer. The models in this repo use the Adam optimizer, which stores two extra values (the first and second moment estimates) for every model parameter. Since the model parameters make up most of the checkpoint data, you'd expect the PyTorch checkpoints to be about 3x the size of the ONNX model, and indeed that is the case.
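The 3x figure above follows from simple arithmetic. A minimal sketch (the parameter count and float32 width here are illustrative, not taken from any particular model in this repo):

```python
# Why a checkpoint saved with Adam state is ~3x the bare model:
# Adam keeps two state tensors per parameter (the running mean of the
# gradient and the running mean of its square), each the same size as
# the parameter tensor itself.
n_params = 10_000_000     # illustrative parameter count
bytes_per_value = 4       # float32

model_bytes = n_params * bytes_per_value
adam_state_bytes = 2 * n_params * bytes_per_value  # two tensors per param

checkpoint_bytes = model_bytes + adam_state_bytes
print(checkpoint_bytes / model_bytes)  # -> 3.0
```

The ONNX export contains only the model weights, so it corresponds to just the `model_bytes` portion.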
I got it, thank you! Is it currently possible to run quantized models with rten?
No, not yet. I do plan to support int8 quantization eventually, but my immediate priority is to improve the accuracy of the Ocrs models.
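For context, here is a minimal illustration of what symmetric int8 quantization does; this is a generic sketch of the technique, not rten's (future) implementation:

```python
# Symmetric per-tensor int8 quantization: map floats to the range
# [-128, 127] using a single scale factor derived from the tensor's
# largest absolute value.
def quantize(values, scale):
    """Divide by the scale, round, and clamp into the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Recover approximate floats by multiplying back by the scale."""
    return [q * scale for q in qvalues]

weights = [0.5, -1.25, 0.03, 2.0]           # toy weight tensor
scale = max(abs(v) for v in weights) / 127  # one scale for the tensor

q = quantize(weights, scale)        # [32, -79, 2, 127]
approx = dequantize(q, scale)       # close to the original weights
```

Running quantized models then requires operators that consume int8 tensors directly (plus quantize/dequantize steps at the boundaries), which is why it is a sizeable project rather than a small patch.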
Maybe I can help and try to do that? How hard would it be to add? I understand there's a need to add operators that work with int data?
It will be quite a big project. It's probably best to keep that discussion in one place, in https://github.com/robertknight/rten/issues/42.
I see that the ONNX model is smaller than the PyTorch model. Do you use any techniques to reduce the size? Where can I see how you convert the models?