
[feature request] [discussion] Baseline ONNX interpreter / executor in python / PyTorch #130114

Open vadimkantorov opened 4 months ago

vadimkantorov commented 4 months ago

🚀 The feature, motivation and pitch

It is sometimes useful, for basic perf testing, to be able to execute a third-party-provided ONNX file on different backends. Several such executors already exist: ORT, TRT, and possibly others specific to particular hardware.

With Inductor / autotuning becoming more powerful, the option of using Python / PyTorch on the server-side inference path is becoming more relevant, especially because PyTorch evolves very fast and is more flexible.

It is quite easy to benchmark all these options when the original model is provided as PyTorch code. But when the model is provided only as a third-party exported ONNX file (or the source code is no longer available, or was produced by legacy code that is hard to run), it is not as easy. A basic ONNX interpreter using PyTorch as a backend would be quite useful here: the trt/polygraphy project already benchmarks the same ONNX model with TRT vs ORT, so adding a baseline torch.compile/Inductor-based ONNX executor there would be nice. If I understand correctly, Caffe2 was at some point also such an ONNX interpreter.

There are several ways such an ONNX backend could be implemented (all of them useful in some context); a minimal sketch of option (1) appears below:

1. an eager interpreter, traversing and executing an ONNX graph
2. a graph transformation from ONNX to an FX graph
3. generation of Python code using PyTorch from an ONNX graph

Option (3) would be easiest to debug, but a code printout could possibly also be generated from an FX graph?
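
For concreteness, here is a minimal sketch of what option (1) could look like: an eager interpreter that walks the ONNX graph and dispatches each node to a hand-written table of PyTorch ops. The `OP_TABLE` contents and the `run_onnx_eagerly` helper are illustrative assumptions, not an existing API; node attributes and optional inputs are ignored for brevity:

```python
import onnx
import onnx.numpy_helper
import torch

# Tiny illustrative op table; a real executor would need dispatch for
# the full opset, including node attributes (ignored here for brevity).
OP_TABLE = {
    "Add": lambda a, b: a + b,
    "MatMul": torch.matmul,
    "Relu": torch.relu,
    "Sigmoid": torch.sigmoid,
}

def run_onnx_eagerly(model_path, **inputs):
    model = onnx.load(model_path)
    # Environment mapping value names to tensors, seeded with initializers.
    env = {
        init.name: torch.from_numpy(onnx.numpy_helper.to_array(init).copy())
        for init in model.graph.initializer
    }
    env.update(inputs)
    # ONNX guarantees nodes are topologically sorted, so one pass suffices.
    for node in model.graph.node:
        fn = OP_TABLE.get(node.op_type)
        if fn is None:
            raise NotImplementedError(f"Unsupported op: {node.op_type}")
        outs = fn(*[env[name] for name in node.input])
        if not isinstance(outs, tuple):
            outs = (outs,)
        for name, value in zip(node.output, outs):
            env[name] = value
    return [env[out.name] for out in model.graph.output]
```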

A clear burden would be staying in sync with the ONNX spec/opset, but since ONNX is more restrictive it should probably be bearable?


Original question: https://discuss.pytorch.org/t/onnx-interpreter-using-pytorch-as-a-backend/203825

Alternatives

No response

Additional context

No response

justinchuby commented 2 months ago

There exist projects like https://github.com/ENOT-AutoDL/onnx2torch (I haven’t tried them)

There is also the possibility of implementing the ONNX reference runtime with the Array API, which would let PyTorch be enabled as a backend: https://github.com/onnx/onnx/issues/6289
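
To illustrate the idea, here is a minimal sketch assuming the third-party array-api-compat package: an op written once against the Array API resolves its namespace from the input array, so passing a torch.Tensor makes PyTorch the execution backend (the `gelu_like` function is a hypothetical example, not anything from the linked issue):

```python
import array_api_compat
import torch

def gelu_like(x):
    # Resolve the Array API namespace from the input array itself;
    # for a torch.Tensor this dispatches to PyTorch ops.
    xp = array_api_compat.array_namespace(x)
    # tanh-based GELU approximation, written backend-agnostically.
    return 0.5 * x * (1.0 + xp.tanh(0.79788456 * (x + 0.044715 * x**3)))

print(gelu_like(torch.randn(4)))  # executes on the PyTorch backend
```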

vadimkantorov commented 2 months ago

> There exist projects like https://github.com/ENOT-AutoDL/onnx2torch (I haven’t tried them)

It seems that it's doing ONNX graph -> FX graph conversion, which is a nice baseline to have. I would suggest that such a basic converter be made more official, preferably in the pytorch repo, to provide a simple baseline for running third-party/old ONNX models with Inductor.
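
If I read the project's README correctly, usage would be roughly the following (a sketch assuming its documented `convert` entry point; the model path and input shape are placeholders):

```python
import torch
from onnx2torch import convert

# Convert an ONNX file to a torch.fx.GraphModule, then compile it
# with Inductor like any other PyTorch module.
torch_model = convert("model.onnx")
compiled = torch.compile(torch_model)
out = compiled(torch.randn(1, 3, 224, 224))  # placeholder input shape
```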

I also wonder whether ONNX graph -> Python code (or ONNX graph -> FX graph -> Python code) could be done nicely, because Python code using PyTorch ops is a nice way of exploring a third-party graph/model; it should still torch.compile nicely, and could then be used/integrated in a general PyTorch-using codebase.
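
The FX -> Python code half already exists in PyTorch: any `torch.fx.GraphModule` (e.g. one produced by an ONNX -> FX converter) can print its own generated source. A minimal sketch:

```python
import torch
import torch.fx

def f(x, w):
    return torch.relu(x @ w)

# symbolic_trace produces an FX GraphModule; an ONNX -> FX converter
# would yield the same kind of object.
gm = torch.fx.symbolic_trace(f)
print(gm.code)        # generated Python source for the graph
gm.print_readable()   # annotated, human-friendly variant
```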

justinchuby commented 1 month ago

I am starting to think this is a good idea - thank you. We can use this to aid model export as well, when users can write ONNX ops directly as PyTorch ops. @vadimkantorov are you interested in contributing parts of this?