Open cbourjau opened 1 year ago
Yes, currently ORT executes the entire graph irrespective of which outputs you've requested. Most of the time, if a graph has a certain output, it is there because the user needs it. You can edit the graph to remove the unnecessary output. Do you have a concrete use case? There is a similar request here: https://github.com/microsoft/onnxruntime/issues/16013
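One way to do that kind of editing offline is sketched below with `onnx.utils.extract_model`, which ships with the `onnx` package; the input and output names used here are illustrative placeholders, not the tensors of the attached model:

```python
# Sketch: prune the model down to the outputs you actually need, then run the
# smaller model with onnxruntime. The names "a", "b", "out_c" are assumptions.
import onnx.utils

onnx.utils.extract_model(
    input_path="test.onnx",         # original model with all outputs
    output_path="test_out_c.onnx",  # sub-model containing only the nodes
    input_names=["a", "b"],         #   required to compute "out_c"
    output_names=["out_c"],
)
```

The extracted sub-model contains only the nodes reachable from the requested outputs, so the expensive branch is never loaded or executed.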
Describe the issue
I expected (and maybe that is the bug?) that the `output_names` argument of `onnxruntime.InferenceSession.run` is used to avoid unnecessary computations. However, it appears that the entire graph is executed even if only a subset of the outputs is requested. Avoiding potentially expensive and unnecessary computations would be very nice.
To reproduce
The following code crashes (deliberately) in the `Cast` operation even though that node is unnecessary for computing the requested `"out_c"` output. I attached the resulting ONNX graph as a file. I'm happy to type out a test with pure `onnx` functions rather than using Spox if required. The raised error (unsurprisingly) points at the failing `Cast`.
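For reference, here is a minimal sketch of that kind of reproduction written with the plain `onnx` helper API instead of Spox. The tensor names and the string-to-float `Cast` used to trigger the failure are assumptions, not the contents of the attached `test.onnx`:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Branch 1: out_c = a + b -- cheap and independent of the failing branch.
add = helper.make_node("Add", ["a", "b"], ["out_c"])
# Branch 2: out_d = Cast(s) -- casting an unparsable string to float is
# expected to fail at runtime, standing in for the deliberate crash.
cast = helper.make_node("Cast", ["s"], ["out_d"], to=TensorProto.FLOAT)

graph = helper.make_graph(
    [add, cast],
    "partial_outputs_repro",
    inputs=[
        helper.make_tensor_value_info("a", TensorProto.FLOAT, [1]),
        helper.make_tensor_value_info("b", TensorProto.FLOAT, [1]),
        helper.make_tensor_value_info("s", TensorProto.STRING, [1]),
    ],
    outputs=[
        helper.make_tensor_value_info("out_c", TensorProto.FLOAT, [1]),
        helper.make_tensor_value_info("out_d", TensorProto.FLOAT, [1]),
    ],
)
model = helper.make_model(
    graph, opset_imports=[helper.make_operatorsetid("", 17)]
)
onnx.checker.check_model(model)

sess = ort.InferenceSession(
    model.SerializeToString(), providers=["CPUExecutionProvider"]
)
# Only "out_c" is requested, yet the Cast node still runs and the call
# raises instead of returning [3.0].
print(
    sess.run(
        ["out_c"],
        {
            "a": np.array([1.0], dtype=np.float32),
            "b": np.array([2.0], dtype=np.float32),
            "s": np.array(["not-a-number"], dtype=object),
        },
    )
)
```

If unused branches were pruned based on `output_names`, the same call would return `[3.0]` without ever touching the `Cast` node.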
Urgency
We would like to use this feature in a future project where we would generate a large graph with different expensive-to-compute outputs. We would like to avoid building multiple models that differ only slightly in their set of outputs.
Platform
Mac
OS Version
macOS 12.3.1
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
test.onnx.zip
Is this a quantized model?
No