microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] InferenceSession.run executes nodes which are unnecessary for requested outputs #16483

Open cbourjau opened 1 year ago

cbourjau commented 1 year ago

Describe the issue

I expected (and maybe that is the bug?) that the output_names argument of onnxruntime.InferenceSession.run is used to avoid unnecessary computations. However, it appears that the entire graph is executed even if only a subset of outputs is requested.

Avoiding potentially expensive and unnecessary computations would be very nice.

To reproduce

The following code crashes (deliberately) in the Cast operation even though that node is not needed to compute the requested "out_c" output. I attached the resulting ONNX graph as a file. I'm happy to type out a test with pure onnx functions rather than using Spox if required.

from spox import argument, build, Tensor
import numpy as np
import onnxruntime as ort
import spox.opset.ai.onnx.v18 as op

def test_avoid_unnecessary_compute():
    a = argument(Tensor(float, ("N",)))
    b = argument(Tensor(str, ("M",)))

    c = op.add(a, a)
    d = op.cast(b, to=np.int64)

    model_proto = build(inputs={"a": a, "b": b}, outputs={"out_c": c, "out_d": d})

    session = ort.InferenceSession(model_proto.SerializeToString())

    session.run(["out_c"], {"a": [1.0], "b": ["foo"]})  # <= Crashes here

The raised error (unsurprisingly) reads:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Cast node. Name:'Cast_0' Status Message: stoll: no conversion

Urgency

We would like to use this feature in a future project where we generate a large graph with several expensive-to-compute outputs. We would like to avoid building multiple models that differ only slightly in their set of outputs.

Platform

Mac

OS Version

MacOS 12.3.1

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

test.onnx.zip

Is this a quantized model?

No

pranavsharma commented 1 year ago

Yes, currently ORT executes the entire graph irrespective of which outputs you've requested. Most of the time, if a graph has a certain output, it is there because the user needs it. You can edit the graph to remove the unnecessary output. Do you have a concrete use case? Similar request here: https://github.com/microsoft/onnxruntime/issues/16013