microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

RunOptions.only_execute_path_to_fetches not working #16013

Open yqzhishen opened 1 year ago

yqzhishen commented 1 year ago

Describe the issue

The API documentation says "Only execute the nodes needed by fetch list", but it is not actually working.

To reproduce

The logic is quite easy:

[image: graph of the attached model, showing two independent branches: one computes y1 from x1, the other computes y2 = x2 + 2 from x2]

model.zip
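
For reference, the attached model is roughly equivalent to the following construction; the y2 = x2 + 2 branch matches the error described below, while the exact operation on the y1 branch is an assumption here (the real graph is in model.zip):

import onnx
from onnx import helper, TensorProto

# Two independent branches over scalar int64 inputs.
graph = helper.make_graph(
    nodes=[
        helper.make_node('Add', ['x1', 'one'], ['y1']),  # y1 branch (assumed: y1 = x1 + 1)
        helper.make_node('Add', ['x2', 'two'], ['y2']),  # y2 branch: y2 = x2 + 2
    ],
    name='two_branches',
    inputs=[
        helper.make_tensor_value_info('x1', TensorProto.INT64, []),
        helper.make_tensor_value_info('x2', TensorProto.INT64, []),
    ],
    outputs=[
        helper.make_tensor_value_info('y1', TensorProto.INT64, []),
        helper.make_tensor_value_info('y2', TensorProto.INT64, []),
    ],
    initializer=[
        helper.make_tensor('one', TensorProto.INT64, [], [1]),
        helper.make_tensor('two', TensorProto.INT64, [], [2]),
    ],
)
onnx.save(helper.make_model(graph), 'model.onnx')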

And I set up RunOptions and InferenceSession like this:

import numpy as np
import onnxruntime as ort

options = ort.RunOptions()
options.only_execute_path_to_fetches = True
session = ort.InferenceSession('model.onnx')

But when I run either

session.run(['y1'], input_feed={'x1': np.array(1).astype(np.int64), 'x2': None}, run_options=options)

or

session.run(['y1'], input_feed={'x1': np.array(1).astype(np.int64)}, run_options=options)

it raises an error, indicating that y2 = x2 + 2 is still executed even though options.only_execute_path_to_fetches is set to True.

I wonder whether I have done something wrong or whether this is a bug in ORT.

Urgency

No response

Platform

Windows

OS Version

Windows 10 19045.2965

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

pranavsharma commented 1 year ago

This is not supported for inferencing builds. Only training builds support this.

yqzhishen commented 1 year ago

This is not supported for inferencing builds. Only training builds support this.

I see, thanks. But why not add this API to inferencing builds? I think it's quite useful.

pranavsharma commented 1 year ago

Can you elaborate more on your use case? ORT does allow you to request specific outputs in the API (even if all are computed). If time taken to compute all outputs is a concern, does modifying the model work?
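
For reference, one possible way to modify the model along these lines is onnx.utils.extract_model, which saves a sub-model containing only the nodes needed for a chosen set of outputs; the file and tensor names below refer to the toy model attached above:

import onnx.utils

# Extract just the y1 branch into its own file, so a session built from it
# never contains the y2 = x2 + 2 node at all.
onnx.utils.extract_model(
    'model.onnx',           # source model
    'model_y1_only.onnx',   # destination for the pruned sub-model
    ['x1'],                 # inputs required by the sub-model
    ['y1'],                 # outputs to keep (the "fetch list")
)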

yqzhishen commented 1 year ago

Can you elaborate more on your use case?

I am deploying a singing voice synthesis project to ONNX. The architecture includes a linguistic encoder, a duration predictor, and a pitch predictor.

Lyrics and rhythms are first encoded into hidden units, and both predictors take these hidden units as input; the phoneme durations can also be given by the user in some interactive scenarios. That is to say, the user gets phoneme durations from the model, edits them, and then feeds them back into the model to get the pitch; or they get both the durations and the pitch in one go.

The problem is: when the user only needs the durations, they expect the model to be very fast. The duration predictor itself is lightweight and fast, but the pitch predictor is diffusion-based and is unlikely to run that fast. If all nodes are forced to execute on every run, the speed and the wasted computation become unacceptable.

does modifying the model work?

This works technically, and it is what I currently do: I split these parts of the model into separate ONNX files and run them as separate sessions, roughly as in the sketch below. However, this may confuse and mislead some of my users.
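
A minimal sketch of this workaround; all file names, tensor names, and shapes here are hypothetical placeholders, not the project's actual ones:

import numpy as np
import onnxruntime as ort

# Hypothetical split: one file per component.
encoder = ort.InferenceSession('linguistic_encoder.onnx')
dur_predictor = ort.InferenceSession('duration_predictor.onnx')
pitch_predictor = ort.InferenceSession('pitch_predictor.onnx')

tokens = np.zeros((1, 32), dtype=np.int64)  # placeholder lyric/rhythm tokens

# Shared encoding, computed once.
hidden, = encoder.run(['hidden'], {'tokens': tokens})

# Fast interactive path: only the lightweight duration predictor runs.
durations, = dur_predictor.run(['durations'], {'hidden': hidden})

# Slow path, run only when the user actually asks for pitch
# (possibly with user-edited durations fed back in).
pitch, = pitch_predictor.run(['pitch'], {'hidden': hidden, 'durations': durations})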

Assume one of my users has two models in PyTorch, A and B. He exports both A and B to ONNX and gets two sets of .onnx files. The two predictors are exported separately, so he may think those models are interchangeable. For example, if he thinks pitch predictor A is better than B, but duration predictor B is better than A, he may copy pitch predictor A over pitch predictor B. But this will not work - pitch predictor A only works with linguistic encoder A - all he will get is noise.

Most of my users do not know much about deep learning. To avoid this kind of misunderstanding, I have to tell them that these models cannot be mixed and matched, but I still cannot make sure all of them will see this warning.

Allowing execution of only the required path in ONNX Runtime would help relieve these troubles. It is not a serious problem, though.