yqzhishen opened this issue 1 year ago
This is not supported for inferencing builds. Only training builds support this.
> This is not supported for inferencing builds. Only training builds support this.
I see, thanks. But why not add this API to inferencing builds? I think it's quite useful.
Can you elaborate more on your use case? ORT does allow you to request specific outputs in the API (even if all are computed). If time taken to compute all outputs is a concern, does modifying the model work?
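For example, the fetch list passed to run() already limits what is returned (the names below are just placeholders):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")  # placeholder model path
feed = {"x": np.zeros((1, 8), dtype=np.float32)}  # placeholder input

# Only the outputs named in the fetch list are returned,
# even though every node may still be executed.
(y1,) = sess.run(["y1"], feed)
```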
> Can you elaborate more on your use case?
I am deploying a singing voice synthesis project to ONNX. The architecture includes a linguistic encoder, a phoneme duration predictor, and a pitch predictor.
Lyrics and rhythms are first encoded into hidden units, and both predictors take these hidden units as input; the phoneme durations can also be supplied by the user in some interactive scenarios. That is to say, the user gets phoneme durations from the model, edits them, and then feeds them back into the model to get the pitch; or he/she gets both the durations and the pitch in one go.
The problem is: when the user only needs the durations, he/she expects the model to be very fast. The duration predictor itself is lightweight and fast, but the pitch predictor is diffusion-based and will not likely run that fast. If all nodes are forced to execute on every run, the latency and the wasted computation become unacceptable.
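Concretely, this is the pattern I would like to work efficiently on a single exported model (a sketch only; the model path, input/output names, and shapes are placeholders):

```python
import numpy as np
import onnxruntime as ort

opts = ort.RunOptions()
opts.only_execute_path_to_fetches = True  # the pruning behavior I am hoping for

sess = ort.InferenceSession("svs_frontend.onnx")  # hypothetical combined model
feed = {"tokens": np.zeros((1, 16), dtype=np.int64)}  # placeholder linguistic input

# Interactive editing: only the lightweight duration branch should run.
(ph_dur,) = sess.run(["ph_dur"], feed, run_options=opts)

# One-shot synthesis: durations and pitch together, paying for the diffusion branch.
ph_dur, pitch = sess.run(["ph_dur", "pitch"], feed, run_options=opts)
```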
> does modifying the model work?
This works technically, and it is what I currently do: I split these parts of the model into separate ONNX files. However, this may confuse and mislead some of my users.
Assume one of my users has two models in PyTorch, A and B. He exports both A and B to ONNX and gets two sets of .onnx files. The two predictors are exported separately, so he may think they are interchangeable. For example, if he thinks pitch predictor A is better than B, but duration predictor B is better than A, he may copy pitch predictor A over pitch predictor B. But this will not work - pitch predictor A only works with linguistic encoder A - and all he will get is noise.
Most of my users do not know much about deep learning. To avoid this kind of misunderstanding, I have to tell them that these models cannot be mixed up, but I still cannot make sure all of them will see the warning.
Allowing ONNX Runtime to execute only the path required for the requested outputs would help relieve these troubles. It is not a serious problem, though.
Describe the issue
The documentation for `RunOptions.only_execute_path_to_fetches` says "Only execute the nodes needed by fetch list", but it does not actually work.
To reproduce
The logic is quite easy:
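Roughly, the model is just two independent branches (a sketch only; the attached model.zip below is the actual model, and the y1 branch, shapes, and export call here are placeholders):

```python
import torch

class TwoBranch(torch.nn.Module):
    # The branches share no inputs, so fetching only y1 should not
    # require executing the y2 branch.
    def forward(self, x1, x2):
        y1 = x1 + 1  # placeholder first branch
        y2 = x2 + 2  # the branch that should be skipped
        return y1, y2

torch.onnx.export(
    TwoBranch(),
    (torch.zeros(3), torch.zeros(3)),
    "model.onnx",
    input_names=["x1", "x2"],
    output_names=["y1", "y2"],
)
```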
model.zip
And I set up RunOptions and the InferenceSession like this:
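(The snippet below is a sketch: the input/output names follow the model sketch above, and the final call shows the fetch list, with x2 omitted on the assumption that a pruned branch does not need its input.)

```python
import numpy as np
import onnxruntime as ort

options = ort.RunOptions()
# Documented as "Only execute the nodes needed by fetch list"
options.only_execute_path_to_fetches = True

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Fetch only y1; x2 feeds only the y2 branch, which should be pruned.
(y1,) = sess.run(["y1"], {"x1": np.ones(3, dtype=np.float32)}, run_options=options)
```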
But when I run the session with a fetch list that does not include y2, it raises errors indicating that `y2 = x2 + 2` still executes, even though `options.only_execute_path_to_fetches` is set to True. I wonder if I got something wrong or whether this is a bug in ORT.
Urgency
No response
Platform
Windows
OS Version
Windows 10 19045.2965
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.14.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response