Open IzanCatalan opened 7 months ago
No, ORT does not support this scenario. Each "session" conceptually maps to an entire model, not a portion of the model.
To achieve what you want, you would have to break up the layers you are interested in into separate models and chain them together, as in the sample code you pasted.
Hope this helps.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
Hi everyone, I would like to know whether it is possible to perform layer-by-layer inference with ONNX Runtime using a pre-trained model (in fp32 or int8 datatypes).
My idea is to take several fp32 and int8-quantized models from the ONNX Model Zoo Repo and run inference layer by layer to perform feature extraction. I would then modify the output of each layer and use it as the input to the following layer.
The approximate code would look something like this:
However, I tried to reproduce this code with a resnet50 pre-trained model from the ONNX Model Zoo Repo, and it seems this model, like the other pre-trained models, has only one input and one output (there is no way to access the intermediate outputs).
So, is there any way I could do this?
Thank you!
To reproduce
I am running onnxruntime built from source for CUDA 11.2, with GCC 9.5, CMake 3.27, and Python 3.8 on Ubuntu 20.04.
Urgency
No response
Platform
Linux
OS Version
20.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.12.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.2