microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Inference Layer by Layer or feature extraction on Onnx Runtime #19954

Open IzanCatalan opened 7 months ago

IzanCatalan commented 7 months ago

Describe the issue

Hi everyone, I would like to know whether it is possible to perform layer-by-layer inference with ONNX Runtime on a pre-trained model (in fp32 or int8 data types).

My idea is to take several fp32 and int8-quantized models from the ONNX Model Zoo repo and run inference layer by layer to perform feature extraction. After that, I would modify the output of each layer and use it as the new input for the following layers.

The approximate code would be something like this:

import numpy as np
import onnxruntime as ort

model_path = "model.onnx"
ort_session = ort.InferenceSession(model_path)

input_data = np.random.randn(1, 3, 32, 32).astype(np.float32)

# Run only the first layer and capture its output
conv1_output = ort_session.run(None, {'input1': input_data})[0]

# Feed the (possibly modified) output back in as the input of the next layer
conv2_output = ort_session.run(None, {'input2': conv1_output})[0]

# Now, I can work with intermediate outputs, modify them and use them as new inputs

However, I tried to reproduce this code with a ResNet50 pre-trained model from the ONNX Model Zoo repo, but it seems this model, like the rest of the pre-trained models, only has one input and one output (there is no way to access intermediate outputs).
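
For example, inspecting the session (the file name below is just the Model Zoo download I used, as an illustration) only lists a single graph input and a single graph output:

import onnxruntime as ort

sess = ort.InferenceSession("resnet50-v1-12.onnx")
print([i.name for i in sess.get_inputs()])    # a single graph input
print([o.name for o in sess.get_outputs()])   # a single graph output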

So, is there any way I could do this?

Thank you!

To reproduce

I am running an onnxruntime build from source with CUDA 11.2, GCC 9.5, CMake 3.27 and Python 3.8 on Ubuntu 20.04.

Urgency

No response

Platform

Linux

OS Version

20.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.12.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.2

hariharans29 commented 7 months ago

No, ORT does not support this scenario. Each "session" conceptually maps to an entire model, not a portion of the model.

To achieve what you want, you would have to break up each model into separate sub-models at the layers you are interested in and chain them together, like in the sample code you pasted.
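
As a rough, untested sketch of what that could look like (the tensor and file names below are placeholders; you would need the actual intermediate tensor name from your graph, which you can find with Netron or by loading the model with the onnx package, and you may need to run shape inference first so the intermediate tensor is known to the extractor):

import numpy as np
import onnx.utils
import onnxruntime as ort

# Cut the original graph at an intermediate tensor.
# "intermediate_tensor", "input1" and "output1" are placeholder names.
onnx.utils.extract_model("model.onnx", "part1.onnx",
                         input_names=["input1"],
                         output_names=["intermediate_tensor"])
onnx.utils.extract_model("model.onnx", "part2.onnx",
                         input_names=["intermediate_tensor"],
                         output_names=["output1"])

sess1 = ort.InferenceSession("part1.onnx")
sess2 = ort.InferenceSession("part2.onnx")

input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
intermediate = sess1.run(None, {"input1": input_data})[0]

# Modify the intermediate activations here before running the second part
intermediate = intermediate * 1.0

final_output = sess2.run(None, {"intermediate_tensor": intermediate})[0]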

Hope this helps.

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.