tv-gc opened 7 months ago
To diagnose, you can enable the profiler, since your model runs on CPU:

import onnxruntime

options = onnxruntime.SessionOptions()
options.enable_profiling = True
session = onnxruntime.InferenceSession(
    'model.onnx',
    sess_options=options,
    providers=['CPUExecutionProvider'])
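After running some inferences, the trace can be written out with end_profiling(), which returns the path to the generated JSON; a minimal sketch continuing from the session above:

# Run a few inferences first, then stop profiling.
# end_profiling() returns the path of the JSON trace file,
# which can be inspected in chrome://tracing or Perfetto.
profile_path = session.end_profiling()
print(profile_path)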
Quantization might be able to help here.
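For reference, a minimal sketch of the dynamic-quantization call (the model paths are placeholders, not from this thread):

# Dynamic quantization stores weights as 8-bit integers and
# quantizes activations on the fly at inference time.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic('model.onnx', 'model.quant.onnx', weight_type=QuantType.QUInt8)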
Hi @wangyems
Thanks for the fast reply. I have attached the JSON generated from enabling the profiler; could you by any chance take a look and see if something jumps out as the cause of the bad performance?
In the meantime, I'll take a look at quantization.
Also, I'm running this model (at least on mobile) in ORT runtime mode, so it should be running on GPU?
From a brief look over the profile, it looks like FusedConv is one of the more expensive nodes, so I think the 8-bit quantization method @wangyems shared should be effective here!
If you want to run on GPU, you can replace providers=['CPUExecutionProvider'] with providers=['CUDAExecutionProvider']. But note that an 8-bit model can't run on a GPU (you'll need to convert to fp16 instead: https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion).
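The conversion on that page looks roughly like this (a sketch; paths are placeholders):

# Convert the model's float32 tensors to float16 with
# onnxconverter-common, per the linked docs page.
import onnx
from onnxconverter_common import float16

model = onnx.load('model.onnx')
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, 'model_fp16.onnx')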
Hi @petermcaughan @wangyems
I tried to run quantize_static, but I don't know where to find the calibration data reader, so instead I ran quantize_dynamic and I am getting this error:
Inference failed or unsupported type to quantize for tensor '/1/NonZero_output_0', type is tensor_type {
  elem_type: 7
  shape {
    dim {
      dim_param: "unk__38"
    }
    dim {
      dim_param: "unk__39"
    }
  }
}
If I run the preprocessing step beforehand, I get this error instead:
Inference failed or unsupported type to quantize for tensor '/1/NonZero_output_0', type is tensor_type {
  elem_type: 7
  shape {
    dim {
      dim_value: 1
    }
    dim {
      dim_param: "NonZero_445_o0__d1"
    }
  }
}
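(Note: elem_type: 7 is ONNX's INT64 type; NonZero produces integer index tensors, which the quantizer can't convert. One possible workaround, assuming the quantizer's nodes_to_exclude parameter, is to skip that node; the node name below is a guess derived from the tensor name in the error, not confirmed in this thread:)

# Sketch: exclude the offending NonZero node from quantization.
# '/1/NonZero' is a hypothetical node name inferred from the
# output tensor '/1/NonZero_output_0' in the error above.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    'model.onnx',
    'model.quant.onnx',
    weight_type=QuantType.QUInt8,
    nodes_to_exclude=['/1/NonZero'])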
Would it be possible to get some help with this matter?
Thank you both in advance!
Hi @petermcaughan @wangyems
Sorry to bother you, but would it be possible to take a look at this problem sometime this week?
Cheers!
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
Hi!
I am trying to run this particular ONNX model (the default and lite variants have basically the same performance), https://github.com/TexasInstruments/edgeai-yolox/blob/main/README_6d_pose.md, at runtime, because I need to use it with a camera for my current project, but I am getting really bad performance, between 3 and 8 fps, on both PC and mobile.
I was wondering if I could get some help understanding why this specific model is so heavy and what techniques I could use to improve the performance.
Thank you!!
To reproduce
Run the demo at runtime, using an API to fetch a webcam frame (see the sketch below).
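A minimal sketch of such a loop (the input name and the 640x640 size are assumptions; the real values come from the model's metadata):

# Hypothetical reproduction loop: fetch webcam frames with OpenCV
# and time each inference. The NCHW float32 640x640 input layout
# is a guess; check session.get_inputs() for the real shape.
import time

import cv2
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession('model.onnx',
                                       providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # HWC uint8 -> NCHW float32 with a leading batch dimension
    blob = cv2.resize(frame, (640, 640)).transpose(2, 0, 1)[None].astype(np.float32)
    start = time.perf_counter()
    outputs = session.run(None, {input_name: blob})
    print(f'{1.0 / (time.perf_counter() - start):.1f} fps')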
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
latest
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Unknown