microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Onnx model atrociously slow in runtime #19648

Open tv-gc opened 7 months ago

tv-gc commented 7 months ago

Describe the issue

Hi!

I am trying to run this particular ONNX model (the default and lite variants have basically the same impact), https://github.com/TexasInstruments/edgeai-yolox/blob/main/README_6d_pose.md , at runtime because I need to use it with a camera for my current project, but I am getting really bad performance, between 3 and 8 fps, on both PC and mobile.

I was wondering if I could get some help understanding why this specific model is so heavy and what techniques I could use to improve its performance?

Thank you!!

To reproduce

Run the demo at runtime using an API to fetch a webcam frame.

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

latest

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Unknown

wangyems commented 7 months ago

To diagnose, you can enable the profiler since your model runs on CPU:

import onnxruntime

options = onnxruntime.SessionOptions()
options.enable_profiling = True
session = onnxruntime.InferenceSession(
        'model.onnx',
        sess_options=options,
        providers=['CPUExecutionProvider'])
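As a rough sketch, the profile file can be retrieved like this after a run (the input shape below is only a placeholder; use your model's actual input shape):

import numpy as np

# Placeholder input; replace the shape with the model's actual input shape.
dummy_input = np.zeros((1, 3, 640, 640), dtype=np.float32)
session.run(None, {session.get_inputs()[0].name: dummy_input})

# end_profiling() writes the JSON trace and returns its file name.
profile_path = session.end_profiling()
print(profile_path)  # e.g. onnxruntime_profile__<timestamp>.json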

Quantization might be able to help here:

  1. 8-bit quantization (a rough sketch follows below): https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html
  2. 4-bit MatMul quantization (if MatMul is the bottleneck): https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/matmul_4bits_quantizer.py
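For option 1, dynamic 8-bit quantization looks roughly like this (the file names are placeholders):

from onnxruntime.quantization import quantize_dynamic, QuantType

# File names are placeholders; adjust to your model.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QUInt8)
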
tv-gc commented 7 months ago

Hi @wangyems

Thanks for the fast reply. I have attached the JSON generated from enabling the profiler. Could you by any chance take a look and see if something jumps out as the cause of the bad performance?

In the meantime, I'll take a look at quantization.

Also, I'm running this model (at least on mobile) in Ort Runtime mode, so it should be running on the GPU, right?

Cheers!

onnxruntime_profile__2024-02-27_11-02-26.json

petermcaughan commented 7 months ago

From a brief look over the profile, it seems FusedConv is one of the more expensive nodes, so I think the 8-bit quantization method @wangyems shared should be effective here!

If you want to run on GPU, you can replace providers=['CPUExecutionProvider'] with providers=['CUDAExecutionProvider']. But note that an 8-bit model can't run on a GPU (you'll need to convert to fp16 instead: https://onnxruntime.ai/docs/performance/model-optimizations/float16.html#float16-conversion)
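As a rough sketch of that workflow (assuming the onnxconverter-common package is installed; file names are placeholders):

import onnx
import onnxruntime
from onnxconverter_common import float16

# Convert the model's float32 tensors to float16 (file names are placeholders).
model = onnx.load("model.onnx")
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "model.fp16.onnx")

# Create the session with the CUDA provider, falling back to CPU if it is unavailable.
session = onnxruntime.InferenceSession(
    "model.fp16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])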

tv-gc commented 7 months ago

Hi @petermcaughan @wangyems

I tried to run quantize_static, but I don't know where to find the calibration data reader, so instead I ran quantize_dynamic and I am getting this error:

Inference failed or unsupported type to quantize for tensor '/1/NonZero_output_0', type is tensor_type {
  elem_type: 7
  shape {
    dim {
      dim_param: "unk__38"
    }
    dim {
      dim_param: "unk__39"
    }
  }
}

and if I run the preprocessing step beforehand, I get this error instead:

Inference failed or unsupported type to quantize for tensor '/1/NonZero_output_0', type is tensor_type {
  elem_type: 7
  shape {
    dim {
      dim_value: 1
    }
    dim {
      dim_param: "NonZero_445_o0__d1"
    }
  }
}

I would like to ask if it would be possible to get some help with this matter.

Thank you both in advance!

tv-gc commented 7 months ago

Hi @petermcaughan @wangyems

Sorry to bother you, but would it be possible to take a look at this problem sometime this week?

Cheers!

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.