microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] CPU Usage is too high #14490


kimdodo97 commented 1 year ago

Describe the issue

I trained a YOLO model with PyTorch, converted it to ONNX, and I am running inference with onnxruntime-gpu in C#. The GPU is being used, but CPU usage is too high. Is there any way to lower the CPU usage?

Model name: YOLOv5s
Model opset: 12

To reproduce

  /// <summary>
  /// Runs inference on an image and returns the raw output tensors.
  /// </summary>
  private async Task<List<DenseTensor<float>>> Inference(Image image)
  {
      Bitmap resized = null;

      if (image.Width != _model.Width || image.Height != _model.Height)
      {
          resized = ResizeImage(image); // fit image size to specified input size
      }

      var inputs = new List<NamedOnnxValue> // add image as onnx input
      {
          NamedOnnxValue.CreateFromTensor("images", ExtractPixels(resized ?? image))
      };

      IDisposableReadOnlyCollection<DisposableNamedOnnxValue> result = await Task.Run(() => _inferenceSession.Run(inputs)); // run inference

      var output = new List<DenseTensor<float>>();

      foreach (var item in _model.Outputs) // add outputs for processing
      {
          output.Add(result.First(x => x.Name == item).Value as DenseTensor<float>);
      }

      return output;
  }

  /// <summary>
  /// Parses net output (detect) to predictions.
  /// </summary>
  private List<YoloPrediction> ParseDetect(DenseTensor<float> output, Image image)
  {
      var result = new ConcurrentBag<YoloPrediction>();

      var (w, h) = (image.Width, image.Height); // image w and h
      var (xGain, yGain) = (_model.Width / (float)w, _model.Height / (float)h); // x, y gains
      var gain = Math.Min(xGain, yGain); // gain = resized / original

      var (xPad, yPad) = ((_model.Width - w * gain) / 2, (_model.Height - h * gain) / 2); // left, top pads

      Parallel.For(0, (int)output.Length / _model.Dimensions, (i) =>
      {
          if (output[0, i, 4] <= _model.Confidence) return; // skip low obj_conf results

          Parallel.For(5, _model.Dimensions, (j) =>
          {
              output[0, i, j] = output[0, i, j] * output[0, i, 4]; // mul_conf = obj_conf * cls_conf
          });

          Parallel.For(5, _model.Dimensions, (k) =>
          {
              if (output[0, i, k] <= _model.MulConfidence) return; // skip low mul_conf results

              float xMin = ((output[0, i, 0] - output[0, i, 2] / 2) - xPad) / gain; // unpad bbox tlx to original
              float yMin = ((output[0, i, 1] - output[0, i, 3] / 2) - yPad) / gain; // unpad bbox tly to original
              float xMax = ((output[0, i, 0] + output[0, i, 2] / 2) - xPad) / gain; // unpad bbox brx to original
              float yMax = ((output[0, i, 1] + output[0, i, 3] / 2) - yPad) / gain; // unpad bbox bry to original

              xMin = Clamp(xMin, 0, w - 1); // clip bbox tlx to boundaries
              yMin = Clamp(yMin, 0, h - 1); // clip bbox tly to boundaries
              xMax = Clamp(xMax, 0, w - 1); // clip bbox brx to boundaries
              yMax = Clamp(yMax, 0, h - 1); // clip bbox bry to boundaries

              YoloLabel label = _model.Labels[k - 5];

              var prediction = new YoloPrediction(label, output[0, i, k])
              {
                  Rectangle = new RectangleF(xMin, yMin, xMax - xMin, yMax - yMin)
              };

              result.Add(prediction);
          });
      });

      return result.ToList();
  }
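
A note on the code above: the nested Parallel.For calls are a likely contributor to the high CPU usage. The inner loops only iterate over the per-class scores (_model.Dimensions - 5 elements, about 80 for a COCO-trained YOLOv5s), so the thread-pool scheduling overhead can easily exceed the actual work and keep every core busy. Below is a minimal sketch of a sequential rewrite of the inner loops, assuming the same fields as above; it keeps the outer loop parallel across candidate boxes and also avoids mutating the output tensor in place:

  Parallel.For(0, (int)output.Length / _model.Dimensions, (i) =>
  {
      if (output[0, i, 4] <= _model.Confidence) return; // skip low obj_conf results

      // Sequential inner loop: the per-box work is tiny, so parallelizing it
      // mostly adds scheduling overhead and CPU load.
      for (int k = 5; k < _model.Dimensions; k++)
      {
          float mulConf = output[0, i, k] * output[0, i, 4]; // mul_conf = obj_conf * cls_conf
          if (mulConf <= _model.MulConfidence) continue; // skip low mul_conf results

          // ...compute xMin/yMin/xMax/yMax and add the YoloPrediction exactly as above...
      }
  });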

Urgency

No response

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA11.7

Model File

No response

Is this a quantized model?

No

RyanUnderhill commented 1 year ago

If possible, can you quantify what 'CPU usage is too high' means exactly? For example, what % CPU usage and what % GPU usage do you get when running the model, and what would you expect?

kimdodo97 commented 1 year ago

CPU usage is very high. With the Python model it stays at around 20%, but running it in C# it goes up to 50%. How can I solve this?

RyanUnderhill commented 1 year ago

How does the time to run the model compare vs Python? If it is faster, then the higher CPU usage could be a result of the GPU completing work faster and keeping the CPU busier.

See here for more information: https://onnxruntime.ai/docs/performance/tune-performance.html#why-is-my-model-running-slower-on-gpu-than-on-cpu
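
If the extra CPU load comes from ONNX Runtime's own CPU thread pools rather than from faster GPU turnaround, capping the session's thread counts is also worth trying. A minimal sketch, where the model path and thread counts are placeholders to tune for your setup:

  using Microsoft.ML.OnnxRuntime;

  var options = new SessionOptions();
  options.AppendExecutionProvider_CUDA(0); // keep CUDA as the execution provider
  options.IntraOpNumThreads = 2;           // cap CPU threads used within an op
  options.InterOpNumThreads = 1;           // cap CPU threads used across ops
  var session = new InferenceSession("yolov5s.onnx", options); // path is a placeholder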

kimdodo97 commented 1 year ago

Currently I am running detection on real-time video over RTSP, and when I start inference on the video it shows higher % CPU usage compared to Python. When I display the inference results in real time, frames seem to drop.
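
For a live RTSP pipeline, one common way to keep the display from backing up behind inference is to decouple capture from inference with a bounded queue that drops stale frames. A sketch using System.Threading.Channels; the names and capacity are illustrative, not from this thread:

  using System.Drawing;
  using System.Threading.Channels;

  // Capacity 1 with DropOldest: if inference falls behind the camera,
  // stale frames are discarded instead of queuing up and stalling the UI.
  var frames = Channel.CreateBounded<Bitmap>(new BoundedChannelOptions(1)
  {
      FullMode = BoundedChannelFullMode.DropOldest
  });

  // Capture loop: frames.Writer.TryWrite(bitmap);
  // Inference loop:
  // await foreach (var frame in frames.Reader.ReadAllAsync())
  //     await Inference(frame);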

roushrsh commented 1 year ago

I'm having the same issue. The ONNX model is the same as the original TF model in Python, but it is significantly slower (an order of magnitude) when run from C#, and I also see the spike in CPU usage.