microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Too slow when I do inference #13265

Open dmjeong opened 2 years ago

dmjeong commented 2 years ago

Describe the issue

OS: Windows 10, Language: C#, SDK: Visual Studio 2019, ONNX Runtime version: 1.12.1, GPU: RTX 3080, CUDA: 11.4

Before running the code below, I initialize _gSession.

However, _gSession.Run is very slow the second time I call it.

In a C++ environment the same model runs much faster than in C#.

What am I doing wrong?

To reproduce

Here is my code:

        OpenCvSharp.Mat[] inputsrc = new OpenCvSharp.Mat[5];
        var bufferoutput = new float[15];
        _gDimenssion[0] = 5; // batch size
        float[] inputdata = new float[_gSize.Width * _gSize.Height * 5];
        for (int i = 0; i < 5; i++)
        {
            // Load each image as grayscale, resize it, and copy it into the batch buffer.
            inputsrc[i] = Cv2.ImRead("E:\\eff\\" + (i + 1).ToString() + ".png", ImreadModes.Grayscale);
            OpenCvSharp.Mat resizedImagefloat = MakeMat(inputsrc[i], _gSize);
            Array.Copy(MatToList(resizedImagefloat), 0, inputdata, i * 224 * 224, 224 * 224);
        }
        DenseTensor<float> inputTensor = new DenseTensor<float>(inputdata, _gDimenssion);
        DenseTensor<float> outputTensor = new DenseTensor<float>(bufferoutput, new int[] { 15 });
        // FixedBufferOnnxValue is backed by native memory, so dispose it when done.
        using FixedBufferOnnxValue InputonnxValue = FixedBufferOnnxValue.CreateFromTensor(inputTensor);
        using FixedBufferOnnxValue OutputonnxValue = FixedBufferOnnxValue.CreateFromTensor(outputTensor);

        double expsum = 0;
        double Confidence = 0;
        float maxActivation = -9999;
        int predlabelID = 0;
        var inputnames = new[] { _gInputLayerName };
        var outputnames = new[] { _gOutputLayerName };
        var inputvalues = new[] { InputonnxValue };
        var outputvalues = new[] { OutputonnxValue };
        // Run writes the results directly into bufferoutput through outputTensor.
        _gSession.Run(inputnames, inputvalues, outputnames, outputvalues);
        for (int i = 0; i < 5; i++)
        {
            float k = bufferoutput[i];
        }
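
To show what I mean by "slow the second time", this is roughly how I am timing it (the Stopwatch code here is just for illustration; in my real code the second call uses a different number of input images):

        var sw = System.Diagnostics.Stopwatch.StartNew();
        _gSession.Run(inputnames, inputvalues, outputnames, outputvalues);
        Console.WriteLine("First run: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        // In my real code this second call runs with a different number of
        // input images, and it is the slow one.
        _gSession.Run(inputnames, inputvalues, outputnames, outputvalues);
        Console.WriteLine("Second run: " + sw.ElapsedMilliseconds + " ms");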

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

11.4

Model File

No response

Is this a quantized model?

Unknown

yuslepukhin commented 2 years ago

Can this be related to this? https://github.com/microsoft/onnxruntime/issues/10746

Another thing to note is that a lot of C# onnxruntime classes are IDisposable because they are backed by native resources. So if you repeatedly run things and do not dispose of them, things may get slow.
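
For example, a minimal sketch of the dispose-everything pattern (the model path, input name, and shape below are placeholders, not taken from this issue):

        using System.Collections.Generic;
        using System.Linq;
        using Microsoft.ML.OnnxRuntime;
        using Microsoft.ML.OnnxRuntime.Tensors;

        class DisposalSketch
        {
            static void Main()
            {
                // The session wraps native resources: create it once, dispose it at shutdown.
                using var session = new InferenceSession("model.onnx"); // placeholder path

                var input = new DenseTensor<float>(new[] { 1, 1, 224, 224 }); // placeholder shape
                var container = new List<NamedOnnxValue>
                {
                    NamedOnnxValue.CreateFromTensor("input", input) // placeholder input name
                };

                // Run() returns an IDisposableReadOnlyCollection backed by native
                // buffers; dispose it after every call, or memory use grows and
                // things may slow down over time.
                using (var results = session.Run(container))
                {
                    var output = results.First().AsTensor<float>();
                }
            }
        }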

smk2007 commented 2 years ago

Another thing to try would be to use OpenCV Interop with WinML/C#. Feel free to check out the sample here: https://github.com/microsoft/Windows-Machine-Learning/tree/master/Samples/WinMLSamplesGallery/WinMLSamplesGallery/Samples/OpenCVInterop

dmjeong commented 2 years ago

> Can this be related to this? #10746
>
> Another thing to note is that a lot of C# onnxruntime classes are IDisposable because they are backed by native resources. So if you repeatedly run things and do not dispose of them, things may get slow.

I do not dispose of the session or the other related objects; I only change the number of input images. For example, the first inference takes 1 image as input, and the second and later inferences take 2~32 images. It seems that inference is slow whenever the number of input images changes; if I keep the input at the same number of images (1), the next inference is faster than the first.

I don't know why.
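
If the changing batch dimension really is the cause, maybe I can work around it by padding every batch to one fixed maximum size so the input shape never changes. A rough sketch of the idea (maxBatch, realCount, and the {batch, 224, 224} shape are just my example values):

        // Pad the input buffer to a fixed batch size so the session always
        // sees the same input shape. realCount = how many images this round.
        const int maxBatch = 32;
        int imageSize = 224 * 224;
        float[] padded = new float[maxBatch * imageSize];
        Array.Copy(inputdata, padded, realCount * imageSize); // zeros after the real images
        var paddedTensor = new DenseTensor<float>(padded, new int[] { maxBatch, 224, 224 });
        // ...run as before, then read back only the first realCount results.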