hshatti / TONNXRuntime

TOnnxRuntime is a Microsoft ONNXRuntime AI and machine learning library for Free Pascal / Delphi
MIT License

CPU Memory Leak during onnxruntime inference #3

Closed: nazarovsky closed this issue 1 year ago

nazarovsky commented 1 year ago

Hello, Mr. Shatti. I've used your library to implement a YOLOv7 detector on top of the libVLC video player. Everything seems to work fine; thank you for the library.

One issue I have noticed is a CPU memory leak that occurs over multiple inference runs:

    inputs.AddOrSetValue('images',input.ToValue);  // bind the input tensor to the 'images' input
    Outputs:=session.Run(Inputs);                   // run inference
    outTensor := outputs['output'];                 // fetch the 'output' tensor

I've tried session options such as

    DefaultSessionOptions.DisableCpuMemArena;
    DefaultSessionOptions.DisableMemPattern;

but they didn't help. Could you suggest a way to free memory while the session is still open?

hshatti commented 1 year ago

Hello, thanks for spotting this. Normally, the function

  GetApi().ReleaseValue(Value); // Value is of type TORTValue (or OrtValue in onnxruntime_pas_api.pas unit)

frees the value, but you should not call it yourself: in normal use it is invoked automatically whenever a variable of type TORTValue goes out of scope, which releases the corresponding memory.
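The automatic release described above is the common Delphi pattern of tying a native resource's lifetime to a reference-counted interface. The sketch below illustrates that general pattern only; it is not TONNXRuntime's actual TORTValue implementation. `INativeHandle`, `TNativeHandle`, `AllocHandle`, `RunOnce` and the `LiveHandles` counter are hypothetical stand-ins for an ONNX Runtime value and a `ReleaseValue`-style call.

```pascal
program ScopeRelease;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

var
  // Hypothetical counter standing in for native memory held by the runtime
  LiveHandles: Integer = 0;

type
  // Hypothetical wrapper: the interface reference count decides when the
  // underlying resource is released, so callers never free it manually
  INativeHandle = interface
    ['{6B1D2C55-3F0A-4E7B-9C61-0D2E8A4B5F10}']
    function Id: Integer;
  end;

  TNativeHandle = class(TInterfacedObject, INativeHandle)
  private
    FId: Integer;
  public
    constructor Create(AId: Integer);
    destructor Destroy; override;
    function Id: Integer;
  end;

constructor TNativeHandle.Create(AId: Integer);
begin
  inherited Create;
  FId := AId;
  Inc(LiveHandles);   // stands in for the native allocation
end;

destructor TNativeHandle.Destroy;
begin
  Dec(LiveHandles);   // stands in for a ReleaseValue-style call
  inherited Destroy;
end;

function TNativeHandle.Id: Integer;
begin
  Result := FId;
end;

function AllocHandle(AId: Integer): INativeHandle;
begin
  Result := TNativeHandle.Create(AId);
end;

// Simulates one inference call: the local interface variable goes out of
// scope when the procedure returns, and that releases the handle
procedure RunOnce(AId: Integer);
var
  H: INativeHandle;
begin
  H := AllocHandle(AId);
  Assert(H.Id = AId);
end;

var
  I: Integer;
begin
  for I := 1 to 100 do
    RunOnce(I);
  WriteLn('Live handles after 100 runs: ', LiveHandles);  // prints 0
end.
```

If every wrapper is released on scope exit, the counter returns to zero after any number of runs; a counter that keeps growing per call would correspond to the per-inference leak reported in this issue.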

I will try to investigate it. I haven't tried the library with video libraries such as libVLC or FFmpeg (this may require a GPU provider), so I will need to replicate your environment based on your description, which might take a little time. With the models I have tried so far on single pictures, no memory leaks were detected. If possible, it would be very helpful if you could share a short example project demonstrating this behaviour so I can debug and fix it. Let me know. Cheers, H

nazarovsky commented 1 year ago

Hello, Mr. Shatti,

I've made a minimal sample project (Delphi 11.2, FMX, Win64) for investigation. It is a YOLOv7 detector trained on MS COCO, with the input fixed at 640x640x3 (the .onnx file is included). After inference, NMS is done programmatically to get the classes, detections and scores, and the results are drawn on the canvas.
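NMS (non-maximum suppression) keeps the highest-scoring box and discards lower-scoring boxes of the same class that overlap it too much. The following is a minimal greedy, class-aware NMS sketch, not the code from the sample project. It assumes a hypothetical `TBox` record with inclusive pixel corners (hence the `+ 1` in the area) and a hypothetical `IouThreshold` parameter; `MinF`, `MaxF`, `IoU`, `Nms` and `MakeBox` are illustrative helper names.

```pascal
program GreedyNms;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  // Hypothetical detection record; corners are inclusive pixel coordinates
  TBox = record
    X1, Y1, X2, Y2: Single;
    Score: Single;
    Cls: Integer;
  end;
  TBoxArray = array of TBox;

function MinF(A, B: Single): Single;
begin
  if A < B then Result := A else Result := B;
end;

function MaxF(A, B: Single): Single;
begin
  if A > B then Result := A else Result := B;
end;

function Area(const B: TBox): Single;
begin
  Result := (B.X2 - B.X1 + 1) * (B.Y2 - B.Y1 + 1);
end;

// Intersection over union of two boxes (0 when they do not overlap)
function IoU(const A, B: TBox): Single;
var
  W, H, Inter: Single;
begin
  W := MinF(A.X2, B.X2) - MaxF(A.X1, B.X1) + 1;
  H := MinF(A.Y2, B.Y2) - MaxF(A.Y1, B.Y1) + 1;
  if (W <= 0) or (H <= 0) then
    Exit(0);
  Inter := W * H;
  Result := Inter / (Area(A) + Area(B) - Inter);
end;

// Greedy NMS: keep the best remaining box, then suppress boxes of the
// same class whose IoU with it exceeds IouThreshold; repeat until done
function Nms(const Boxes: TBoxArray; IouThreshold: Single): TBoxArray;
var
  I, Best: Integer;
  Suppressed: array of Boolean;
begin
  SetLength(Suppressed, Length(Boxes));  // new elements start as False
  SetLength(Result, 0);
  repeat
    Best := -1;
    for I := 0 to High(Boxes) do
      if not Suppressed[I] and ((Best < 0) or (Boxes[I].Score > Boxes[Best].Score)) then
        Best := I;
    if Best < 0 then
      Break;
    Suppressed[Best] := True;
    SetLength(Result, Length(Result) + 1);
    Result[High(Result)] := Boxes[Best];
    for I := 0 to High(Boxes) do
      if not Suppressed[I] and (Boxes[I].Cls = Boxes[Best].Cls) and
         (IoU(Boxes[Best], Boxes[I]) > IouThreshold) then
        Suppressed[I] := True;
  until False;
end;

function MakeBox(X1, Y1, X2, Y2, Score: Single; Cls: Integer): TBox;
begin
  Result.X1 := X1;
  Result.Y1 := Y1;
  Result.X2 := X2;
  Result.Y2 := Y2;
  Result.Score := Score;
  Result.Cls := Cls;
end;

var
  Detections, Kept: TBoxArray;
begin
  SetLength(Detections, 3);
  Detections[0] := MakeBox(10, 10, 109, 109, 0.9, 0);    // best box
  Detections[1] := MakeBox(12, 12, 111, 111, 0.8, 0);    // IoU with [0] is about 0.92
  Detections[2] := MakeBox(300, 300, 399, 399, 0.7, 0);  // no overlap with [0]
  Kept := Nms(Detections, 0.45);
  WriteLn('Kept ', Length(Kept), ' of ', Length(Detections), ' boxes');  // prints: Kept 2 of 3 boxes
end.
```

Greedy NMS is quadratic in the number of candidate boxes, which is usually acceptable after score thresholding has already removed most candidates.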

It uses the CPU provider and a single picture for inference (binaries for onnxruntime 1.14.1 are included).

You can download the whole folder here (zip file, 387 MB).

The source is located in TONNXRuntime\source\

The built executable is TONNXRuntime\source\Win64\Debug\Project1.exe

On my machine, the CPU memory leak is around 8-9 MB per inference.

BTW, no modifications to the TONNXRuntime library were made; I just made a clean git clone of this repository and then added the demo code.

[Screenshot: 2023-04-24_011925]

Best Regards, A

hshatti commented 1 year ago

Hi Alex, thank you for spotting this issue. I have just pushed a fix for it (onnxruntime.pas); please pull the updates into your project. One last thing: I noticed another possible memory leak in your project that is not related to the library. Here is my suggested fix:

In the TDetBoxList destructor override (Destroy), I changed FreeMem(Items[i]) to Dispose(Items[i]).
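The general rule behind this kind of change is that `New` is paired with `Dispose` and `GetMem` with `FreeMem`; for a typed pointer, `New` also initializes and `Dispose` finalizes any managed fields (strings, dynamic arrays, interfaces) in the record. The sketch below shows an owning list that releases its `New`-allocated boxes with `Dispose`. The `TDetBox` field types and the choice of the RTL's non-generic `TList` as the base class are assumptions; the real classes in the sample project may be declared differently.

```pascal
program DetBoxOwnership;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils, Classes;

type
  // Hypothetical layout: plain numeric fields only
  PDetBox = ^TDetBox;
  TDetBox = record
    bbox: array[0..3] of Single;
    area: Single;
    conf: Single;
    cls: Integer;
  end;

  // Hypothetical owning list built on the non-generic TList
  TDetBoxList = class(TList)
  public
    destructor Destroy; override;
  end;

destructor TDetBoxList.Destroy;
var
  I: Integer;
begin
  // Items were allocated with New, so each one is released with Dispose;
  // the typed cast lets Dispose finalize the record if it ever gains
  // managed fields
  for I := 0 to Count - 1 do
    Dispose(PDetBox(Items[I]));
  inherited Destroy;
end;

var
  List: TDetBoxList;
  DB: PDetBox;
  I: Integer;
begin
  List := TDetBoxList.Create;
  try
    for I := 1 to 3 do
    begin
      New(DB);                  // paired with Dispose in the destructor
      DB^ := Default(TDetBox);  // zero every field
      DB^.cls := I;
      List.Add(DB);
    end;
    WriteLn('Boxes owned by the list: ', List.Count);  // prints 3
  finally
    List.Free;                  // the destructor disposes every box
  end;
end.
```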

Then I made the following changes to the function:

function InferenceFromBitmap(bmp_in:TBitmap; enable_nms:Boolean):Integer;
var 
...
    DB: PDetBox;
    DetBox: TDetBox;
...
begin
 ...
  // removed line: GetMem(DetBox,SizeOf(TDetBox));
 ...
  if (max_score)> score_threshold then
    begin
      New(DB);  // added: heap record that the list will own
      // correct box coordinates if they are outside the input image dimensions
      // correct x1 x2
      if DetBox.bbox[0] <0 then
         DetBox.bbox[0]:=0;
      if DetBox.bbox[0] >=IN_W-1 then
         DetBox.bbox[0]:=IN_W-1;
      if DetBox.bbox[2] <0 then
         DetBox.bbox[2]:=0;
      if DetBox.bbox[2] >IN_W-1 then
         DetBox.bbox[2]:=IN_W-1;
      // correct y1 y2
      if DetBox.bbox[1] <0 then
         DetBox.bbox[1]:=0;
      if DetBox.bbox[1] >=IN_H-1 then
         DetBox.bbox[1]:=IN_H-1;
      if DetBox.bbox[3] <0 then
         DetBox.bbox[3]:=0;
      if DetBox.bbox[3] >IN_H-1 then
         DetBox.bbox[3]:=IN_H-1;
      DetBox.area:= (DetBox.bbox[3]-DetBox.bbox[1]+1)*
                    (DetBox.bbox[2]-DetBox.bbox[0]+1);
      DetBox.conf:= max_score;
      DetBox.cls := argmax_class;
      Move(DetBox, DB^, SizeOf(TDetBox)); // added: copy the finished box into DB^
      ThresholdedDetBoxList.Add(DB);      // changed: add DB instead of DetBox
    end;
 ...
end;
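The clamp-then-store step above can also be written more compactly. The sketch below is a self-contained version under stated assumptions: `IN_W`/`IN_H` are set to the 640x640 input size mentioned in this thread, the `TDetBox` field types are guessed, and `Clamp`, `ClampBox` and `AddDetection` are hypothetical helper names. It uses plain record assignment (`DB^ := DetBox`) instead of `Move`; for a record of plain numeric fields both do the same job, but assignment stays correct if a managed field such as a string is ever added, whereas a raw `Move` would bypass its reference counting.

```pascal
program ClampAndAdd;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils, Classes;

const
  IN_W = 640;  // assumed model input width
  IN_H = 640;  // assumed model input height

type
  PDetBox = ^TDetBox;
  TDetBox = record       // hypothetical layout: bbox holds x1, y1, x2, y2
    bbox: array[0..3] of Single;
    area: Single;
    conf: Single;
    cls: Integer;
  end;

function Clamp(V, Lo, Hi: Single): Single;
begin
  if V < Lo then Result := Lo
  else if V > Hi then Result := Hi
  else Result := V;
end;

// Clamps the corners to the input image and recomputes the area
procedure ClampBox(var Box: TDetBox);
begin
  Box.bbox[0] := Clamp(Box.bbox[0], 0, IN_W - 1);
  Box.bbox[2] := Clamp(Box.bbox[2], 0, IN_W - 1);
  Box.bbox[1] := Clamp(Box.bbox[1], 0, IN_H - 1);
  Box.bbox[3] := Clamp(Box.bbox[3], 0, IN_H - 1);
  Box.area := (Box.bbox[3] - Box.bbox[1] + 1) * (Box.bbox[2] - Box.bbox[0] + 1);
end;

// Copies a local record into a fresh heap record owned by List
procedure AddDetection(List: TList; DetBox: TDetBox);
var
  DB: PDetBox;
begin
  ClampBox(DetBox);
  New(DB);
  DB^ := DetBox;   // record assignment instead of Move
  List.Add(DB);
end;

var
  List: TList;
  Box: TDetBox;
  I: Integer;
begin
  List := TList.Create;
  try
    Box := Default(TDetBox);
    Box.bbox[0] := -5; Box.bbox[1] := 10; Box.bbox[2] := 700; Box.bbox[3] := 109;
    Box.conf := 0.9;
    AddDetection(List, Box);
    WriteLn('x2 after clamping: ', PDetBox(List[0])^.bbox[2]:0:0);  // prints 639
  finally
    for I := 0 to List.Count - 1 do
      Dispose(PDetBox(List[I]));   // New is paired with Dispose
    List.Free;
  end;
end.
```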

Now there should be no more memory leaks and everything should come out clean! Let me know how it works, and if you spot any other bugs, please feel free to open a new issue. I'm going to add your example to the Examples folder, if that's fine with you.

Cheers, H

nazarovsky commented 1 year ago

Thank you, Mr. Shatti. I'll try the fix.

It's fine to use the object detector code as an example. Later on I will also make a semantic segmentation example. Let's keep in touch.

Best Regards, A

nazarovsky commented 1 year ago

OK, it worked like a charm; thanks for sorting it out! I think you can close the issue.

Best Regards, A