microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

System memory leak on cuda GPU backend. #8147

Open nttstar opened 3 years ago

nttstar commented 3 years ago

Describe the bug: System memory keeps increasing while using the CUDA GPU backend.

Urgency: very urgent

To Reproduce

Please download the detection model from https://1drv.ms/u/s!AswpsDO2toNKsTYUYsyy9kdSZSfe?e=KPHWCL (OneDrive link), then use the following code to reproduce:

import numpy as np
import onnxruntime
import cv2

model_file = 'scrfd_10g_bnkps.onnx'
session = onnxruntime.InferenceSession(model_file, None)
input_cfg = session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
outputs = session.get_outputs()
output_names = []
for o in outputs:
    output_names.append(o.name)
img = np.random.randint(0, 255, size=(640, 640, 3), dtype=np.uint8)
input_std = 128.0
input_mean = 127.5
blob = cv2.dnn.blobFromImage(img, 1.0/input_std, (640, 640), (input_mean, input_mean, input_mean), swapRB=True)
for _ in range(1000000):
    net_outs = session.run(output_names, {input_name: blob})
    pred = net_outs[0]

The leak happens at pred = net_outs[0]; if we omit this line, there is no memory leak. Also,

  1. If we use CPU backend by setting session.set_providers(['CPUExecutionProvider']), no memory leak.
  2. If we use cuda10.2 and onnxruntime-gpu==1.6, no memory leak.

Expected behavior: system memory usage stays stable.

yuslepukhin commented 3 years ago

The code and the leak location do not indicate ORT involvement here. Perhaps you could supply more info, along with the model if possible.

nttstar commented 3 years ago

> The code and the leak location do not indicate ORT involvement here. Perhaps you could supply more info, along with the model if possible.

@yuslepukhin Thanks for your attention, I have updated the issue description.

yuslepukhin commented 3 years ago

Can you insert gc.collect() right after run and see if anything changes?
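For readers following along, the suggestion above can be sketched as follows. A stub class stands in for the real onnxruntime.InferenceSession so the snippet runs without the model file; the stub, its output values, and the input/output names here are illustrative, not part of the report — in the actual repro script you would keep the real session, output_names, input_name, and blob and simply add the gc.collect() call inside the loop:

```python
import gc

# Stub standing in for onnxruntime.InferenceSession, so this sketch
# runs without the model; swap in the real session from the repro.
class StubSession:
    def run(self, output_names, feeds):
        # Return one dummy output list per requested output name.
        return [[0.0] * 4 for _ in output_names]

session = StubSession()
output_names = ["score_8"]          # illustrative output name
blob = [[0.0] * 640]                # placeholder for the real input blob

for _ in range(100):
    net_outs = session.run(output_names, {"input.1": blob})
    pred = net_outs[0]
    gc.collect()  # force a collection right after run(), as suggested
```

If the process memory still grows with the collector forced on every iteration, the retained memory is not reachable Python objects, which points at a native-side allocation instead.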

nttstar commented 3 years ago

> Can you insert gc.collect() right after run and see if anything changes?

@yuslepukhin Nothing changed.

yuslepukhin commented 3 years ago

And you are feeding a numpy array returned from cv2.dnn.blobFromImage()?

nttstar commented 3 years ago

> And you are feeding a numpy array returned from cv2.dnn.blobFromImage()?

@yuslepukhin Yes. I don't believe that is the root cause, though: version 1.6 works fine with this same code.

yuslepukhin commented 3 years ago

There is a repro.

mrjarhead commented 3 years ago

FYI - I ran into this same memory issue when running CUDA inference for a set of deep learning models (RetinaFace + ArcFace + age estimation + custom classifier + YOLOv4) with ORT 1.8.1 (C# API). The memory leak takes a number of runs before it really starts to rear its head. I originally went down the rabbit hole of tracing leaks from items not being disposed, but it turned out to be the ORT+CUDA version combination (the issue does not occur when running the CPU EP). I upgraded CUDA to 11.4.1 and the latest cuDNN version, and life appears to be good now.

TL;DR:

ONNX Runtime 1.8.1 CPU EP == No memory leak
ONNX Runtime 1.8.1 CUDA EP - CUDA 11.1 + cuDNN 8.0.4.30 == Memory leak
ONNX Runtime 1.8.1 CUDA EP - CUDA 11.4.1 + cuDNN 8.2.2.26 == No memory leak

I have not tested extensively on any other CUDA or ORT builds. Hope this helps and saves someone time in the future!
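When comparing CUDA/cuDNN combinations like this, it helps to measure resident memory directly rather than eyeballing a process monitor. Below is a minimal sketch using the Unix-only stdlib resource module (on Linux ru_maxrss is reported in kilobytes, on macOS in bytes; Windows users would need a third-party package such as psutil instead). run_once is a placeholder for the real session.run call:

```python
import gc
import resource  # Unix-only stdlib module

def run_once():
    # Placeholder for session.run(output_names, {input_name: blob});
    # allocates a buffer so each call touches fresh memory.
    return [bytes(10_000)]

for batch in range(3):
    for _ in range(1000):
        pred = run_once()[0]
    gc.collect()
    # Peak RSS so far; a leak shows up as steady growth across batches
    # even after forced garbage collection.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"batch {batch}: peak RSS {peak}")
```

A stable peak across batches under one CUDA/cuDNN combination and steady growth under another would confirm the version-dependent behavior reported above.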

cqray1990 commented 3 years ago

@yuslepukhin How do I insert gc.collect()? I can't find gc. @nttstar

cqray1990 commented 3 years ago

@mrjarhead Did you test the ONNX Runtime 1.8.1 CUDA EP with the CUDA 11.4.1 + cuDNN 8.2.2.26 environment? Are you sure it is effective? I tried CUDA 11.0 with onnxruntime 1.7 and it also leaks memory.

cqray1990 commented 3 years ago

@mrjarhead

mrjarhead commented 3 years ago

@cqray1990 - 100% sure. I've updated to the latest CUDA/cuDNN and ONNX Runtime versions since my original post, and still no memory leaks. I operate high-volume inference systems that process video content, so any memory leaks can knock my stuff offline in a hurry.

Initially, I thought the issue was tied to image resources not being properly disposed after use, which is a common source of memory leakage (GC.Collect() won't help you here). If you're still seeing leaks after updating CUDA/cuDNN/ORT to the latest versions, then you might want to do a review of your code to ensure you're disposing resources properly.

One additional note - I am running on Windows-based OSes, i.e. Win10 and Server 2019.

CarlPoirier commented 2 years ago

I too can confirm this. I'm using a detection model with some ResNet backbone. My environment is Windows Server 2016. Here are the CUDA versions used:

ONNX Runtime 1.9.0 CUDA EP - CUDA 11.1.1 + cuDNN 8.0.5.39 == Memory leak
ONNX Runtime 1.9.0 CUDA EP - CUDA 11.4.3 + cuDNN 8.2.4.15 == No memory leak

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

LoveU3tHousand2 commented 1 year ago

onnxruntime 1.14.1, CUDA 11.6, cuDNN 8.9.0: memory leak. Very bad.