zylo117 / Yet-Another-EfficientDet-Pytorch

The pytorch re-implement of the official efficientdet with SOTA performance in real time and pretrained weights.
GNU Lesser General Public License v3.0

Inference GPU Memory Used is so strange #537

Open wildbrother opened 3 years ago

wildbrother commented 3 years ago

I used efficientdet_test.py

I used the d3 model

When executing the model, GPU memory usage first climbs to 7~8 GB, then drops to 1.25 GB once the code enters the inference for-loop.

I think VRAM climbs to 7~8 GB while the model structure and weights are being set up, but I really don't understand why the memory usage is so high. Is it really just overhead?

Is it because of the BiFPN, or something else? I really want to know.
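For scale, here is a back-of-envelope estimate of how big the detection outputs get for d3. The numbers below are the standard EfficientDet-d3 configuration (896×896 input, 5 pyramid levels P3–P7 at strides 8–128, 9 anchors per cell, 90 COCO classes) and are assumptions about this repo, not read from its code; the point is that the final tensors are modest, so a multi-GB spike must come from intermediate activations and allocator/cuDNN workspace rather than the outputs themselves.

```python
# Rough size estimate for EfficientDet-d3's classification output.
# Assumed config: 896x896 input, strides 8..128, 9 anchors/cell, 90 classes.

def num_anchors(input_size=896, strides=(8, 16, 32, 64, 128), per_cell=9):
    # one anchor set per cell of each pyramid level's feature map
    return sum((input_size // s) ** 2 * per_cell for s in strides)

anchors = num_anchors()            # 150381 anchor boxes
cls_bytes = anchors * 90 * 4       # float32 classification logits
print(anchors, cls_bytes / 2**20)  # ~51.6 MiB for the final tensor alone
```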

wildbrother commented 3 years ago

7~8 GB of VRAM is very high..

zylo117 commented 3 years ago

Can you set some breakpoints to find out which lines are causing such a large allocation? I've not run into this situation before. It'd be helpful if you could provide this information.
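One way to do this without stepping through a debugger is a small probe around suspect lines using torch's allocator counters: `memory_allocated()` counts live tensors, while `memory_reserved()` is the cached pool that nvidia-smi actually reports. The helper below is a hypothetical sketch (the `vram_probe` name and `probe_log` list are mine, not the repo's), guarded so it also runs on a machine without CUDA or without pytorch installed:

```python
import contextlib

try:
    import torch
    _cuda = torch.cuda.is_available()
except ImportError:  # lets the sketch run without pytorch installed
    _cuda = False

probe_log = []  # (tag, delta in bytes) per probed region

@contextlib.contextmanager
def vram_probe(tag):
    # memory_allocated() = live tensors; memory_reserved() = cached pool
    before = torch.cuda.memory_allocated() if _cuda else 0
    yield
    after = torch.cuda.memory_allocated() if _cuda else 0
    probe_log.append((tag, after - before))

# intended use in efficientdet_test.py (model and x as in this issue):
# with vram_probe("forward pass"):
#     features, regression, classification, anchors = model(x)
with vram_probe("no-op"):
    pass
```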

wildbrother commented 3 years ago

It happens at the first "features, regression, classification, anchors = model(x)" statement: VRAM jumps to 7~8 GB in nvidia-smi (the readings vary between 4 GB, 6 GB, and 8 GB, because "watch nvidia-smi" can't catch every single frame).

Once inside the for-loop ("_, regression, classification, anchors = model(x)"), that phenomenon disappears.

I found that the variable 'x' is built from the image, and I was worried about VRAM increasing when I run inference on many different images.

I think this statement has to be re-run every time I change the inference data (image). So will VRAM climb to 7~8 GB on every single loop iteration (for each image)?

I'm sorry to bother you with this question, but could you confirm this behavior, or provide some inference code for multiple images, something like [img1, img2, img3]? Thank you for your reply.

zylo117 commented 3 years ago

I see, it's the classification output, which has a large shape, that consumes 4 GB of VRAM here during the convolution of the second feature output of the BiFPN, which is weird. The input has 5 features and the first one is the largest, so why is the second one causing most of the consumption?

https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/blob/c533bc2de65135a6fe1d25ca437765c630943afb/efficientdet/model.py#L405

What's your pytorch version? I'm using 1.7 now, and I don't recall pytorch 1.4 behaving like that.

wildbrother commented 3 years ago

I installed the packages you listed here.

torch == 1.4.0 torchvision == 0.5.0

So it doesn't happen only to me?
I was worried this situation was caused by my environment (docker --ipc=host).

wildbrother commented 3 years ago

So will this VRAM spike occur every single time I feed a different image as input (in the statement "features, regression, classification, anchors = model(x)")?

zylo117 commented 3 years ago

Hi, I found out it's the cache from this line, because I can clean it by adding torch.cuda.empty_cache() right after it, though emptying the cache takes some time. https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/blob/c533bc2de65135a6fe1d25ca437765c630943afb/efficientdet/model.py#L44

So it won't allocate again after the first time. But it's still strange, because I haven't come across this situation before.
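The trade-off being discussed can be sketched like this: empty_cache() returns reserved-but-unused blocks to the driver (so the nvidia-smi number drops), but the next forward pass has to re-allocate them, which costs time. This is a minimal sketch, not the repo's code; `model` and `x` stand for the objects in efficientdet_test.py, and the import is guarded so the sketch runs even without pytorch installed:

```python
try:
    import torch
    no_grad = torch.no_grad
    def release():
        if torch.cuda.is_available():
            # frees reserved-but-unused blocks; live tensors are untouched
            torch.cuda.empty_cache()
except ImportError:  # fall back so the sketch runs without pytorch
    import contextlib
    no_grad = contextlib.nullcontext
    def release():
        pass

def infer(model, x, release_cache=False):
    with no_grad():          # inference needs no autograd bookkeeping
        out = model(x)
    if release_cache:
        release()            # lower peak VRAM, slower next iteration
    return out
```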

wildbrother commented 3 years ago

OK, the inference loop takes 0.02~0.04 seconds, so it's worth doing! And now VRAM peaks at 4 GB (d3), half of the 7~8 GB from a week ago. Great progress. Thank you
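On the earlier multi-image question ([img1, img2, img3]): one common approach is to stack the preprocessed images into a single batch so model(x) runs once per batch instead of once per image. The sketch below uses numpy arrays as stand-ins for real preprocessed frames (896×896 is the assumed d3 input size); the batch would then be handed to the model via torch.from_numpy.

```python
# Batched inference sketch: stack same-sized preprocessed images into one
# (N, H, W, C) array. numpy zeros stand in for real frames here.
import numpy as np

def make_batch(images):
    # every image must already share the same (H, W, C) after preprocessing
    return np.stack(images, axis=0)

imgs = [np.zeros((896, 896, 3), dtype=np.float32) for _ in range(3)]
batch = make_batch(imgs)
print(batch.shape)  # (3, 896, 896, 3)
```

Note that batching raises peak activation memory roughly linearly with N, so on a card that already sits near 4 GB with d3, large batches may reintroduce the original problem.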