zylo117 / Yet-Another-EfficientDet-Pytorch

A PyTorch re-implementation of the official EfficientDet, with SOTA real-time performance and pretrained weights.
GNU Lesser General Public License v3.0

inference GPU memory weird. #657

Open wildbrother opened 3 years ago

wildbrother commented 3 years ago

Hi, I have a problem with GPU memory growth.

I ran your test (inference) code and GPU memory usage increased to 8 GB (with D3). I can't use your bigger models because of this.

I found the statement that causes this:

(screenshot of the relevant code)

In model.py, inside EfficientNet(nn.Module), the variable x holds GPU memory, and the amount held grows inside the for loop, at the statement x = block(x, drop_connect_rate=drop_connect_rate) (maybe repeatedly reassigning x makes the held GPU memory grow).

"torch.cuda.empty_cache()" can't free the GPU memory because the variable still holds it.

These are the console print results: (two screenshots of the printed memory stats)

I have been suffering from this for a long time.

My env:

torch == 1.4.0, torch_vision == 0.5.0, Python == 3.6, CUDA 10.2 with cuDNN

I probably can't change my Python/CUDA version because of my co-workers, but the other packages match the environment you listed on this GitHub page.

Please, I need your help.

zylo117 commented 3 years ago

It works fine for me. I'm now using torch 1.8.1+cu111. At the very least, I can infer with D7. I'm guessing it's a bug in PyTorch or CUDA.

0   N/A  N/A    428708      C   /usr/bin/python3.8               4153MiB
wildbrother commented 3 years ago

Ohh... I got it. You were right!

My env was

torch == 1.4.1, torch_vision == 0.5.0

When I upgraded to

torch == 1.8.1, torch_vision == 0.9.1

the problem no longer occurs!

I think you should update the README.

Thank you for your fast reply. I have never seen such a polite maintainer on GitHub. Thanks!

wildbrother commented 3 years ago

I have two more questions.

Q1. When I train the D1 model with batch_size=4 on 4 GPUs, GPU memory usage is about 3200 MiB. (screenshot)

With 4 GPUs and batch size 4, that is one sample per GPU, and I still can't train the D6 model with batch size 4 on 4x RTX TITAN.

Is this normal memory usage for training?

Q2. After upgrading torch to 1.8.1, I get a message during the validation epoch that I never saw with torch 1.4.1:

[W accumulate_grad.h:184] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.

(screenshot)

Is it a critical bug?
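For context, that warning only means a parameter's .grad tensor has different strides (memory layout) than the parameter itself, which can slow the optimizer step but does not change results. A minimal sketch (hypothetical layer, not the repo's model) showing how to compare the two layouts yourself:

```python
import torch
import torch.nn as nn

# The "gradient layout contract" says param.grad should have the same
# strides as param. A mismatch triggers the [W accumulate_grad.h] warning
# but is a performance note, not a correctness bug.
layer = nn.Conv2d(3, 8, kernel_size=3)
out = layer(torch.randn(1, 3, 16, 16)).sum()
out.backward()

w = layer.weight
print(w.stride() == w.grad.stride())  # compare layouts (True means they agree)
```

In practice the warning often comes from channels-last vs. contiguous tensors mixing across torch versions; training still converges the same way.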

bloom1123 commented 2 years ago

It works fine for me. I'm now using torch 1.8.1+cu111. At the very least, I can infer with D7. I'm guessing it's a bug in PyTorch or CUDA.

0   N/A  N/A    428708      C   /usr/bin/python3.8               4153MiB

I have the same env, but I can't get 32 FPS when I use efficient_test.py; I only get 16 FPS, and I don't know why.
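One thing worth checking is how the FPS is measured: CUDA kernels launch asynchronously, so timing without warm-up iterations and without torch.cuda.synchronize() can easily misreport throughput. A rough sketch (toy model standing in for the detector) of a fairer measurement:

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in for the loaded detector; eval mode for inference.
model = nn.Conv2d(3, 8, 3, padding=1).eval()
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    for _ in range(3):                # warm-up: first calls pay one-time costs
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # GPU ops are async; flush before timing
    n = 20
    t0 = time.perf_counter()
    for _ in range(n):
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # make sure all queued work finished
    fps = n / (time.perf_counter() - t0)

print(f"{fps:.1f} FPS")
```

Beyond measurement, halved FPS on the same model and weights usually points at a different GPU, input resolution, or cuDNN/driver version rather than the script itself.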