talebolano / yolov3-network-slimming

An implementation of network-slimming pruning for yolov3
345 stars, 93 forks

always RuntimeError: CUDA error: out of memory #16

Open EtheneXiang opened 5 years ago

EtheneXiang commented 5 years ago

Hardware: one GTX 1070 (8 GB), 64 GB RAM.

width=608, height=608, random=1. All settings are the defaults; I did not change anything.

I have tried the following combinations, and all of them failed with RuntimeError: CUDA error: out of memory:

batch=30 subdivisions=15

batch=15 subdivisions=5

batch=10 subdivisions=5

cococener commented 5 years ago

Me too. Did you solve it?

EtheneXiang commented 5 years ago

> Me too. Did you solve it?

@cococener Do not rely on "subdivisions"; it does not work here. Set only "batch", and lower it until training fits in memory.
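To see why this matters, here is a minimal sketch of how many images end up on the GPU per forward pass. It assumes the Darknet convention (each batch is split into `subdivisions` chunks) and compares it with a training script that ignores `subdivisions`, as reported above. The function name `images_per_forward_pass` is hypothetical and is not part of this repository.

```python
def images_per_forward_pass(batch: int, subdivisions: int, honors_subdivisions: bool) -> int:
    """Return how many images are on the GPU at once for one forward/backward pass."""
    if honors_subdivisions:
        # Darknet convention: the batch is processed in `subdivisions` chunks.
        return max(1, batch // subdivisions)
    # Assumption from this thread: the script ignores `subdivisions`,
    # so the whole batch is pushed through the network at once.
    return batch


for batch, subdivisions in [(30, 15), (15, 5), (10, 5)]:
    honored = images_per_forward_pass(batch, subdivisions, True)
    ignored = images_per_forward_pass(batch, subdivisions, False)
    print(f"batch={batch} subdivisions={subdivisions}: "
          f"{honored} images per pass if honored, {ignored} if ignored")
```

If `subdivisions` really is ignored, the settings tried above put 30, 15, and 10 images of 608x608 on an 8 GB card at once, so lowering `batch` itself is the setting that reduces memory use.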

baigang666 commented 5 years ago

Adjust the batch and subdivisions parameters in the cfg file.

AbanoubMamdouh commented 5 years ago

When I changed the batch size, it threw the following RuntimeError:

/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:15: UserWarning: reduction='elementwise_mean' is deprecated, please use reduction='mean' instead.
  warnings.warn("reduction='elementwise_mean' is deprecated, please use reduction='mean' instead.")
Traceback (most recent call last):
  File "sparsity_train.py", line 152, in <module>
    train()
  File "sparsity_train.py", line 98, in train
    loss = model(imgs, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/abanoub/Desktop/yolov3-network-slimming-master/yolomodel.py", line 332, in forward
    x = torch.cat((map1, map2), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 26 and 24 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71

The architecture is yolov3-tiny.
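One possible explanation for the 26 vs 24 mismatch, offered as a guess and not as a confirmed diagnosis: yolov3-tiny contains a 2x2 max pool with stride 1, which Darknet pads so that the feature map keeps its size. If a PyTorch port applies that pool without padding, a 13x13 map shrinks to 12x12 and becomes 24x24 after the 2x upsample, while the skip connection is still 26x26. The sketch below only does the size arithmetic. It assumes a 416x416 input, and the helper names are hypothetical.

```python
def pool_out(size: int, kernel: int, stride: int, pad_total: int = 0) -> int:
    """Output width of a max pool, where pad_total is the padding summed over both sides."""
    return (size + pad_total - kernel) // stride + 1


def concat_sizes(input_size: int, pad_stride1_pool: bool):
    """Return (skip_map_size, upsampled_map_size) at the yolov3-tiny route/concat."""
    size = input_size
    for _ in range(4):                      # four 2x2 stride-2 pools: 416 -> 26
        size = pool_out(size, 2, 2)
    skip = size                             # stride-16 map used by the route layer
    deep = pool_out(size, 2, 2)             # fifth stride-2 pool: 26 -> 13
    # The 2x2 stride-1 pool keeps 13 only if one extra row/column of padding is added.
    deep = pool_out(deep, 2, 1, pad_total=1 if pad_stride1_pool else 0)
    return skip, deep * 2                   # 2x upsample before concatenation


print(concat_sizes(416, pad_stride1_pool=True))
print(concat_sizes(416, pad_stride1_pool=False))
```

If this is the cause, the error is independent of the batch size, and the fix would be to pad that stride-1 pooling layer in the model code so both maps are 26x26.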

leJson commented 5 years ago

I ran into this problem too when training inside Docker. The fix is to set n_cup to 0.

leJson commented 5 years ago

The cause seems to be a compatibility issue between Docker and PyTorch.

AbanoubMamdouh commented 5 years ago

@leJson Could you please specify where to find the parameter "n_cup" you're talking about? Thanks in advance

leJson commented 5 years ago

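> @leJson Could you please specify where to find the parameter "n_cup" you're talking about? Thanks in advance

The parameter is n_cpu. It is defined here: parser.add_argument("--n_cpu",dest='n_cpu',type=int,default=2,help="torch多线程核数"). Change default=2 to default=0 in that line, or pass the value on the command line when you start training.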
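As a minimal sketch of the change described above, the snippet below rebuilds only that one argument with the default set to 0. The `build_parser` wrapper and the English help text are hypothetical. It is also an assumption that the training script passes this value to the PyTorch `DataLoader` as `num_workers`, where 0 means data is loaded in the main process with no worker subprocesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the argument parser of the training script."""
    parser = argparse.ArgumentParser(description="training options (sketch)")
    parser.add_argument(
        "--n_cpu",
        dest="n_cpu",
        type=int,
        default=0,  # was default=2 in the line quoted above
        help="number of data-loading workers (0 = load in the main process)",
    )
    return parser


# Default value, as if the script were started without the flag.
print(build_parser().parse_args([]).n_cpu)

# Explicit value, as if started with: python sparsity_train.py --n_cpu 0
print(build_parser().parse_args(["--n_cpu", "0"]).n_cpu)
```

Passing the flag on the command line avoids editing the source, which is convenient when the same checkout is used both inside and outside Docker.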

PiseyYou commented 5 years ago

@leJson That is right; I followed this suggestion and it works. But there is another problem. In parse_config.py I tried setting options['gpus'] = '0, 1' so that two GPUs work together, but when I run python sparsity_train.py only one GPU is actually used, not both. Any advice?
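A likely reason, stated as an assumption because the thread does not confirm it, is that the training script never wraps the model for multi-GPU execution, so the `gpus` option is read but not acted on. The sketch below parses the option string and wraps a model with `torch.nn.DataParallel`. The helpers `parse_gpu_ids` and `wrap_for_multi_gpu` are hypothetical names, not functions from this repository.

```python
from typing import List


def parse_gpu_ids(gpus: str) -> List[int]:
    """Turn an option string such as '0, 1' into a list of device indices."""
    return [int(token) for token in gpus.split(",") if token.strip()]


def wrap_for_multi_gpu(model, gpus: str):
    """Replicate `model` across the listed GPUs when more than one is available."""
    import torch  # imported here so the parsing helper works without PyTorch

    device_ids = parse_gpu_ids(gpus)
    if not torch.cuda.is_available() or not device_ids:
        return model  # nothing to do on a CPU-only machine
    # DataParallel expects the parameters to live on the first listed device.
    model = model.to(f"cuda:{device_ids[0]}")
    if len(device_ids) > 1 and torch.cuda.device_count() >= len(device_ids):
        # Each forward pass splits the input batch along dimension 0 across devices.
        model = torch.nn.DataParallel(model, device_ids=device_ids)
    return model


print(parse_gpu_ids("0, 1"))
```

Because `DataParallel` splits each batch across devices, the batch size should be at least the number of GPUs. If the model computes its loss inside `forward`, as `loss = model(imgs, targets)` in the traceback above suggests, the wrapped call returns one loss value per GPU, so the result may need to be averaged before calling `backward`.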