lxx1991 / VS-ReID

Video Object Segmentation with Re-identification
BSD 2-Clause "Simplified" License

How much memory for running davis_test (Out of Memory) #9

Closed wattanapong closed 5 years ago

wattanapong commented 5 years ago

I have tried to run davis_test by following your instructions. How much memory does it need to run? My GPUs are two 1080 Ti cards.

I ran into an out-of-memory error in the predict function; even changing frame_fr_dir from the full-resolution frames to the training images did not help. The error is:

```
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "davis_test.py", line 518, in <module>
    main()
  File "davis_test.py", line 421, in main
    predict(1, frames_num, 1, range(instance_num))
  File "davis_test.py", line 88, in predict
    prob = model(image_patch, flow_patch, warp_label_patch)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/watt/workspace/pytorch/vs-reid/core/models/MP2S.py", line 131, in forward
    x = self.backbone(x, p)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/watt/workspace/pytorch/vs-reid/core/models/backbones/sense_resnet.py", line 150, in forward
    x = self.layer4(x)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/watt/workspace/pytorch/vs-reid/core/models/backbones/sense_resnet.py", line 73, in forward
    residual = self.downsample(x)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/watt/anaconda2/envs/emask/lib/python3.6/site-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.cu:58
```
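The trace shows the failure inside the forward pass of the backbone. A common way to cut inference memory in PyTorch 0.4+ (the version in the trace) is to run prediction under `torch.no_grad()` so no activations are kept for backprop. A minimal sketch, assuming `model` and the `*_patch` tensors from davis_test.py as in the traceback; I have not checked whether the script already does this:

```python
import torch

# Hypothetical wrapper around the existing predict() call; `model`,
# `image_patch`, `flow_patch` and `warp_label_patch` come from davis_test.py.
with torch.no_grad():  # disables autograd bookkeeping during inference
    prob = model(image_patch, flow_patch, warp_label_patch)
```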

qchenclaire commented 5 years ago

I ran the test without changing anything. I used 4 Titan XP, each taking about 1-2 GB.
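For reference, spreading a PyTorch 0.4-era model over several GPUs is typically done with `torch.nn.DataParallel`; a minimal sketch, assuming a `model` object such as the MP2S network built in davis_test.py, and not necessarily the exact code this repo uses:

```python
import torch
import torch.nn as nn

# Hypothetical multi-GPU wrapper: replicate the network on every visible GPU
# and split each input batch across them.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()
```

Note that `DataParallel` splits along the batch dimension, so with a single patch per forward pass it will not reduce the memory used on any one GPU.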

wattanapong commented 5 years ago

@qchenclaire Thank you for your response. I can't run this code on multiple GPUs. For now, I have changed this line of code from `instance_num = label_0.max()` to:


```python
# uniq_lbl holds the sorted unique label values in label_0
# (e.g. obtained with torch.unique(label_0) or np.unique(label_0));
# remap them to consecutive integers 0..N before counting instances.
for i in range(len(uniq_lbl)):
    label_0[label_0 == uniq_lbl[i]] = i
instance_num = len(uniq_lbl) - 1
```
After that, I can run the code with GPU memory usage below 3000 MB.
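A likely explanation for why this helps (I have not verified the repo internals): the DAVIS annotation PNGs are palette images, so `label_0.max()` can be much larger than the true number of objects when the label values are not consecutive; predict() then iterates over `range(instance_num)` and processes far more supposed instances than actually exist. Remapping to consecutive IDs keeps instance_num equal to the real object count. An equivalent one-liner, assuming label_0 is a NumPy integer array (hypothetical, not the repo's code):

```python
import numpy as np

# np.unique returns the sorted unique values and, with return_inverse=True,
# each pixel's index into them, i.e. labels remapped to 0..N.
uniq_lbl, remapped = np.unique(label_0, return_inverse=True)
label_0 = remapped.reshape(label_0.shape)
instance_num = len(uniq_lbl) - 1  # background (label 0) is not an instance
```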