ygjwd12345 / TransDepth

Code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction
MIT License

[Memory Address Question] How to control gpu memory usage in this code? #8

Closed sjg02122 closed 3 years ago

sjg02122 commented 3 years ago

Thank you for your excellent work. I encountered a CUDA out-of-memory error while running your code. I suspect this is caused by a lack of GPU memory, so I increased num_threads in the multi-GPU part of your code and reduced the batch size, but the error still does not disappear. Do you happen to know how to control this?

Below is the full error text.

```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/cv1/TransDepth/pytorch/bts_main.py", line 347, in main_worker
    model = BtsModel(args)
  File "/home/cv1/TransDepth/pytorch/bts.py", line 345, in __init__
    self.encoder = ViT_seg(config_vit, img_size=[params.input_height,params.input_width], num_classes=config_vit.n_classes).cuda()
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 10.76 GiB total capacity; 400.86 MiB already allocated; 66.69 MiB free; 452.00 MiB reserved in total by PyTorch)
```
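A detail worth noting in that last line: the failure happens inside `BtsModel(args).cuda()`, i.e. while moving the model's weights to the GPU, before any batch is built, and PyTorch itself had reserved under half a GiB on a ~10.8 GiB card yet only ~67 MiB were free. A quick sanity check on the reported figures (values copied from the traceback; the remainder covers other processes on the card plus this process's own CUDA context overhead):

```python
# Figures taken from the RuntimeError message, converted to MiB.
total_mib = 10.76 * 1024          # "10.76 GiB total capacity" on GPU 0
reserved_by_pytorch_mib = 452.00  # "452.00 MiB reserved in total by PyTorch"
free_mib = 66.69                  # "66.69 MiB free" when the 72 MiB request failed

# Memory accounted for neither by PyTorch's reserved pool nor by free space:
# held by other processes and by non-PyTorch overhead such as the CUDA context.
external_mib = total_mib - reserved_by_pytorch_mib - free_mib
print(f"held outside PyTorch's pool: ~{external_mib:.0f} MiB")
```

Roughly 10 GiB is unaccounted for by this training process, so it may be worth checking `nvidia-smi` for other jobs on the card before shrinking the batch size further.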

ygjwd12345 commented 3 years ago

The problem is CUDA running out of memory. I would suggest reducing the batch size further.
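If shrinking the batch size hurts training quality, a common workaround is gradient accumulation: run several small micro-batches and call the optimizer once, which reproduces the gradient of the larger batch while only holding one micro-batch's activations in memory at a time. A minimal pure-Python sketch of the arithmetic (no PyTorch; function names are hypothetical, and the toy per-sample loss is x², so the per-sample gradient is 2x):

```python
def full_batch_grad(samples):
    # Gradient of the mean loss over the whole batch: mean of 2*x.
    return sum(2 * x for x in samples) / len(samples)

def accumulated_grad(samples, micro_batch_size):
    # Process the batch in small chunks, summing per-sample gradients as
    # PyTorch's backward() would accumulate them, then normalize once.
    total = 0.0
    for i in range(0, len(samples), micro_batch_size):
        micro = samples[i:i + micro_batch_size]
        total += sum(2 * x for x in micro)
    return total / len(samples)

samples = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
# The two strategies agree, so accumulation does not change the update,
# only the peak activation memory per step.
assert abs(full_batch_grad(samples) - accumulated_grad(samples, 2)) < 1e-12
```

In a real PyTorch loop this corresponds to dividing each micro-batch loss by the number of accumulation steps, calling `loss.backward()` per micro-batch, and calling `optimizer.step()` / `optimizer.zero_grad()` only once per accumulated batch.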

ygjwd12345 commented 3 years ago

I ran it on 4 V100 GPUs.

ygjwd12345 commented 3 years ago

Closed due to a long period of inactivity.