mapillary / seamseg

Seamless Scene Segmentation
BSD 3-Clause "New" or "Revised" License

RuntimeError: CUDA out of memory. Tried to allocate 464.00 MiB (GPU 1; 10.92 GiB total capacity; 8.43 GiB already allocated; 323.50 MiB free; 1.56 GiB cached) #15

Closed · xiong224 closed this issue 4 years ago

xiong224 commented 4 years ago

Traceback (most recent call last):
  File "train_panoptic.py", line 628, in <module>
    main(parser.parse_args())
  File "train_panoptic.py", line 589, in main
    global_step=global_step, loss_weights=config["optimizer"].getstruct("loss_weights"))
  File "train_panoptic.py", line 296, in train
    losses, _, conf = model(batch, do_loss=True, do_prediction=False)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 376, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/seamseg-0.1.dev31+g18e28cc-py3.6-linux-x86_64.egg/seamseg/models/panoptic.py", line 76, in forward
    self.rpn_head, x, bbx, iscrowd, valid_size, training=self.training, do_inference=True)
  File "/opt/conda/lib/python3.6/site-packages/seamseg-0.1.dev31+g18e28cc-py3.6-linux-x86_64.egg/seamseg/algos/fpn.py", line 84, in training
    obj_logits, bbx_logits, h, w = self._get_logits(head, x)
  File "/opt/conda/lib/python3.6/site-packages/seamseg-0.1.dev31+g18e28cc-py3.6-linux-x86_64.egg/seamseg/algos/fpn.py", line 57, in _get_logits
    obj_logits_i, bbx_logits_i = head(x_i)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/seamseg-0.1.dev31+g18e28cc-py3.6-linux-x86_64.egg/seamseg/modules/heads/rpn.py", line 62, in forward
    x = self.conv1(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 464.00 MiB (GPU 1; 10.92 GiB total capacity; 8.43 GiB already allocated; 323.50 MiB free; 1.56 GiB cached)
^CTraceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 228, in main
    process.wait()
  File "/opt/conda/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/opt/conda/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)

When I run "python -m torch.distributed.launch --nproc_per_node=2 train_panoptic.py --log_dir LOG_DIR configurations/cityscapes_r50.ini ../dataset_root/" on two 2080 Ti GPUs with 12 GB of memory each, and set the batch_size to 1 for both training and validation, it always reports CUDA out of memory. Does the code only run on GPUs with 16 GB of memory or more?

ducksoup commented 4 years ago

The original settings from our paper, which the provided cityscapes_r50.ini reproduces, are calibrated for a setup of 8x Nvidia V100 GPUs with 32GB of memory each. To train with 12GB per card, you need to reduce the training image size by setting the shortest_size and longest_max_size parameters to lower values.
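
As a rough sketch of the kind of edit this implies, assuming the image-size parameters live in a [dataloader] section of configurations/cityscapes_r50.ini (the section name and the numbers below are illustrative, not the paper's settings; check your copy of the config for the actual defaults):

    [dataloader]
    # Reduced training resolution to fit a ~12GB GPU (illustrative values).
    # Images are rescaled so the shorter side equals shortest_size and the
    # longer side never exceeds longest_max_size.
    shortest_size = 512
    longest_max_size = 1024

Lowering these two values shrinks the activations in every stage (backbone, FPN/RPN, and the heads), at some cost in accuracy relative to the full-resolution setup used in the paper.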