Open aymennturki opened 2 years ago
Traceback (most recent call last):
  File "projects/ISTR/train_net.py", line 136, in <module>
How can I fix this issue? I trained the code both in Colab and locally, and I always get the same error: "CUDA out of memory."
How do you solve it? My GPU is an RTX 2080 Ti (11 GB of memory).
dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[08/25 11:15:41 detectron2]: Contents of args.config_file=projects/ISTR/configs/ISTR-AE-R50-3x.yaml:
_BASE_: "Base-ISTR.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
  ISTR:
    NUM_PROPOSALS: 300
    NUM_CLASSES: 5
    MASK_ENCODING_METHOD: "AE"
    PATH_COMPONENTS: "/content/drive/MyDrive/imenselmi/ISTR_TRAIN/ISTR/projects/AE/checkpoints/AE_112_256.t7"
DATASETS:
  TRAIN: ("train",)
  TEST: ("val",)
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
INPUT:
  FORMAT: "RGB"
[08/25 11:15:41 detectron2]: Running with full config:
CUDNN_BENCHMARK: true
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
[08/25 11:15:41 detectron2]: Full config saved to ./output/config.yaml
pos_embeddings.weight
WARNING [08/25 11:16:04 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  stem.fc.{bias, weight}
[08/25 11:16:04 d2.engine.train_loop]: Starting training from iteration 0
/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py:724: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the
    args=(args,),
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "projects/ISTR/train_net.py", line 124, in main
    return trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
    src = self.backbone(images.tensor)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
    x = stage(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
    out = self.conv3(out)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
    return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
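To make the numbers in that RuntimeError easier to read, here is a small sketch that pulls them out of the message and does the arithmetic. It only uses the standard library; the helper name `parse_oom_message` is made up for illustration and is not part of PyTorch or detectron2. Note that the log reports 15.78 GiB of total capacity, so this particular run was presumably on the Colab GPU rather than on the 11 GB RTX 2080 Ti.

```python
import re

# The RuntimeError text copied verbatim from the log.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 672.00 MiB "
    "(GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; "
    "50.75 MiB free; 14.41 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Hypothetical helper: return the sizes named in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in the message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)

# How much more free memory the failed allocation would have needed.
shortfall = sizes["requested"] - sizes["free"]

# Memory held by PyTorch's caching allocator but not used by live tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

print(f"requested:      {sizes['requested']:.2f} MiB")
print(f"free on device: {sizes['free']:.2f} MiB")
print(f"shortfall:      {shortfall:.2f} MiB")
print(f"reserved but not allocated: {cached_unused:.2f} MiB")
print(f"allocated share of device:  {sizes['allocated'] / sizes['total']:.1%}")
```

The failure happens in the very first iteration (`iter: 0`), inside the ResNet backbone's forward pass, with most of the device already allocated, so the input batch itself seems to be what does not fit.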
`geoms` property to access the constituent parts of a multi-part geometry.
  for poly in cropped:
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
ERROR [08/25 11:16:05 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/usr/local/lib/python3.7/dist-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/.shortcut-targets-by-id/190HFmYfsGdKfNWeUiqnpTgh7X3m3GFmF/ISTR_TRAIN/ISTR/projects/ISTR/istr/inseg.py", line 162, in forward
    src = self.backbone(images.tensor)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 449, in forward
    x = stage(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/modeling/backbone/resnet.py", line 201, in forward
    out = self.conv3(out)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/wrappers.py", line 110, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/layers/batch_norm.py", line 53, in forward
    return x * scale.to(out_dtype) + bias.to(out_dtype)
RuntimeError: CUDA out of memory. Tried to allocate 672.00 MiB (GPU 0; 15.78 GiB total capacity; 13.42 GiB already allocated; 50.75 MiB free; 14.41 GiB reserved in total by PyTorch)
[08/25 11:16:05 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
[08/25 11:16:05 d2.utils.events]: iter: 0 lr: N/A max_mem: 14075M
Traceback (most recent call last):
  File "projects/ISTR/train_net.py", line 136, in <module>
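The run above was started with `opts=[]`, meaning no config overrides were passed on the command line. One thing that might be worth trying is passing memory-related overrides as trailing `KEY VALUE` pairs, which detectron2's default argument parser collects into `opts`. The sketch below only builds such a command line; it does not run training. The values are untested guesses, `build_train_command` is a hypothetical helper, and `MODEL.ISTR.NUM_PROPOSALS` assumes the key is nested the way the pasted config suggests. None of this is confirmed to resolve the error.

```python
import shlex


def build_train_command(config_file, overrides, num_gpus=1):
    """Hypothetical helper: assemble a train_net.py call with KEY VALUE overrides."""
    command = [
        "python", "projects/ISTR/train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    # Trailing KEY VALUE pairs are merged into the config after the YAML file.
    for key, value in overrides.items():
        command += [key, str(value)]
    return command


# Illustrative values only (assumptions, not tuned or tested settings).
overrides = {
    "SOLVER.IMS_PER_BATCH": 2,        # fewer images per iteration
    "INPUT.MIN_SIZE_TRAIN": "(480,)",  # smaller shorter-side resize
    "INPUT.MAX_SIZE_TRAIN": 800,       # cap on the longer side
    "MODEL.ISTR.NUM_PROPOSALS": 100,   # fewer proposals than the 300 in the config
}

command = build_train_command(
    "projects/ISTR/configs/ISTR-AE-R50-3x.yaml", overrides
)
print(shlex.join(command))
```

Batch size and input resolution drive the activation memory in the backbone, which is where the traceback fails, so those two are the first knobs to look at. Changing the number of proposals alters the model itself and may not be compatible with an existing checkpoint.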