wjn922 / ReferFormer

[CVPR2022] Official Implementation of ReferFormer
Apache License 2.0

CUDA OOM #5

Open opentld opened 2 years ago

opentld commented 2 years ago

Platform: Windows 10, Anaconda; GPU: RTX 2080 (8 GB)

python inference_davis.py --with_box_refine --binary --freeze_text_encoder --output_dir davis_dirs/resnet50 --resume ckpt/ytvos_r50.pth --backbone resnet50 --ngpu 1

```
Inference only supports for batch size = 1
Namespace(a2d_path='data/a2d_sentences', aux_loss=True, backbone='resnet50', backbone_pretrained=None, batch_size=1, bbox_loss_coef=5, binary=True, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_path='data/coco', controller_layers=3, dataset_file='davis', davis_path='data/ref-davis', dec_layers=4, dec_n_points=4, device='cuda', dice_loss_coef=5, dilation=False, dim_feedforward=2048, dist_url='env://', dropout=0.1, dynamic_mask_channels=8, enc_layers=4, enc_n_points=4, eos_coef=0.1, epochs=10, eval=False, focal_alpha=0.25, freeze_text_encoder=True, giou_loss_coef=2, hidden_dim=256, jhmdb_path='data/jhmdb_sentences', lr=0.0001, lr_backbone=5e-05, lr_backbone_names=['backbone.0'], lr_drop=[6, 8], lr_linear_proj_mult=1.0, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_text_encoder=1e-05, lr_text_encoder_names=['text_encoder'], mask_dim=256, mask_loss_coef=2, masks=True, max_size=640, max_skip=3, ngpu=1, nheads=8, num_feature_levels=4, num_frames=5, num_queries=5, num_workers=4, output_dir='davis_dirs/resnet50', position_embedding='sine', pre_norm=False, pretrained_weights=None, rel_coord=True, remove_difficult=False, resume='ckpt/ytvos_r50.pth', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_dice=5, set_cost_giou=2, set_cost_mask=2, split='valid', start_epoch=0, threshold=0.5, two_stage=False, use_checkpoint=False, visualize=False, weight_decay=0.0005, with_box_refine=True, world_size=1, ytvos_path='data/ref-youtube-vos')
Start inference
processor 0: 0% 0/30 [00:00<?, ?it/s]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias']
```


How much memory is required at minimum to run inference? Or which parameters can be modified to reduce the memory overhead?

Thanks!

opentld commented 2 years ago

I changed clip_len to 8 and it worked, but when it reached 47%, the OOM appeared again :( @wjn922


```
processor 0: 47% 14/30 [06:20<06:29, 24.34s/it]
Traceback (most recent call last):
  File "inference_davis.py", line 329, in <module>
    main(args)
  File "inference_davis.py", line 103, in main
    p.run()
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "inference_davis.py", line 254, in sub_processor
    anno_masks[anno_masks < 0.5] = 0.0
RuntimeError: CUDA out of memory. Tried to allocate 4.59 GiB (GPU 0; 8.00 GiB total capacity; 1.67 GiB already allocated; 2.85 GiB free; 3.26 GiB reserved in total by PyTorch)
processor 0: 47% 14/30 [06:47<07:45, 29.09s/it]
```
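A possible workaround, not from the authors: the failing line materializes a boolean comparison mask on the GPU alongside the mask tensor itself, so the thresholding can be done on the CPU instead. A minimal sketch with a stand-in tensor:

```python
import torch

# Hedged workaround: `anno_masks < 0.5` allocates a boolean tensor on the
# same device as `anno_masks`, which is what triggers the multi-GiB GPU
# allocation. Thresholding on the CPU avoids that allocation, at the cost
# of a host copy.
anno_masks = torch.rand(20, 5, 480, 910)  # stand-in for the real mask tensor
anno_masks = anno_masks.cpu()             # in the real script, frees the GPU copy
anno_masks[anno_masks < 0.5] = 0.0        # threshold on the CPU
```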



opentld commented 2 years ago

After checking, it was the 'goldfish' video that caused the OOM, so I now skip videos where num_obj is greater than 3. Back to the original topic: will changing clip_len to 8 reduce precision? @wjn922
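For reference, a minimal sketch of such a guard; the per-video loop, `videos`, and `run_inference` are stand-ins for the structure of inference_davis.py, not its actual code:

```python
MAX_OBJECTS = 3  # hypothetical cap so an 8 GB GPU does not OOM

def run_inference(name, annotations):
    # stand-in for the real per-video inference call
    print(f"running {name} with {len(annotations)} objects")

videos = {"goldfish": [1, 2, 3, 4, 5], "camel": [1]}  # toy stand-ins

for video_name, annotations in videos.items():
    num_obj = len(annotations)  # the quantity the commenter checks
    if num_obj > MAX_OBJECTS:
        print(f"skipping {video_name}: {num_obj} objects")
        continue
    run_inference(video_name, annotations)
```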


wjn922 commented 2 years ago

Hi,

We run the code on a V100 with 32 GB memory. We find it generally needs around 24 GB, while for some videos containing many objects it can reach 32 GB.

To reduce the memory usage, one way is to use a shorter clip, as you did. Another way is to reduce the video resolution here. However, both solutions are likely to reduce precision.
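For illustration, a minimal sketch of clip-wise inference; the helper and the `model(clip, exp)` call signature are assumptions, not the repository's API:

```python
import torch

def infer_in_clips(model, frames, exp, clip_len=8):
    """Hypothetical helper: run the model clip by clip so peak memory
    scales with clip_len rather than with the full video length."""
    outputs = []
    with torch.no_grad():  # inference needs no autograd buffers
        for i in range(0, len(frames), clip_len):
            clip = frames[i:i + clip_len]
            outputs.append(model(clip, exp))  # assumed model signature
            torch.cuda.empty_cache()          # release cached blocks between clips
    return torch.cat(outputs, dim=0)
```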

opentld commented 2 years ago

"It views the language as queries and directly attends to the most relevant regions in the video frames..."

How is "language as queries" achieved, as the GIF on the homepage shows? @wjn922

wjn922 commented 2 years ago

For the Transformer decoder, the decoder embedding is the pooled language feature, and the learnable queries serve as the positional embedding. Please refer here.
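In rough PyTorch terms, a sketch of that idea (shapes follow the `num_queries=5`, `hidden_dim=256` arguments above; this is not the repository's exact code):

```python
import torch
import torch.nn as nn

num_queries, hidden_dim, batch = 5, 256, 1

# Learnable queries, used only as positional embeddings.
query_pos = nn.Embedding(num_queries, hidden_dim)

# Pooled sentence feature, e.g. a mean-pooled RoBERTa output (stand-in here).
text_feat = torch.randn(batch, hidden_dim)

# Decoder input embedding: the same sentence feature repeated per query slot,
# so every query "is" the language; the learnable part only encodes position.
tgt = text_feat.unsqueeze(0).repeat(num_queries, 1, 1)    # [num_queries, batch, c]
pos = query_pos.weight.unsqueeze(1).repeat(1, batch, 1)   # [num_queries, batch, c]
# In the decoder, tgt (+ pos) cross-attends to the video features.
```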

AngelTang190 commented 2 years ago

Hi @wjn922,

What about inference_ytvos? Since there is no num_obj variable there, is reducing the resize resolution the only way to solve the CUDA OOM error?

AngelTang190 commented 2 years ago


I tried resizing to 250, but the 48th video still gives a CUDA OOM error. The number of expressions is 2 and the length of the video is 36, which is not high compared to the previous videos. What causes this? Below is the error output:

```
processor 0: 24% 48/202 [04:14<18:29, 7.20s/it]
Number of expressions: 2
Length of video: 36
```

```
Process Process-2:
Traceback (most recent call last):
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "inference_ytvos.py", line 207, in sub_processor
    outputs = model([imgs], [exp], [target])
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/referformer.py", line 321, in forward
    mask_features = self.pixel_decoder(features, text_features, pos, memory, nf=t)  # [batch_size*time, c, out_h, out_w]
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 258, in forward
    y = self.forward_features(features, text_features, pos, memory, nf)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 227, in forward_features
    cur_fpn = cross_attn(tgt=vision_features,
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 404, in forward
    return self.forward_post(tgt, memory, t, h, w,
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 337, in forward_post
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1206, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 2.34 GiB (GPU 0; 10.75 GiB total capacity; 4.96 GiB already allocated; 1.71 GiB free; 7.29 GiB reserved in total by PyTorch)
Total inference time: 255.2629 s
```
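A possible mitigation, analogous to the DAVIS case: lower the resolution at which frames enter the model, since this OOM occurs in the pixel decoder, whose `[batch_size*time, c, out_h, out_w]` activations scale with the input size. A sketch assuming torchvision transforms; the actual resize value and transform pipeline in inference_ytvos.py may differ:

```python
import torchvision.transforms as T

# Hypothetical: shrink the short-side resize used at inference time.
# Smaller inputs shrink the pixel-decoder activations where the OOM
# occurs, at some cost in precision.
transform = T.Compose([
    T.Resize(288),  # e.g. a smaller short side; tune for the available VRAM
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```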