wjn922 / ReferFormer

[CVPR2022] Official Implementation of ReferFormer
Apache License 2.0

CUDA OOM #5

Open opentld opened 2 years ago

opentld commented 2 years ago

Platform: Windows 10, Anaconda; GPU: RTX 2080 (8 GB)

python inference_davis.py --with_box_refine --binary --freeze_text_encoder --output_dir davis_dirs/resnet50 --resume ckpt/ytvos_r50.pth --backbone resnet50 --ngpu 1

```
Inference only supports for batch size = 1
Namespace(a2d_path='data/a2d_sentences', aux_loss=True, backbone='resnet50', backbone_pretrained=None, batch_size=1, bbox_loss_coef=5, binary=True, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_path='data/coco', controller_layers=3, dataset_file='davis', davis_path='data/ref-davis', dec_layers=4, dec_n_points=4, device='cuda', dice_loss_coef=5, dilation=False, dim_feedforward=2048, dist_url='env://', dropout=0.1, dynamic_mask_channels=8, enc_layers=4, enc_n_points=4, eos_coef=0.1, epochs=10, eval=False, focal_alpha=0.25, freeze_text_encoder=True, giou_loss_coef=2, hidden_dim=256, jhmdb_path='data/jhmdb_sentences', lr=0.0001, lr_backbone=5e-05, lr_backbone_names=['backbone.0'], lr_drop=[6, 8], lr_linear_proj_mult=1.0, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_text_encoder=1e-05, lr_text_encoder_names=['text_encoder'], mask_dim=256, mask_loss_coef=2, masks=True, max_size=640, max_skip=3, ngpu=1, nheads=8, num_feature_levels=4, num_frames=5, num_queries=5, num_workers=4, output_dir='davis_dirs/resnet50', position_embedding='sine', pre_norm=False, pretrained_weights=None, rel_coord=True, remove_difficult=False, resume='ckpt/ytvos_r50.pth', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_dice=5, set_cost_giou=2, set_cost_mask=2, split='valid', start_epoch=0, threshold=0.5, two_stage=False, use_checkpoint=False, visualize=False, weight_decay=0.0005, with_box_refine=True, world_size=1, ytvos_path='data/ref-youtube-vos')
Start inference
processor 0: 0% 0/30 [00:00<?, ?it/s]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias']
```


How much memory is required at minimum to run inference? Or which parameters can be modified to reduce the memory overhead?

Thanks!

opentld commented 2 years ago

I changed clip_len to 8 and it worked, but when it reached 47%, the OOM appeared again :( @wjn922


```
processor 0: 47% 14/30 [06:20<06:29, 24.34s/it]
Traceback (most recent call last):
  File "inference_davis.py", line 329, in <module>
    main(args)
  File "inference_davis.py", line 103, in main
    p.run()
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "inference_davis.py", line 254, in sub_processor
    anno_masks[anno_masks < 0.5] = 0.0
RuntimeError: CUDA out of memory. Tried to allocate 4.59 GiB (GPU 0; 8.00 GiB total capacity; 1.67 GiB already allocated; 2.85 GiB free; 3.26 GiB reserved in total by PyTorch)
processor 0: 47% 14/30 [06:47<07:45, 29.09s/it]
```
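A possible workaround, not from the authors: the failing line materializes a boolean comparison mask on the GPU alongside the mask tensor itself, so the thresholding can be done on the CPU instead. A minimal sketch with a stand-in tensor:

```python
import torch

# Hedged workaround: `anno_masks < 0.5` allocates a boolean tensor on the
# same device as `anno_masks`, which is what triggers the multi-GiB GPU
# allocation. Thresholding on the CPU avoids that allocation, at the cost
# of a host copy.
anno_masks = torch.rand(20, 5, 480, 910)  # stand-in for the real mask tensor
anno_masks = anno_masks.cpu()             # in the real script, frees the GPU copy
anno_masks[anno_masks < 0.5] = 0.0        # threshold on the CPU
```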



opentld commented 2 years ago

After checking, it was the 'goldfish' video that caused the OOM, so I now skip videos where num_obj is greater than 3. Back to the original topic: will changing clip_len to 8 reduce precision? @wjn922
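For reference, a minimal sketch of such a guard; the per-video loop, `videos`, and `run_inference` are stand-ins for the structure of inference_davis.py, not its actual code:

```python
MAX_OBJECTS = 3  # hypothetical cap so an 8 GB GPU does not OOM

def run_inference(name, annotations):
    # stand-in for the real per-video inference call
    print(f"running {name} with {len(annotations)} objects")

videos = {"goldfish": [1, 2, 3, 4, 5], "camel": [1]}  # toy stand-ins

for video_name, annotations in videos.items():
    num_obj = len(annotations)  # the quantity the commenter checks
    if num_obj > MAX_OBJECTS:
        print(f"skipping {video_name}: {num_obj} objects")
        continue
    run_inference(video_name, annotations)
```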


wjn922 commented 2 years ago

Hi,

We run the code on a V100 with 32 GB memory. We find it generally needs around 24 GB, while for some videos containing many objects it can reach 32 GB.

To reduce the memory usage, one way is to use a shorter clip, as you did. Another way is to reduce the video resolution here. However, both solutions are likely to reduce precision.
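For illustration, a minimal sketch of clip-wise inference; the helper and the `model(clip, exp)` call signature are assumptions, not the repository's API:

```python
import torch

def infer_in_clips(model, frames, exp, clip_len=8):
    """Hypothetical helper: run the model clip by clip so peak memory
    scales with clip_len rather than with the full video length."""
    outputs = []
    with torch.no_grad():  # inference needs no autograd buffers
        for i in range(0, len(frames), clip_len):
            clip = frames[i:i + clip_len]
            outputs.append(model(clip, exp))  # assumed model signature
            torch.cuda.empty_cache()          # release cached blocks between clips
    return torch.cat(outputs, dim=0)
```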

opentld commented 2 years ago

"It views the language as queries and directly attends to the most relevant regions in the video frames..."

How is "language as queries" achieved, as the GIF on the homepage shows? @wjn922

wjn922 commented 2 years ago

For the Transformer decoder, the decoder embedding is the pooled language feature, and the learnable queries serve as the positional embedding. Please refer here.
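In rough PyTorch terms, a sketch of that idea (shapes follow the `num_queries=5`, `hidden_dim=256` arguments above; this is not the repository's exact code):

```python
import torch
import torch.nn as nn

num_queries, hidden_dim, batch = 5, 256, 1

# Learnable queries, used only as positional embeddings.
query_pos = nn.Embedding(num_queries, hidden_dim)

# Pooled sentence feature, e.g. a mean-pooled RoBERTa output (stand-in here).
text_feat = torch.randn(batch, hidden_dim)

# Decoder input embedding: the same sentence feature repeated per query slot,
# so every query "is" the language; the learnable part only encodes position.
tgt = text_feat.unsqueeze(0).repeat(num_queries, 1, 1)    # [num_queries, batch, c]
pos = query_pos.weight.unsqueeze(1).repeat(1, batch, 1)   # [num_queries, batch, c]
# In the decoder, tgt (+ pos) cross-attends to the video features.
```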

AngelTang190 commented 2 years ago

Hi @wjn922,

What about inference_ytvos? Since there is no num_obj variable there, is reducing the resize resolution the only way to solve the CUDA OOM error?

AngelTang190 commented 2 years ago


I tried resizing to 250, but the 48th video still gives a CUDA OOM error. The number of expressions is 2 and the length of the video is 36, which is not high compared to the previous videos. What causes this? Below is the error output:

```
processor 0: 24% 48/202 [04:14<18:29, 7.20s/it]
Number of expressions: 2
Length of video: 36
```

```
Process Process-2:
Traceback (most recent call last):
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "inference_ytvos.py", line 207, in sub_processor
    outputs = model([imgs], [exp], [target])
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/referformer.py", line 321, in forward
    mask_features = self.pixel_decoder(features, text_features, pos, memory, nf=t)  # [batch_size*time, c, out_h, out_w]
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 258, in forward
    y = self.forward_features(features, text_features, pos, memory, nf)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 227, in forward_features
    cur_fpn = cross_attn(tgt=vision_features,
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 404, in forward
    return self.forward_post(tgt, memory, t, h, w,
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 337, in forward_post
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1206, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 2.34 GiB (GPU 0; 10.75 GiB total capacity; 4.96 GiB already allocated; 1.71 GiB free; 7.29 GiB reserved in total by PyTorch)
Total inference time: 255.2629 s
```
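A possible mitigation, analogous to the DAVIS case: lower the resolution at which frames enter the model, since this OOM occurs in the pixel decoder, whose `[batch_size*time, c, out_h, out_w]` activations scale with the input size. A sketch assuming torchvision transforms; the actual resize value and transform pipeline in inference_ytvos.py may differ:

```python
import torchvision.transforms as T

# Hypothetical: shrink the short-side resize used at inference time.
# Smaller inputs shrink the pixel-decoder activations where the OOM
# occurs, at some cost in precision.
transform = T.Compose([
    T.Resize(288),  # e.g. a smaller short side; tune for the available VRAM
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```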