klauscc / TALLFormer


Test on Thumos 14 dataset, CUDA out of memory. #11

Open caoqiushi opened 1 year ago

caoqiushi commented 1 year ago

Hi. I have a question about testing on the Thumos 14 dataset. Training on Thumos 14 works fine, but testing reports a CUDA out-of-memory error. The code in `tools/test.py` already wraps inference in `with torch.no_grad():`, so why does CUDA memory still increase gradually? The log is below. Thanks.

```
Evaluate checkpoint: workdir/tallformer/1.0.0-vswin_b_256x256-12GB/epoch_600_weights.pth
[>>>>>>>>>>>>>>>>>>>>>>>>        ] 105/212, 0.0 task/s, elapsed: 2341s, ETA: 2386s
Traceback (most recent call last):
  File "tools/test.py", line 151, in <module>
    if __name__ == "__main__":
  File "tools/test.py", line 81, in main
    if not os.path.isfile(args.out):
  File "tools/test.py", line 58, in test
    result = engine(data)[0]
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedacore/parallel/data_parallel.py", line 31, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/val_engine.py", line 14, in forward
    return self.forward_impl(data)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/val_engine.py", line 17, in forward_impl
    dets = self.infer(imgs, video_metas)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 117, in infer
    return self._aug_infer(imgs, video_metas)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 83, in _aug_infer
    tdets = self._get_raw_dets(imgs, video_metas)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 36, in _get_raw_dets
    feats = self.extract_feats(imgs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 24, in extract_feats
    feats = self.model(img, train=False)
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/detectors/mem_single_stage_detector.py", line 89, in forward
    feats = self.forward_eval(x)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/detectors/mem_single_stage_detector.py", line 74, in forward_eval
    feats = self.backbone(x)
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/chunk_model.py", line 51, in forward
    return forward_x(x)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/chunk_model.py", line 46, in forward_x
    return self.forward_nochunk_inp_output(x)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/chunk_model.py", line 112, in forward_nochunk_inp_output
    x = super().forward(x)  # shape: [n, c, d, h, w]
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 819, in forward
    x = layer(x.contiguous())
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 532, in forward
    x = blk(x, attn_mask)
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 371, in forward
    x = self.forward_part1(x, mask_matrix, self.dummy_tensor)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 334, in forward_part1
    attn_windows = self.attn(x_windows, mask=attn_mask)  # B*nW, Wd*Wh*Ww, C
  File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 210, in forward
    attn = attn + relative_position_bias.unsqueeze(0)  # B_, nH, N, N
RuntimeError: CUDA out of memory. Tried to allocate 4.40 GiB (GPU 0; 31.75 GiB total capacity; 20.25 GiB already allocated; 2.95 GiB free; 20.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

SimoLoca commented 1 year ago

If I'm not mistaken, it's because during training the model uses the video memory bank to lighten the GPU workload, whereas at test time it loads the entire video's frames, so GPU memory usage grows a lot!
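Also note that `with torch.no_grad():` only stops autograd from caching tensors for the backward pass; the forward activations of the full, un-chunked video still have to be materialized on the GPU. A minimal standalone sketch of the effect (plain PyTorch, not TALLFormer code; the shapes are made up):

```python
import torch
import torch.nn.functional as F

# Standalone illustration: no_grad() disables gradient caching, but the
# forward pass still allocates every intermediate activation on the GPU.
x = torch.randn(4, 3, 256, 32, 32, device="cuda")  # 4 long clips (made-up shape)
w = torch.randn(64, 3, 3, 3, 3, device="cuda")     # a 3D conv kernel
with torch.no_grad():
    y = F.conv3d(x, w)  # the output tensor still lives on the GPU
# Allocated memory scales with clip length and batch size, no_grad or not.
print(f"{torch.cuda.memory_allocated() / 2**20:.0f} MiB allocated")
```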

caoqiushi commented 1 year ago

> If I'm not mistaken, it's because during training the model uses the video memory bank to lighten the GPU workload, whereas at test time it loads the entire video's frames, so GPU memory usage grows a lot!

Thanks for your answer. But how did you solve this problem? My GPU is an NVIDIA Tesla V100 SXM2 (32 GB); I would have thought that was enough to run the test.

klauscc commented 1 year ago

Hi @caoqiushi, you could try decreasing the batch size for inference, i.e., setting `samples_per_gpu=1` during inference. The default inference batch size is 4.
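For reference, a hypothetical excerpt of what that change looks like in the dataset config (the repo appears to use mmdetection-style config files; the surrounding keys such as dataset paths and pipelines are omitted here):

```python
# Hypothetical config excerpt; only the keys relevant to the OOM are shown.
data = dict(
    samples_per_gpu=1,  # inference batch size; the default of 4 triggers the OOM
    workers_per_gpu=2,  # CPU dataloader workers; does not affect GPU memory
)
```

The exact placement varies per config file; the relevant setting is just `samples_per_gpu` under `data`.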

caoqiushi commented 1 year ago

> Hi @caoqiushi, you could try decreasing the batch size for inference, i.e., setting `samples_per_gpu=1` during inference. The default inference batch size is 4.

Thank you for your answer. I will try it.