hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

Gpu memory issue #35

Closed shizukadara closed 4 months ago

shizukadara commented 5 months ago

Thank you for your prompt replies. I am currently trying to run the model, but this is the exact error I get:

Traceback (most recent call last):
  File "/mnt/d/Corrnet-main/main.py", line 255, in <module>
    processor.start()
  File "/mnt/d/Corrnet-main/main.py", line 67, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/mnt/d/Corrnet-main/seq_scripts.py", line 35, in seq_train
    scaler.scale(loss).backward()
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.26 GiB already allocated; 0 bytes free; 5.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any advice on how to deal with limited GPU capacity and resolve this issue, or which parameters I could change to run it successfully? How much GPU memory does the model require overall?
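
One mitigation the error message itself points to is the PYTORCH_CUDA_ALLOC_CONF setting. A minimal sketch of applying it before the first CUDA allocation (the value 128 is only an illustrative choice, not something from the repo):

```python
# Sketch only: the OOM message suggests max_split_size_mb to reduce allocator
# fragmentation. It must take effect before the first CUDA tensor is created,
# e.g. at the very top of main.py; 128 MiB is an illustrative value.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the caching allocator sees it
```

Note that this only mitigates fragmentation; on a 6 GiB card the main fix is still reducing the memory footprint itself, as suggested in the reply below.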

hulianyuyy commented 5 months ago

The simplest way is to set the batch size to 1 and reduce the learning rate by half correspondingly. You may also try using only one correlation module in the network (instead of the three in our paper). The overall GPU memory for batch size 2 is around 18 GB on a 3090 GPU, and 11 GB for batch size 1.
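
For reference, a minimal sketch of the suggested adjustment, assuming the training config exposes a batch size and a base learning rate (the key names and the learning-rate value below are illustrative placeholders, not the repo's actual config schema):

```python
# Illustrative only: key names and the base_lr value are placeholders,
# not taken from the repo's config files.
config = {
    "batch_size": 2,   # default setting (~18 GB on a 3090, per the author)
    "base_lr": 1e-4,   # placeholder value
}

# On a smaller GPU: drop to batch size 1 (~11 GB, per the author)
# and scale the learning rate linearly with the batch size.
config["batch_size"] = 1
config["base_lr"] *= 0.5   # halve the learning rate to match the smaller batch
```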

shizukadara commented 5 months ago

Does the test phase (with the pretrained model) also require a GPU?

hulianyuyy commented 5 months ago

The test phase requires less GPU memory than training, but a GPU is still needed for inference.
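
As a general PyTorch note (not CorrNet-specific code), inference needs far less memory because no gradients or optimizer state are kept. A minimal sketch of wrapping evaluation in torch.no_grad(), with the model and loader interfaces assumed for illustration:

```python
import torch

@torch.no_grad()  # no activations are stored for backward, cutting memory use
def evaluate(model, loader, device="cuda"):
    model.eval()   # disable dropout / use running batch-norm statistics
    outputs = []
    for batch in loader:
        batch = batch.to(device)             # assumes batches are plain tensors
        outputs.append(model(batch).cpu())   # move results off the GPU right away
    return outputs
```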