jerpelhan / DAVE


CUDA out of memory #6

Open cgc-cell opened 2 months ago

cgc-cell commented 2 months ago

When I ran fscd_test.sh, there was a "CUDA out of memory" error, but I ran train_det.sh and train_sim.sh without any problem. My GPU is a 3080. Is there any way to solve this? Thanks. The details of the error are as follows:

Traceback (most recent call last):
  File "main.py", line 693, in <module>
    evaluate(args)
  File "/home/anaconda3/envs/dave/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "main.py", line 84, in evaluate
    out, aux, tblr, boxes_pred = model(img, bboxes, test.image_names[ids[0].item()])
  File "/home/anaconda3/envs/dave/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/envs/dave/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/anaconda3/envs/dave/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/code/DAVE/models/dave.py", line 442, in forward
    dst_mtx = self.cosine_sim(feat_pairs[None, :], feat_pairs[:, None]).cpu().numpy()
  File "/home/anaconda3/envs/dave/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/envs/dave/lib/python3.8/site-packages/torch/nn/modules/distance.py", line 77, in forward
    return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: CUDA out of memory. Tried to allocate 5.96 GiB (GPU 2; 23.69 GiB total capacity; 12.26 GiB already allocated; 3.46 GiB free; 18.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
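
The allocator hint at the end of the error message can be tried, though it only mitigates fragmentation and will not make a single ~6 GiB allocation fit if that much memory is genuinely unavailable. A minimal way to set it (an assumption on my part that this is worth trying here; the variable must be set before the first CUDA allocation, e.g. at the very top of main.py):

import os

# max_split_size_mb reduces fragmentation in PyTorch's caching allocator;
# it does not lower the model's actual memory requirement.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")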
cgc-cell commented 2 months ago

The parameters are:

python3 ../main.py \
--skip_train \
--model_name DAVE_3_shot \
--backbone resnet50 \
--swav_backbone \
--reduction 8 \
--num_enc_layers 3 \
--num_dec_layers 3 \
--kernel_dim 3 \
--emb_dim 256 \
--num_objects 3 \
--num_workers 1 \
--use_query_pos_emb \
--use_objectness \
--use_appearance \
--batch_size 1 \
--pre_norm
jerpelhan commented 2 months ago

Yes, this is normal. The GPU memory consumption of the presented method varies with the size of the objects in the image. The issue can be solved with a test-time cropping approach (see CounTR: https://github.com/Verg-Avesta/CounTR), which we did not implement since it is only a technical adjustment and has already been done elsewhere. You can cut the feature map into parts after the image is passed through the backbone and batch the crops so that all remaining operations run on each crop separately; the achieved results should be similar. A sketch of this idea is given below.
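
For reference, a minimal sketch of that tiling step, assuming the backbone output is a [1, C, H, W] tensor. tile_feature_map and tile_size are illustrative names, not part of the DAVE code:

import torch
import torch.nn.functional as F

def tile_feature_map(feat, tile_size):
    # feat: [1, C, H, W] backbone output. Pad so H and W divide evenly,
    # then cut into non-overlapping tiles and stack them into a batch.
    _, c, h, w = feat.shape
    pad_h = (tile_size - h % tile_size) % tile_size
    pad_w = (tile_size - w % tile_size) % tile_size
    feat = F.pad(feat, (0, pad_w, 0, pad_h))
    tiles = feat.unfold(2, tile_size, tile_size).unfold(3, tile_size, tile_size)
    nh, nw = tiles.shape[2], tiles.shape[3]
    # [1, C, nh, nw, t, t] -> [nh*nw, C, t, t]
    tiles = tiles.permute(0, 2, 3, 1, 4, 5).reshape(nh * nw, c, tile_size, tile_size)
    return tiles, (nh, nw)

The remaining stages of the model can then be run on the tiles one at a time (or in small batches), and the per-tile predictions merged back using each tile's spatial offset, so the peak allocation is bounded by the tile size rather than by the full upscaled image.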

lisenjie757 commented 2 months ago

I also find that when I run demo.py for inference it uses about 10 GB of GPU memory (my GPU is a 3090), but when I use CounTR it needs less than 1 GB. Is this normal? Why is there such a big difference, and how can it be solved?

jerpelhan commented 2 months ago

Yes, when the given exemplar objects are small, the image is upscaled, which uses more GPU memory. This can be addressed with a tiling approach. An easier solution is to remove the upscaling, though this will reduce performance: set scale_x and scale_y to 1.0 in data.utils at line 36.
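
As an illustration of what that change amounts to (this is not the actual code in data.utils, just a sketch under the assumption that the scale factors derived from the exemplar size drive a resize; maybe_upscale and disable_upscaling are hypothetical names):

import torch.nn.functional as F

def maybe_upscale(img, scale_x, scale_y, disable_upscaling=False):
    # img: [1, C, H, W]. Forcing both factors to 1.0 (equivalent to the
    # suggested edit in data.utils) skips the resize and saves GPU memory,
    # at the cost of some accuracy on images with small objects.
    if disable_upscaling:
        scale_x, scale_y = 1.0, 1.0
    if scale_x == 1.0 and scale_y == 1.0:
        return img
    _, _, h, w = img.shape
    new_size = (int(round(h * scale_y)), int(round(w * scale_x)))
    return F.interpolate(img, size=new_size, mode='bilinear', align_corners=False)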