layumi / AICIty-reID-2020

:red_car: The 1st Place Submission to AICity Challenge 2020 re-id track (Baidu-UTS submission)
MIT License

CUDA out of memory #33

Closed vokhidovhusan closed 3 years ago

vokhidovhusan commented 3 years ago

I am running the fast_submit_baidu.py file. I have set --batchsize 1, but I am still getting an error:

python fast_submit_baidu.py --gpu_ids 0,1 --batchsize 1 ./data/test_data
torch.Size([1052, 9718])
Gallery Cluster Class Number: 798
Gallery Cluster Image per Class: 22.92
Low Qualtiy Image in Query: 92
Low Qualtiy Image in Gallery: 2760
torch.Size([1052, 2048])

RuntimeError: CUDA out of memory. Tried to allocate 1.39 GiB (GPU 0; 7.80 GiB total capacity; 5.19 GiB already allocated; 113.00 MiB free; 5.59 GiB reserved in total by PyTorch)

It says `CUDA out of memory`. Could anyone please give me some guidance? How can I solve this problem?

layumi commented 3 years ago

Hi @martianvenusian, since I use a GPU with large memory, I did not optimize the code.

You may set a breakpoint to see which line runs out of memory.

One easy way is to `del xxx` once you no longer use the variable xxx.
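
As a complement to a breakpoint, memory usage can be printed between steps to narrow down the offending line. This is only a minimal sketch: the helper name `cuda_memory_report` and the tag string are hypothetical and are not part of fast_submit_baidu.py. It relies on `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`, and reports zero when no GPU is available.

```python
import torch


def cuda_memory_report(tag: str) -> str:
    """Return a one-line summary of GPU memory held by PyTorch.

    Hypothetical helper for illustration; reports 0 MiB when CUDA is unavailable.
    """
    if torch.cuda.is_available():
        # Memory occupied by live tensors on the current device.
        allocated = torch.cuda.memory_allocated() / 2**20
        # Memory held by PyTorch's caching allocator (live tensors plus cache).
        reserved = torch.cuda.memory_reserved() / 2**20
    else:
        allocated = reserved = 0.0
    return f"[{tag}] allocated={allocated:.0f} MiB reserved={reserved:.0f} MiB"


# Call this before and after each suspicious step in the script.
print(cuda_memory_report("before distance matrix"))
```

Comparing the numbers printed before and after each step shows which statement makes the largest allocation.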

vokhidovhusan commented 3 years ago

I don't quite understand what you mean by del xxx. Could you please be more specific?

I am using two 8GB GPUs. I have tried batchsize 1, but the same error occurred. I am trying to understand your code, but I don't understand why I am still getting the CUDA out of memory error. The error happens at line 453, as follows:

torch.Size([1052, 2048])
torch.Size([19342, 19342])
Traceback (most recent call last):
  File "fast_submit_baidu.py", line 453, in <module>
    cam_dist = 2 - 2*cam_dist
  File "/home/username/.virtualenvs/re-id/lib/python3.8/site-packages/torch/tensor.py", line 511, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA out of memory. Tried to allocate 1.39 GiB (GPU 0; 7.80 GiB total capacity; 5.19 GiB already allocated; 595.38 MiB free; 5.59 GiB reserved in total by PyTorch)

Thanks a lot.
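
The size in the error message can be checked by hand, which also suggests why --batchsize has no effect here. The sketch below assumes the 19342 x 19342 matrix printed just before the traceback is stored as float32 (4 bytes per element); the dtype is an assumption, since the thread does not state it. The helper name `tensor_gib` is hypothetical.

```python
def tensor_gib(rows: int, cols: int, bytes_per_element: int = 4) -> float:
    """Size of a dense rows x cols tensor in GiB (float32 assumed by default)."""
    return rows * cols * bytes_per_element / 2**30


n = 19342  # side length printed just before the traceback
one_matrix = tensor_gib(n, n)
print(f"one matrix: {one_matrix:.2f} GiB")  # one matrix: 1.39 GiB

# `2 - 2*cam_dist` can hold up to three such tensors at once:
# cam_dist itself, the temporary 2*cam_dist, and the final result.
print(f"peak for this line: {3 * one_matrix:.2f} GiB")  # peak for this line: 4.18 GiB
```

Under that assumption a single matrix is exactly the 1.39 GiB the allocator failed to provide. Its size depends on the number of images, not on the batch size, so lowering --batchsize would not be expected to shrink it.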

layumi commented 3 years ago

Thanks, @miraclebiu. You may refer to this article to optimize the code so that it fits within the available memory.

https://sagivtech.com/2017/09/19/optimizing-pytorch-training-code/
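
To make the `del xxx` suggestion from earlier in the thread concrete, here is a minimal sketch applied to a line like `cam_dist = 2 - 2*cam_dist` from the traceback. The tensor sizes and the `features` variable are hypothetical stand-ins, not the script's real data. It assumes inference only (no gradients needed) and that the old value of `cam_dist` is not used afterwards, which is what makes the in-place form safe.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical small stand-in for the 19342 x 19342 matrix in the traceback.
cam_dist = torch.rand(512, 512, device=device)
expected = 2 - 2 * cam_dist  # original form, computed here only to check the result

# In-place version: overwrite cam_dist's own storage instead of allocating
# a temporary for 2*cam_dist and a second tensor for the result.
cam_dist.mul_(-2).add_(2)

# `del` drops the reference to a tensor that is no longer needed, so PyTorch's
# caching allocator can reuse its memory for later allocations.
features = torch.rand(512, 2048, device=device)  # hypothetical intermediate
del features
if torch.cuda.is_available():
    # Optionally return cached, unused blocks to the GPU driver.
    torch.cuda.empty_cache()
```

On a GPU that is nearly full, the in-place form avoids the extra allocations that failed in the traceback, though whether it is enough depends on what else the script keeps alive at that point.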

vokhidovhusan commented 3 years ago

@layumi Thank you. I'll check the link for optimization. I have tried another GPU with more memory, and the script runs fine without the CUDA out of memory error.