leeesangwon / PyTorch-Image-Retrieval

A PyTorch framework for an image retrieval task including implementation of N-pair Loss (NIPS 2016) and Angular Loss (ICCV 2017).
MIT License

RuntimeError: CUDA out of memory. Tried to allocate #10

Status: Closed (zhiweige closed this issue 5 years ago)

zhiweige commented 5 years ago

Thanks for your great work on metric learning. I have a question about an error I hit while running the code. I ran the training code on the CUB-200 dataset with the following command:

CUDA_VISIBLE_DEVICES='0' python main.py --model inceptionv3 --mode train --dataset-path ../../train_data/CUB_train_test/ --scheduler StepLR --input-size 299 --loss-type angular --model-save-dir ./models --num-classes 32

I then got an out-of-memory (OOM) error:

lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 492.88 MiB (GPU 0; 22.38 GiB total capacity; 13.54 GiB already allocated; 289.06 MiB free; 8.84 MiB cached)

Have you run into this problem? Thanks.
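
To read the figures in an error like the one above, the message can be parsed mechanically. The following is a minimal sketch rather than anything from the repository: `parse_oom` is a hypothetical helper, and the regular expressions assume the message layout quoted in this issue, which may differ in other PyTorch versions.

```python
import re

# The message exactly as quoted in this issue.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 492.88 MiB "
    "(GPU 0; 22.38 GiB total capacity; 13.54 GiB already allocated; "
    "289.06 MiB free; 8.84 MiB cached)"
)

# Conversion factors to MiB for the units that appear in the message.
_UNIT_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Hypothetical helper: return the sizes quoted in the message, in MiB."""
    size = r"([\d.]+) (KiB|MiB|GiB)"
    patterns = {
        "requested": r"Tried to allocate " + size,
        "total": size + r" total capacity",
        "allocated": size + r" already allocated",
        "free": size + r" free",
        "cached": size + r" cached",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.group(1), match.group(2)
            sizes[name] = float(value) * _UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)
# How much more memory the failed allocation needed than was free.
shortfall_mib = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} MiB, free {sizes['free']:.2f} MiB")
print(f"shortfall {shortfall_mib:.2f} MiB")
```

In this case the single allocation that failed asked for more memory than the GPU had free, so the convolution could not get its buffer even though the card's total capacity is much larger.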

leeesangwon commented 5 years ago

When we trained on 2 K80 GPUs with an input size of 288, --num-classes 42, and densenet161, there was no out-of-memory error. It might help to use a smaller --input-size, a smaller --num-classes, or a smaller base model. Also, because Angular loss only requires 2 samples per class in the batch sampler, you might save memory in the data loader by passing --num-samples 2 (the current default is 4).
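
The effect of these flags on batch size can be estimated with some quick arithmetic. The sketch below is not code from the repository. It assumes that each batch holds `num_classes * num_samples` images and that inputs are 3-channel float32 tensors, and the function names are made up for illustration. It counts only the input tensor; activations inside the network usually take far more memory, but they also grow with the batch size and the input resolution.

```python
def batch_size(num_classes, num_samples):
    """Images per batch, ASSUMING the sampler draws num_samples per class."""
    return num_classes * num_samples


def input_batch_mib(num_classes, num_samples, input_size,
                    channels=3, bytes_per_value=4):
    """Size in MiB of one float32 input batch (input tensor only)."""
    images = batch_size(num_classes, num_samples)
    n_bytes = images * channels * input_size * input_size * bytes_per_value
    return n_bytes / 2 ** 20


# Settings from the reported command; num_samples=4 is the stated default.
reported = dict(num_classes=32, num_samples=4, input_size=299)
# Same settings with the suggested --num-samples 2.
suggested = dict(num_classes=32, num_samples=2, input_size=299)

for label, cfg in (("reported", reported), ("suggested", suggested)):
    print(f"{label}: {batch_size(cfg['num_classes'], cfg['num_samples'])} "
          f"images per batch, input batch of about "
          f"{input_batch_mib(**cfg):.1f} MiB")
```

Under these assumptions, going from 4 to 2 samples per class halves the number of images per batch (128 to 64 with --num-classes 32), and everything that scales with the batch shrinks accordingly.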

zhiweige commented 5 years ago

Thanks for your reply ^^ I used the inceptionv3 model when I tried to train the network, which may have caused the memory issue. I will change the parameters according to your suggestions.