sail-sg / volo

VOLO: Vision Outlooker for Visual Recognition
Apache License 2.0
931 stars, 94 forks

Increasing GPU memory in every epoch when running volo-d2 without token labeling. #26

Open Ree1s opened 3 years ago

Ree1s commented 3 years ago

Hi, thanks for sharing VOLO, it's nice work. I used the following command:

```bash
export CUDA_VISIBLE_DEVICES=1,4,5,6
python -m torch.distributed.launch --nproc_per_node=4 main.py "path/to/dataset" \
  --model volo_dd2 --img-size 224 \
  -b 100 --lr 1.0e-3 --drop-path 0.2 --epoch 300 --native-amp \
  --finetune ./d2_224_85.2.pth.tar
```

GPU memory kept increasing while I fine-tuned volo-d2 from the pretrained model, without token labeling, on my own dataset. I did not add any tricks, and after about 15 epochs it was nearly out of memory.

zihangJiang commented 3 years ago

It's a common issue, similar to https://github.com/rwightman/pytorch-image-models/issues/80. Can you try adding the `--no-prefetcher` flag to see if it solves the problem?
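For reference, a sketch of what that would look like, assuming the launch command from the original report is otherwise left unchanged (the model name, paths, and hyperparameters are copied as written there; only `--no-prefetcher` is appended). Whether it stops the memory growth on a given setup still has to be checked by watching GPU memory over a few epochs.

```bash
# Same GPUs and arguments as in the original report.
export CUDA_VISIBLE_DEVICES=1,4,5,6

# Only change: --no-prefetcher is appended as the last argument.
python -m torch.distributed.launch --nproc_per_node=4 main.py "path/to/dataset" \
  --model volo_dd2 --img-size 224 \
  -b 100 --lr 1.0e-3 --drop-path 0.2 --epoch 300 --native-amp \
  --finetune ./d2_224_85.2.pth.tar \
  --no-prefetcher
```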

CreamNuts commented 3 years ago

I had the same issue, and thanks to @zihangJiang I solved it. Do you know why it happens?