uzh-rpg / svit

Official implementation of "SViT: Revisiting Token Pruning for Object Detection and Instance Segmentation"
Apache License 2.0

GPU out of memory problem #2

Closed: King4819 closed this issue 4 months ago

King4819 commented 6 months ago

Awesome work!!! I'm running into a GPU out-of-memory problem. The GPU I'm using is a single RTX 4090 (24 GB). Is there any way to reduce GPU memory usage, e.g., the batch size? (I can't find the batch-size setting and I don't know its default value.)

The command I used: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1

Thanks!

kaikai23 commented 6 months ago

@King4819 Hello, thank you for your interest in our work. The batch size per GPU is defined in the config file under data: samples_per_gpu. For example, it is 4 in configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py. During training, we used 4 GPUs to train the tiny model, so the effective batch size is 16: sh dist_train.sh configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py 4.

For the small model, we set samples_per_gpu=2 with 8 GPUs to keep the effective batch size at 16. Maybe you could try using a smaller samples_per_gpu with more GPUs?
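For reference, a minimal sketch of how that data block typically looks in an MMDetection 2.x-style config; the workers_per_gpu value and the omitted dataset fields are illustrative, not taken from the actual SViT config:

```python
# Illustrative excerpt of an MMDetection 2.x-style config; only the
# batch-size-related keys are shown, dataset settings are omitted.
data = dict(
    samples_per_gpu=4,  # batch size per GPU; 4 samples x 4 GPUs = effective batch size 16
    workers_per_gpu=4,  # dataloader workers per GPU (illustrative value)
    # train=..., val=..., test=...  (dataset settings unchanged)
)
```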

Best, Yifei

King4819 commented 6 months ago

Sorry, I want to ask: is it possible to run the whole pipeline on a single RTX 4090 GPU? And which settings should I change to make it fit on a single GPU? Thanks!

kaikai23 commented 6 months ago

In short, the answer is no. The ViT-Adapter is trained with a default batch_size=16 (batch_per_gpu x num_gpus), and performance degrades significantly if a much smaller batch size is used. Sorry about that.

In case performance is not the focus, you can decrease samples_per_gpu to 2 or 1 and train with a single GPU by: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1
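A minimal single-GPU variant of that data block might look like the following (again an MMDetection 2.x-style sketch, not the exact SViT config); with a batch size of 1 or 2 instead of 16, you may also want to lower the learning rate accordingly:

```python
# Hypothetical single-GPU setting: only the per-GPU batch size is lowered.
data = dict(
    samples_per_gpu=1,  # or 2, if it fits into the 24 GB of an RTX 4090
    workers_per_gpu=2,  # illustrative value
    # train=..., val=..., test=...  (dataset settings unchanged)
)
```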

King4819 commented 6 months ago

@kaikai23 Thanks for your reply! Do you think it is possible to add gradient accumulation to simulate a larger batch size in your work? Thanks!

kaikai23 commented 6 months ago

Yes, I think so. Here are some references: here1 and here2
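For context, a generic gradient-accumulation loop in plain PyTorch (a toy model and toy data, not SViT's actual training code) looks like this: gradients from several small batches are summed before a single optimizer step, which approximates training with a proportionally larger batch size.

```python
import torch
import torch.nn as nn

# Toy example of gradient accumulation, not SViT's training loop.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data_loader = [(torch.randn(2, 16), torch.randn(2, 1)) for _ in range(32)]  # mini-batches of 2
accum_steps = 8  # 2 x 8 = simulated batch size of 16

optimizer.zero_grad()
for i, (x, y) in enumerate(data_loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so the summed gradient matches one large batch
    if (i + 1) % accum_steps == 0:
        optimizer.step()             # update once every accum_steps mini-batches
        optimizer.zero_grad()
```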

King4819 commented 6 months ago

@kaikai23 Sorry for the questions. I have studied the references, but I'm still confused about how to modify the code (e.g., where should I add accumulative_counts?). Would it be convenient for you to commit code with gradient accumulation? Thank you very much!

King4819 commented 6 months ago

@kaikai23 I want to ask whether you experimented with different batch sizes when fine-tuning the pruned model. Is it necessary to use a batch size of 16? In other words, if I use a batch size of 2 or 1, will the final performance degrade noticeably? Thanks a lot!

King4819 commented 6 months ago

It seems that MMDetection has a built-in gradient accumulation function that can be used. Thanks!
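For reference, if the codebase follows the mmcv/MMDetection 2.x config style, the built-in mechanism is the GradientCumulativeOptimizerHook, enabled by replacing the optimizer_config entry in the training config (on MMEngine-based MMDetection 3.x, the equivalent is the accumulative_counts argument of the optim_wrapper); the value below is only an example:

```python
# Sketch: accumulate gradients over 8 iterations before each optimizer step,
# so samples_per_gpu=2 on one GPU roughly simulates an effective batch size of 16.
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=8,
)
```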