Closed sfarkya closed 2 years ago
Looks like it's working with batch size = 4 on multiple GPUs (a single GPU hits an `init_process_group` error). So, is batch size = 4 reasonable, or is it too small?
Hi, @sfarkya! Thanks for your interest in our work.
We benchmark ViT-Base with a total batch size of 64 on 32 V100 GPUs (i.e., 2 images per GPU).
If your resources are limited, I recommend using 1/4 of the default batch size together with 1/2 of the default lr for training. Please also refer to https://github.com/hustvl/MIMDet/issues/3 & the 8-GPU config.
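For concreteness, the suggested scaling works out as follows (a minimal sketch; `default_lr` below is a placeholder, take the actual base values from the config you are using):

```python
def scaled_hparams(default_bsz, default_lr, bsz_factor=0.25, lr_factor=0.5):
    """Scale batch size and lr together as suggested: 1/4 bsz, 1/2 lr."""
    return int(default_bsz * bsz_factor), default_lr * lr_factor


default_bsz = 64    # total batch size used in the paper's benchmark
default_lr = 1e-4   # placeholder, replace with the lr in your config

bsz, lr = scaled_hparams(default_bsz, default_lr)
print(bsz, lr)  # 16 images total, half the base lr
```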
Hi @vealocia, thank you for your reply. I see, then the training on my side makes sense. Sure, thank you for the reference, I will check them out. Btw, the 8-GPU config is not available.
check this one
Thank you so much! I don't have access to larger GPUs, but I do have access to 24 GB GPUs. I tried a batch size of 2 per GPU, but CUDA still runs out of memory. Do you think I can train the model with batch size = 1 per GPU on 8 A5000 (24 GB) GPUs? It does not give a memory error, but I am concerned whether the distributed updates with batch size = 1 per GPU will be good or not. Also, do you suggest any changes to the 8-GPU config in case I try to train with it?
An optional strategy is using gradient checkpointing
for lower GPU memory demands at the cost of longer training time. You can refer to this for details.
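For reference, a minimal sketch of what gradient checkpointing looks like in PyTorch (the `Block`/`Encoder` modules below are toy stand-ins for a ViT encoder, not MIMDet's actual code):

```python
import torch
from torch.utils.checkpoint import checkpoint


class Block(torch.nn.Module):
    """Toy stand-in for a ViT encoder layer."""

    def __init__(self, dim=16):
        super().__init__()
        self.fc = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.fc(x))


class Encoder(torch.nn.Module):
    def __init__(self, depth=4, dim=16, use_checkpoint=False):
        super().__init__()
        self.blocks = torch.nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for blk in self.blocks:
            if self.use_checkpoint and self.training:
                # Activations inside blk are recomputed during backward
                # instead of being stored: less peak memory, more compute.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x


model = Encoder(use_checkpoint=True).train()
out = model(torch.randn(2, 16))
out.sum().backward()  # gradients still reach all block parameters
```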
Training with 1 img per gpu sounds feasible, maybe you can have a try and leave a comment if you find something weird.
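One thing to watch with 1 image per GPU: if any part of the model uses BatchNorm (the ViT backbone is LayerNorm-based, but detection heads sometimes are not), per-GPU statistics would come from a single sample. A hedged sketch of the usual workaround, converting to `SyncBatchNorm` (the toy model is illustrative only):

```python
import torch

# Hypothetical model with a BatchNorm layer; with 1 image per GPU its
# statistics would be computed from a single sample.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)

# SyncBatchNorm aggregates statistics across all processes, so the
# effective normalization batch becomes world_size * per_gpu_batch
# (8 instead of 1 on an 8-GPU run).
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```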
I could train it successfully on a smaller subset of data with batch size 1 on multiple GPUs.
Hello Dear Authors,
I am trying to replicate your results for the ViT benchmark model on COCO detection. I was able to run inference successfully, but I am getting a CUDA out-of-memory error during training on a 48 GB GPU.
Here's the command I am using:
I am using 1 GPU at the moment.
Here's the log,
I did not see any problems in the forward pass of this network during inference. Is there something I am missing?
Any help is really appreciated.