microsoft / ProDA

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
https://arxiv.org/abs/2101.10979
MIT License

Out of memory with two GPU training #10

Closed fabriziojpiva closed 3 years ago

fabriziojpiva commented 3 years ago

Hi, I am trying to reproduce the results for the training process adapting GTA V to Cityscapes. I have downloaded the warm-up model and generated the soft pseudo-labels successfully, and also calculated the prototypes. But after running the script to train the stage 1, the highest mIoU that I get is 52.9.

The parameters that I have used for my current setup (2 Nvidia RTX 2080 GPUs with 24GB each) are:

- batch_size = 2
- learning_rate = 0.0001/2
- epochs = 84
- train_iters = 90000*2

I couldn't run the script with the default configuration (bs=4, lr=0.0001, train_iters=90000) due to out-of-memory errors. Any thoughts on how I can achieve the reported results with my hardware configuration?

zhangmozhe commented 3 years ago

Maybe you can try multiple forward passes before the backward call, similar to the BigGAN PyTorch implementation https://github.com/ajbrock/BigGAN-PyTorch. Or you can try mixed-precision training, which may cause slight performance degradation.
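The "multiple forwards before backward" suggestion is gradient accumulation: run several micro-batches, scale each loss by the number of accumulation steps, and call the optimizer once per accumulated batch. A minimal sketch, using a toy linear model rather than ProDA's actual network, showing that accumulating two micro-batches of 2 reproduces the gradients of a full batch of 4:

```python
# Sketch of gradient accumulation: several forward/backward passes on
# micro-batches before a single optimizer step, so the effective batch
# matches the paper's default (bs=4) while only bs=2 fits in memory.
# The model and data below are toy placeholders, not ProDA's network.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 19)           # stand-in for the segmentation head
x = torch.randn(4, 8)              # one "full" batch of 4 samples
y = torch.randint(0, 19, (4,))
criterion = nn.CrossEntropyLoss()

# Reference: gradients from the full batch of 4
model.zero_grad()
criterion(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Accumulation: two micro-batches of 2, each loss scaled by 1/accum_steps
accum_steps = 2
model.zero_grad()
for i in range(accum_steps):
    micro_x = x[i * 2:(i + 1) * 2]
    micro_y = y[i * 2:(i + 1) * 2]
    loss = criterion(model(micro_x), micro_y) / accum_steps
    loss.backward()                # gradients add up across micro-batches
# optimizer.step() would go here, once per accumulated batch

grads_match = torch.allclose(full_grad, model.weight.grad, atol=1e-6)
print(grads_match)
```

Mixed precision would be applied on top of this by wrapping the forward pass in `torch.cuda.amp.autocast()` with a `GradScaler`, at the cost of possible slight accuracy changes as noted above.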

tudragon154203 commented 3 years ago

> Hi, I am trying to reproduce the results for the training process adapting GTA V to Cityscapes. [...] I couldn't run the script with the default configuration (bs=4, lr=0.0001 and train_iters = 90000) due to out of memory. Any thoughts on how I can achieve the results with my hardware configuration?

Can I ask how you got a setup with two GPUs? Do you use any cloud computing platform?

super233 commented 3 years ago

Multi-GPU training is implemented with torch.nn.DataParallel and Synchronized-BatchNorm:

https://github.com/microsoft/ProDA/blob/9ba80c7dbbd23ba1a126e3f4003a72f27d121a1f/models/sync_batchnorm/replicate.py#L47
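For reference, wrapping a model with `torch.nn.DataParallel` is a one-liner; a minimal sketch with a placeholder layer (not ProDA's actual model setup, which additionally patches in the synchronized batchnorm linked above):

```python
# Minimal multi-GPU sketch with torch.nn.DataParallel: each input batch is
# split across all visible GPUs, and outputs are gathered back on GPU 0.
# The Conv2d layer is a placeholder for the real segmentation network.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 19, kernel_size=1)   # 19 = Cityscapes classes
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)        # replicate across visible GPUs
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

out = model(torch.randn(2, 3, 4, 4).to(device))
print(out.shape)                          # batch of 2, 19 channels
```

Note that plain `DataParallel` keeps ordinary per-replica BatchNorm statistics; the synchronized BatchNorm in the linked `replicate.py` exists precisely to aggregate those statistics across GPUs.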

yuheyuan commented 2 years ago

> Hi, I am trying to reproduce the results for the training process adapting GTA V to Cityscapes. [...] I couldn't run the script with the default configuration (bs=4, lr=0.0001 and train_iters = 90000) due to out of memory. Any thoughts on how I can achieve the results with my hardware configuration?

Hi, have you succeeded in running the code on two GPUs? I only have two 3090s, and I am running into the same problem you described.