Hi Alex,
Are you using a batch size of 16? This is very important.
Did you test the pretrained model using this script?
https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained
Hi!
Unfortunately, my GPU does not have enough memory to fit a batch size of 16, so I'm trying to simulate it with gradient accumulation. I suppose that is the main problem; I was asking in case I had missed something else.
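For context, this is roughly the accumulation scheme I mean (a minimal sketch; model, train_loader, criterion and accum_steps are placeholders rather than the actual objects from train.py). One caveat I'm aware of: BatchNorm/SyncBatchNorm statistics are still computed on each physical mini-batch, so accumulation does not fully reproduce a true batch size of 16.

accum_steps = 4  # e.g. 4 physical batches of 4 images to emulate a batch of 16
optimizer.zero_grad()
for i, (image, target) in enumerate(train_loader):
    loss = criterion(model(image), target)
    (loss / accum_steps).backward()  # scale so the accumulated gradient matches one large batch
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()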
I do use your testing script (https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html#test-pretrained).
So I assume that the only problem is the batch size, which is a problem with nearly no solution on my hardware...
You may try the PyTorch checkpoint option, which reduces memory usage.
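Something along these lines, assuming torch.utils.checkpoint is what you end up using (a minimal sketch; backbone_stage and features are placeholders, not names from this repo):

from torch.utils import checkpoint

def run_stage(backbone_stage, features):
    # Activations inside `backbone_stage` are recomputed during the backward pass
    # instead of being stored, trading extra compute for lower peak GPU memory.
    return checkpoint.checkpoint(backbone_stage, features)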
Thanks for the suggestion. I tried it, and although it saves GPU memory, the performance of the final model is worse than the one trained with a lower batch size.
Can I ask which GPU you used to train the model, and how much memory it had? I want to know approximately how much memory I'll need.
For the experiments in the paper, I used an AWS EC2 p3dn.24xlarge instance with 8x 32GB V100 GPUs, but that may not be necessary. 16GB per GPU should be enough for most of the experiments.
Hi Hang Zhang!
First I want to thank you for the amazing repository.
I'm trying to train DeepLabv3 with a ResNeSt-101 backbone (DeepLab_ResNeSt101_PContext) for semantic segmentation on the Pascal Context dataset. I'm running the code without any issue; however, I'm still below your results from the pre-trained model that you provide at https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html.
I'm using the exact same hyperparameters as you, with the following training command:
python train.py --dataset pcontext --model deeplab --aux --backbone resnest101
Is there something that I'm missing to reach your results? I assume that your model is trained using the Auxiliary Loss but not the Semantic Encoding Loss. Are you maybe using some additional pre-training data?
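To be clear about what I mean by the Auxiliary Loss, I assume it is combined with the main loss roughly like this (a sketch on my side; the 0.2 weight, the ignore_index value and all names are my assumptions, not values taken from your code):

import torch.nn.functional as F

def segmentation_loss(main_logits, aux_logits, target, aux_weight=0.2):
    # Main head loss plus a down-weighted auxiliary head loss;
    # only the main head is used at inference time.
    main_loss = F.cross_entropy(main_logits, target, ignore_index=-1)
    aux_loss = F.cross_entropy(aux_logits, target, ignore_index=-1)
    return main_loss + aux_weight * aux_loss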
Thanks in advance!
Alex.