DiegoOrtego opened this issue 7 years ago
Hello, I have trained PSPNet with the default configuration and got 80% mean IoU. However, this mean IoU seems to be the category IoU, not the class IoU!
Could you tell me what that configuration is? I am using:

```python
args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
```
Data augmentation:
```python
train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
```
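By the way, I am assuming the `'lr_decay': 0.9` entry above is the exponent of the poly learning-rate policy from the paper. Roughly (the `poly_lr` name and arguments here are just illustrative, not the repo's exact code):

```python
import math

# Sketch of the poly policy I assume 'lr_decay': 0.9 refers to:
# lr = base_lr * (1 - curr_iter / max_iter) ** lr_decay
def poly_lr(base_lr, curr_iter, max_iter, power=0.9):
    return base_lr * (1 - curr_iter / max_iter) ** power

# With base_lr = 1e-2 / math.sqrt(16. / 8) and max_iter = 1e5, halfway through training:
# poly_lr(1e-2 / math.sqrt(2), 5e4, 1e5) ~= 3.8e-3
```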
I got these results from train_coarse_extra.py. Yes, this is my configuration.
OK, I am using just train_fine.py, but in the paper they report 0.78 mIoU. Thanks!
best record: [val loss 0.11416], [acc 0.97769], [acc_cls 0.86788], [mean_iu 0.80071], [fwavacc 0.95816], [epoch 36]
Could you paste the training arguments and data augmentation that you are using here?
```python
args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}

train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
val_simul_transform = simul_transforms.Scale(train_args['input_size'])
train_input_transform = standard_transforms.Compose([
    standard_transforms.ToTensor(),
    standard_transforms.Normalize(*mean_std)
])
```
OK, thanks! Any suggestion to improve performance using only the fine-annotated Cityscapes data is welcome! I want to avoid using the coarse annotations.
@shahabty @DiegoOrtego Hi, I am also trying to reproduce the result, but with ResNet50.
Thanks for sharing your parameters; I will run training with the parameters below. I have an extra question about @shahabty's remark that "this mean IoU seems to be category IoU not class IoU". According to utils/misc.py, the IoU is computed over the number of classes defined in datasets/cityscapes.py, which is 19.
Am I correct, or did you find anything odd in the code? (I put a rough sketch of how I read that computation below the args.)
```python
args = {
    'train_batch_size': 16,
    'lr': 1e-2,
    'lr_decay': 0.9,
    'max_iter': 9e4,  # the paper says 90K iterations for Cityscapes, so..
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 16,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
```
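Here is that rough sketch, illustrative only (not the exact code in utils/misc.py; `mean_iou` and its arguments are just placeholder names):

```python
import numpy as np

# Per-class IoU via a confusion matrix over the 19 Cityscapes train IDs.
def mean_iou(pred, gt, num_classes=19, ignore_index=255):
    mask = (gt != ignore_index)
    hist = np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    inter = np.diag(hist)
    union = hist.sum(axis=1) + hist.sum(axis=0) - inter
    iou = inter / np.maximum(union, 1)   # per-class IoU, 19 values
    return iou[union > 0].mean()         # mean over classes that appear
```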
Hi, I did not find anything odd in the code. I am retraining with the same number of iterations but not the same batch size (my GPU limits that), and with only the fine-annotated data. This is what I am getting:
[epoch 122], [val loss 0.22252], [acc 0.93157], [acc_cls 0.70841], [mean_iu 0.62067], [fwavacc 0.87792]
best record: [val loss 0.22191], [acc 0.93022], [acc_cls 0.72752], [mean_iu 0.63397], [fwavacc 0.87701], [epoch 91]
Are you also training with the coarse annotations?
Regarding adapting the learning rate to the batch size, I found this, quoting from "One weird trick for parallelizing convolutional neural networks" by Alex Krizhevsky: "Theory suggests that when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant."
So that is why the 0.01 lr is divided by sqrt(16/8) in the config (the batch size goes from 16 down to 8, i.e. k = 1/2), I guess.
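In code, that adjustment is just (illustrative):

```python
import math

# The sqrt(k) heuristic quoted above, going from the paper's batch size 16 down to 8.
paper_lr = 1e-2        # lr the paper uses with batch size 16
k = 8 / 16             # my batch size divided by the paper's
my_lr = paper_lr * math.sqrt(k)
# my_lr == 1e-2 / math.sqrt(16. / 8) ~= 7.07e-3, i.e. the 'lr' entry in the args above
```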
I am also training with the fine dataset only! I am running on 4 Titan Black GPUs, so I am lucky enough to keep the batch size at 16.
I think I can share my result in about 2 days; according to my calculation, it will take 50 hours to train. I am also trying some of my own ideas on top of ResNet50.
Great! I am using ResNet101 for PSPNet. Good luck!
It's nice that my code can help you all.
The PSPNet paper says that the model is first trained on the coarse dataset and then fine-tuned on the fine dataset. It's easy to get a high mIoU (0.8+) on the coarse dataset, but I failed to reproduce the performance reported in the paper (I only got 0.6+ mIoU on the fine Cityscapes validation set).
The PSPNet authors use a multi-GPU-synchronized version of BN, which is more accurate and hence beneficial to the performance. However, the current version of BN in PyTorch does not support multi-GPU synchronization; someone has opened an issue about that. Please tell me if you have any other tricks that can help improve the performance. Thanks.
Looking forward to @Jongchan's experimental results.
Thank you for your nice code :D @ZijunDeng.
PS: Because I have to run two experiments at the same time, the batch size will be reduced to 8. I will report back when it is finished~
This issue may be of interest: https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12
@Jongchan You are right. I went through the code and everything is fine. Since this code is not parallelized, @ZijunDeng, I wasn't sure about the IoU.
@DiegoOrtego It is helpful! Thanks. I am going to try the sliced prediction mentioned in https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12
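Roughly, I understand sliced prediction as something like the sketch below (illustrative only, not the linked code; `sliced_predict`, the crop size and the stride are placeholders, and border crops that don't align with the stride are ignored for brevity):

```python
import torch

# Run the net on overlapping crops and average the logits over the full image.
def sliced_predict(net, image, crop_size=713, stride=476, num_classes=19):
    _, _, h, w = image.shape
    logits = image.new_zeros(1, num_classes, h, w)
    counts = image.new_zeros(1, 1, h, w)
    net.eval()
    with torch.no_grad():
        for y in range(0, max(h - crop_size, 0) + 1, stride):
            for x in range(0, max(w - crop_size, 0) + 1, stride):
                out = net(image[:, :, y:y + crop_size, x:x + crop_size])
                logits[:, :, y:y + crop_size, x:x + crop_size] += out
                counts[:, :, y:y + crop_size, x:x + crop_size] += 1
    return logits / counts.clamp(min=1)
```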
@ZijunDeng Great! Tell us if you are able to improve performance! Good luck!
@DiegoOrtego Sure
Does this link help you? http://hangzh.com/PyTorch-Encoding/syncbn.html
Someone (fmassa) also proposed this: 'What you could do is to fix the batch-norm statistics in this case, or even (and it is what most people do) replace entirely batch-norm with a fixed affine transformation.'
@aymenx17 The link is helpful! Thank you. And I plan to try the trick of freezing BN.
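Something like this is what I have in mind for freezing BN (just a sketch; `freeze_bn` is an illustrative name, not code from this repo):

```python
import torch.nn as nn

# Freeze BN: use the running statistics and stop updating the affine parameters.
# Re-apply after every net.train() call, since train() flips BN back to batch stats.
def freeze_bn(model):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False
```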
Has anyone tried the syncbn implementation linked by @aymenx17, i.e. http://hangzh.com/PyTorch-Encoding/syncbn.html? If so, were you able to reproduce the accuracies reported in the paper? TIA
Where did you find train_coarse_extra.py? @shahabty
@IssamLaradji It used to be in this repo. I can't find it now.
FYI, SyncBatchNorm has now been added to PyTorch master via https://github.com/pytorch/pytorch/pull/14267. For documentation, see: https://pytorch.org/docs/master/nn.html#torch.nn.SyncBatchNorm
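A rough usage sketch (per the linked docs, it only works under torch.distributed with one process per GPU; `build_model()` and `local_rank` are placeholders):

```python
import torch
import torch.nn as nn

net = build_model().cuda(local_rank)                         # your PSPNet constructor
net = nn.SyncBatchNorm.convert_sync_batchnorm(net)           # swap BatchNorm -> SyncBatchNorm
net = nn.parallel.DistributedDataParallel(net, device_ids=[local_rank])
```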
@soumith Good news! Thanks for the reminder.
"when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant"
@DiegoOrtego, can you elaborate on this?
Hi! First of all, thank you very much for sharing this code. My doubt: I have trained PSPNet on Cityscapes with your default configuration and I am getting ~0.63 mIoU (fine set), which is far from the 0.78 reported in the paper. Could you give me any recommendation to approach the paper's performance? Does the batch_size = 8 (16 in the paper) have a big impact in this situation?