zijundeng / pytorch-semantic-segmentation

PyTorch for Semantic Segmentation
MIT License

PSPnet performance cityscapes #6

Open DiegoOrtego opened 7 years ago

DiegoOrtego commented 7 years ago

Hi! First of all, thank you very much for sharing this code. My question: I have trained PSPNet on Cityscapes with your default configuration and I am getting ~0.63 mIoU (fine set), which is far from the 0.78 reported in the paper. Could you give me any recommendations for approaching the paper's performance? Does the batch_size = 8 (16 in the paper) have a big impact in this situation?

shahabty commented 7 years ago

Hello, I have trained PSPNet with the default configuration and got 80% mean IoU. However, this mean IoU seems to be category IoU, not class IoU!

DiegoOrtego commented 7 years ago

Could you tell me what that configuration is? I am using:

args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}

Data augmentation:

train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])

shahabty commented 7 years ago

I got these results from train_coarse_extra.py. Yes, this is my configuration.

DiegoOrtego commented 7 years ago

Ok, I am using just train_fine.py, but the paper reports 0.78 mIoU there. Thanks!

shahabty commented 7 years ago

best record: [val loss 0.11416], [acc 0.97769], [acc_cls 0.86788], [mean_iu 0.80071], [fwavacc 0.95816], [epoch 36]

DiegoOrtego commented 7 years ago

Could you copy here the training arguments and data augmentation that you are using?

shahabty commented 7 years ago

args = {
    'train_batch_size': 8,
    'lr': 1e-2 / (math.sqrt(16. / 8)),
    'lr_decay': 0.9,
    'max_iter': 10e4,
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 8,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}

train_simul_transform = simul_transforms.Compose([
    simul_transforms.RandomSized(train_args['input_size']),
    simul_transforms.RandomRotate(10),
    simul_transforms.RandomHorizontallyFlip()
])
val_simul_transform = simul_transforms.Scale(train_args['input_size'])
train_input_transform = standard_transforms.Compose([
    standard_transforms.ToTensor(),
    standard_transforms.Normalize(*mean_std)
])

DiegoOrtego commented 7 years ago

Ok, thanks! Any suggestion for improving performance using only the fine-annotated Cityscapes is welcome! I want to avoid using the coarse annotations.

Jongchan commented 7 years ago

@shahabty @DiegoOrtego Hi, I am also trying to reproduce the result, but with ResNet50.

Thanks to your shared parameters, I will run the algorithm with the parameters below. I have an extra question regarding @shahabty's remark that "this mean IoU seems to be category IoU not class IoU". According to utils/misc.py, the IoU is calculated over the number of classes defined in datasets/cityscapes.py, which is 19.

Am I correct? Or did you find anything odd in the code?

args = {
    'train_batch_size': 16,
    'lr': 1e-2,
    'lr_decay': 0.9,
    'max_iter': 9e4,  # the paper says 90K for Cityscapes, so..
    'input_size': 340,
    'weight_decay': 1e-4,
    'momentum': 0.9,
    'snapshot': '',  # empty string denotes learning from scratch
    'print_freq': 20,
    'val_batch_size': 16,
    'val_save_to_img_file': False,
    'val_img_sample_rate': 0.1  # randomly sample some validation results to display
}
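For concreteness, that calculation boils down to the standard confusion-matrix mean IoU. Roughly something like this (an illustrative sketch, not the repo's exact code):

import numpy as np

def mean_iou(pred, gt, num_classes=19):
    # pred, gt: integer arrays of shape (H, W) holding trainIds.
    # Accumulate a confusion matrix over valid (non-ignored) pixels,
    # then compute per-class IoU = TP / (TP + FP + FN) and average.
    mask = (gt >= 0) & (gt < num_classes)
    hist = np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    iou = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    return np.nanmean(iou)  # mean over the 19 trainId classes

Since it averages over the 19 trainId classes, it should be class IoU; Cityscapes category IoU would be computed over the coarser 7 categories instead.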

DiegoOrtego commented 7 years ago

Hi, I did not find anything odd in the code. I am retraining, keeping the number of iterations but not the batch size (my GPU limits that), and using just the fine-annotated data. This is what I am getting:

[epoch 122], [val loss 0.22252], [acc 0.93157], [acc_cls 0.70841], [mean_iu 0.62067], [fwavacc 0.87792]
best record: [val loss 0.22191], [acc 0.93022], [acc_cls 0.72752], [mean_iu 0.63397], [fwavacc 0.87701], [epoch 91]

Are you also training with the coarse annotations?

DiegoOrtego commented 7 years ago

Regarding the learning-rate adaptation to the batch size, I found this, quoting from "One weird trick for parallelizing convolutional neural networks" by Alex Krizhevsky: "Theory suggests that when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant."

So that is why the 0.01 lr is divided by sqrt(16/8), I guess: the batch size is halved from 16 to 8, i.e. k = 1/2, so the lr is scaled by sqrt(1/2) = 1/sqrt(2).
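In code, the rule is just the following (a minimal sketch; the base values are the paper's, the variable names are mine):

import math

base_lr, base_batch = 1e-2, 16  # values used in the PSPNet paper
batch = 8                       # what fits on my GPU

k = batch / base_batch          # batch size multiplied by k = 0.5
lr = base_lr * math.sqrt(k)     # sqrt scaling rule from Krizhevsky's note
# equivalent to the expression in the config: 1e-2 / math.sqrt(16. / 8)
print(lr)  # ~0.00707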

Jongchan commented 7 years ago

I am also training with the fine dataset only! I am running on 4 Titan Blacks, so I am lucky enough to keep the batch size at 16.

I think I can share my result... in 2 days. According to my calculation, it will take 50 hours to train. I am also testing some of my own ideas on top of ResNet50.

DiegoOrtego commented 7 years ago

Great! I am using ResNet101 for PSPNet. Good luck!

zijundeng commented 7 years ago

It's nice that my code can help you all.

The PSPNet paper says that the model is first trained on the coarse dataset and then fine-tuned on the fine dataset. It's easy to get a high mIoU (0.8+) on the coarse dataset, but I failed to reproduce the performance reported in the paper (I only got 0.6+ mIoU on the fine Cityscapes validation set).

The PSPNet authors use a multi-GPU-synchronized version of BN, which is more accurate and hence beneficial to performance. However, the current version of BN in PyTorch does not support multi-GPU synchronization; someone has opened an issue about that. Please tell me if you have any other tricks that can help improve the performance. Thanks.

Looking forward to the experiment result of @Jongchan .

Jongchan commented 7 years ago

Thank you for your nice code :D @ZijunDeng.

PS. Because I have to run two experiments at the same time, the batch size will be reduced to 8. I will report when it is finished~

DiegoOrtego commented 7 years ago

This issue may be of interest: https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12

shahabty commented 7 years ago

@Jongchan You are right. I went through the code and everything is fine. Since this code is not parallelized (@ZijunDeng), I wasn't sure about the IoU.

zijundeng commented 7 years ago

@DiegoOrtego It is helpful! Thanks. I am going to try the sliced prediction mentioned in https://github.com/Vladkryvoruchko/PSPNet-Keras-tensorflow/issues/12
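For anyone else trying it: sliced prediction just means running the network on overlapping crops at the training resolution and averaging the scores back into a full-resolution map. A minimal sketch (the tile/stride values and the model interface are assumptions, not this repo's code):

import torch

def sliced_predict(model, image, tile=713, stride=476, num_classes=19):
    # image: (1, 3, H, W) tensor with H, W >= tile.
    # Assumes model(crop) returns per-pixel scores of shape
    # (1, num_classes, tile, tile), i.e. same spatial size as its input.
    _, _, h, w = image.shape
    scores = image.new_zeros(1, num_classes, h, w)
    counts = image.new_zeros(1, 1, h, w)
    # Tile origins: a regular grid plus one tile flush with each border,
    # so the right and bottom strips are always covered.
    tops = sorted(set(list(range(0, h - tile + 1, stride)) + [h - tile]))
    lefts = sorted(set(list(range(0, w - tile + 1, stride)) + [w - tile]))
    for top in tops:
        for left in lefts:
            crop = image[:, :, top:top + tile, left:left + tile]
            with torch.no_grad():
                out = model(crop)
            scores[:, :, top:top + tile, left:left + tile] += out
            counts[:, :, top:top + tile, left:left + tile] += 1
    return scores / counts  # average where tiles overlap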

DiegoOrtego commented 7 years ago

@ZijunDeng Great! Tell us if you are able to improve performance! Good luck!

zijundeng commented 7 years ago

@DiegoOrtego Sure

aymenx17 commented 7 years ago

Does this link help you? http://hangzh.com/PyTorch-Encoding/syncbn.html

Someone (fmassa) also proposed this: 'What you could do is to fix the batch-norm statistics in this case, or even (and it is what most people do) replace entirely batch-norm with a fixed affine transformation.'
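Freezing BN in PyTorch just means keeping the BatchNorm layers in eval mode so their running statistics stop updating; freezing the affine parameters as well turns each layer into the fixed affine transformation fmassa describes. A minimal sketch (the freeze_bn helper is mine, not from this repo):

import torch.nn as nn

def freeze_bn(model):
    # Put every BatchNorm2d layer in eval mode so running_mean and
    # running_var are no longer updated during training.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
            if m.affine:  # also freeze scale/shift -> fixed affine map
                m.weight.requires_grad = False
                m.bias.requires_grad = False

Note that freeze_bn(model) has to be called again after every model.train(), since train() flips the BN layers back into training mode.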

zijundeng commented 7 years ago

@aymenx17 The link is helpful! Thank you. And I plan to try the trick of freezing BN.

rohitgajawada commented 6 years ago

Has anyone tried using the link provided by @aymenx17, i.e. http://hangzh.com/PyTorch-Encoding/syncbn.html ? If so, were you able to reproduce the accuracies reported in the paper? TIA

IssamLaradji commented 6 years ago

Where did you find train_coarse_extra.py? @shahabty

shahabty commented 6 years ago

@IssamLaradji It used to be in this repo; I can't find it now.

soumith commented 5 years ago

FYI, SyncBatchNorm has now been added to PyTorch master via https://github.com/pytorch/pytorch/pull/14267 . For documentation, see https://pytorch.org/docs/master/nn.html#torch.nn.SyncBatchNorm
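Converting an existing model is a one-liner; a sketch assuming a torch.distributed / DistributedDataParallel setup is already initialized (local_rank comes from that setup, not from this repo):

import torch.nn as nn

# model = PSPNet(...)  # any model containing nn.BatchNorm2d layers

# Replace every BatchNorm layer with its synchronized counterpart.
# SyncBatchNorm only synchronizes across GPUs when running under
# torch.distributed with DistributedDataParallel.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = nn.parallel.DistributedDataParallel(
    model.cuda(), device_ids=[local_rank]  # local_rank set by the launcher
)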

zijundeng commented 5 years ago

@soumith Good news! Thanks for the reminder.

mrgloom commented 4 years ago

"when multiplying the batch size by k, one should multiply the learning rate by sqrt(k) to keep the variance in the gradient expectation constant"

@DiegoOrtego can you elaborate on this?