
Pytorch Implementation of "Adversarial Learning For Semi-Supervised Semantic Segmentation" for ICLR 2018 Reproducibility Challenge

Adversarial Learning For Semi-Supervised Semantic Segmentation

Introduction

This is a submission for the ICLR 2018 Reproducibility Challenge. The central idea of the paper is to incorporate adversarial training into the semantic segmentation task, which allows the segmentation network to learn in a semi-supervised fashion on top of the traditional supervised training. The authors claim a significant improvement in the performance of the segmentation network (measured in mean IoU) once the supervised training is extended with adversarial and semi-supervised training.
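To make the training scheme concrete, here is a minimal PyTorch sketch of the three loss terms involved: the usual per-pixel cross-entropy, an adversarial term that asks a fully convolutional discriminator to rate the predicted label map as "real", and a semi-supervised term that turns high-confidence predictions on unlabeled images into pseudo-labels. The module names (`segmentation_net`, `discriminator`), the loss weights, and the confidence threshold are illustrative assumptions rather than the exact interface or hyper-parameters of this repository.

```python
import torch
import torch.nn.functional as F

def supervised_and_adversarial_loss(segmentation_net, discriminator, images, labels,
                                    lambda_adv=0.01):
    logits = segmentation_net(images)                       # (N, C, H, W) class scores
    probs = F.softmax(logits, dim=1)

    # Standard supervised term: per-pixel cross-entropy against ground truth
    # (255 is the usual PASCAL VOC void label).
    ce_loss = F.cross_entropy(logits, labels, ignore_index=255)

    # Adversarial term: push D to rate the predicted label map as "real" (1).
    d_out = discriminator(probs)                            # per-pixel real/fake logits
    adv_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))

    return ce_loss + lambda_adv * adv_loss

def semi_supervised_loss(segmentation_net, discriminator, unlabeled_images,
                         threshold=0.2, lambda_semi=0.1):
    logits = segmentation_net(unlabeled_images)
    probs = F.softmax(logits, dim=1)

    with torch.no_grad():
        confidence = torch.sigmoid(discriminator(probs))    # per-pixel confidence map
        pseudo_labels = probs.argmax(dim=1)                 # self-generated labels
        mask = confidence.squeeze(1) > threshold            # keep only trusted pixels

    ce = F.cross_entropy(logits, pseudo_labels, reduction='none')
    return lambda_semi * (ce * mask.float()).sum() / mask.float().sum().clamp(min=1.0)
```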

Scope

My plan is to reproduce the improvement in the performance of the segmentation network (Resnet-101) obtained by adding the adversarial and semi-supervised learning scheme on top of the baseline supervised training, and to document my experience along the way. The authors use two datasets, PASCAL VOC 2012 (extended version) and Cityscapes, to demonstrate the benefits of the proposed training scheme. I will focus on PASCAL VOC 2012 for this work. Specifically, the target is to reproduce the following table from the paper.

| Method | 1/2 Data | Full Data |
|---|---|---|
| Baseline (Resnet-101) | 69.8 | 73.6 |
| Baseline + Adversarial Training | 72.6 | 74.9 |
| Baseline + Adversarial Training + Semi-supervised Learning | 73.2 | NA |

Results Reproduced

The following table summarizes the results I have been able to reproduce with the full dataset. With the full dataset there is no unlabeled data left over, so only the effect of adversarial training on top of the baseline can be evaluated.

| Method (Full Dataset) | Original | Challenge |
|---|---|---|
| Baseline (Resnet-101) | 73.6 | 69.98 |
| Baseline + Adversarial Training | 74.9 | 70.97 |
| Baseline + Adversarial Training + Semi-supervised Learning | NA | NA |

The following table summarizes the results I was able to reproduce for the semi-supervised setting, where half of the training data keeps its labels and the other half is treated as unlabeled for semi-supervised training (see the data-split sketch after the table).

| Method (1/2 Dataset) | Original | Challenge |
|---|---|---|
| Baseline (Resnet-101) | 69.8 | 67.84 |
| Baseline + Adversarial Training | 72.6 | 68.89 |
| Baseline + Adversarial Training + Semi-supervised Learning | 73.2 | 69.05 |
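For reference, here is a minimal sketch of how such a half-labeled split can be constructed, assuming a standard PyTorch `Dataset`; the actual split used in this repository may be defined differently (e.g. via fixed index lists).

```python
import random
from torch.utils.data import Subset

def split_labeled_unlabeled(dataset, labeled_fraction=0.5, seed=0):
    """Reserve a random fraction of the training set as 'labeled';
    the rest is treated as unlabeled for the semi-supervised branch."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    cut = int(labeled_fraction * len(indices))
    return Subset(dataset, indices[:cut]), Subset(dataset, indices[cut:])
```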

Updates

Journey

Baseline Model

The polynomial learning-rate decay used in several of these runs is sketched after the table.

| Name | Details | mIoU |
|---|---|---|
| base-101 | No normalization; no gradient for batch norm; drop last batch if not complete; volatile = False for eval; poly decay every 10 iterations; learnable upsampling with transposed convolution | 35.91 |
| base102 | Exactly like base-101, except: no polynomial decay; fixed bilinear upsampling layers | 68.84 |
| base103 | Exactly like base102, except: polynomial decay (every 10 iterations) | 68.88 |
| base104 | Exactly like base103, except: polynomial decay (every iteration) | 69.78 |
| base105 | Like base104, except: input normalized to zero mean and unit variance | 68.86 |
| base110 | ImageNet pretrained; normalization; poly decay (every iteration); same learning rate for all layers | 65.97 |
| base111 | ImageNet pretrained; normalization; poly decay (every iteration); 10x learning rate for the classification module | 65.67 |
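A minimal sketch of the poly decay schedule referenced above, i.e. the DeepLab-style policy lr = base_lr * (1 - iter / max_iter) ^ power. The power of 0.9 and the `lr_mult` group key (used to give the classification module a 10x rate) are illustrative assumptions, not necessarily the exact setup in this repository.

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """DeepLab-style polynomial decay: base_lr * (1 - iter / max_iter) ** power."""
    return base_lr * (1.0 - float(cur_iter) / max_iter) ** power

def adjust_learning_rate(optimizer, base_lr, cur_iter, max_iter):
    """Apply the poly schedule to every parameter group; groups may carry a
    hypothetical 'lr_mult' key (e.g. 10.0 for the classification module)."""
    lr = poly_lr(base_lr, cur_iter, max_iter)
    for group in optimizer.param_groups:
        group['lr'] = lr * group.get('lr_mult', 1.0)
```

Depending on the run, this adjustment is applied every iteration (base104) or every 10 iterations (base103).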

Adversarial Models

The discriminator update with one-sided label smoothing used in several of these runs is sketched after the table.

| Name | Details | mIoU |
|---|---|---|
| adv101 | base105 as G; Optim(D): SGD, lr 0.0001, momentum 0.5, decay 0.0001 | 68.96 |
| adv102 | base105; 0.25 label smoothing for real labels in D; Optim(D): SGD, lr 0.0001, momentum 0.5, decay 0.0001 | 67.14 |
| adv103 | base105; 0.25 label smoothing for real labels in D; Optim(D): Adam | 68.07 |
| adv104 | base104; 0.25 label smoothing for real labels in D; Optim(D): SGD, lr 0.0001, momentum 0.5, decay 0.0001 | 63.37 |
| adv105 | base104 as G; everything else like adv103 | Very poor (did not finish training) |
| adv105-cuda | base105; 0.25 label smoothing for real labels in D; Optim(D): SGD, lr 0.0001, momentum 0.5, decay 0.0001; batch size 21 | Very poor (did not finish training) |
| adv106 | base104; Optim(D): Adam; batch size 21 | 61.50 |
| adv201 | base105; label smoothing 0.25; Optim(D): Adam | 69.33 |
| adv202 | base105; label smoothing 0.1; Optim(D): Adam | 69.93 |
| adv203 | base105; label smoothing 0.1; Adam, d_lr = 0.0001, g_lr = 0.00025 | 69.72 |
| adv204 | base105; label smoothing 0.1; Adam, d_lr = 0.00001, g_lr = 0.00025 | 69.28 |
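For clarity, here is a minimal sketch of a discriminator update with one-sided label smoothing on the "real" (ground-truth) maps, as used in adv102 and the adv20x runs. The function and argument names are illustrative, the void-label (255) handling is simplified, and the exact update in this repository may differ.

```python
import torch
import torch.nn.functional as F

def discriminator_step(discriminator, d_optimizer, probs_fake, labels,
                       num_classes=21, real_label_smoothing=0.1):
    """One D update: ground-truth maps are 'real', predicted maps are 'fake'.
    Real targets are smoothed to (1 - real_label_smoothing), e.g. 0.9 or 0.75."""
    # One-hot encode ground-truth labels so they look like softmax outputs.
    # Void pixels (255) are crudely clamped here; real code should mask them out.
    one_hot = F.one_hot(labels.clamp(max=num_classes - 1), num_classes)
    one_hot = one_hot.permute(0, 3, 1, 2).float()          # (N, C, H, W)

    d_real = discriminator(one_hot)                        # per-pixel logits
    d_fake = discriminator(probs_fake.detach())            # detach: no grad into G

    real_target = torch.full_like(d_real, 1.0 - real_label_smoothing)
    loss = (F.binary_cross_entropy_with_logits(d_real, real_target) +
            F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    d_optimizer.zero_grad()
    loss.backward()
    d_optimizer.step()
    return loss.item()
```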