hszhao / semseg

Semantic Segmentation in Pytorch
MIT License

mIoU of PSPNet101 on PASCAL VOC 2012 #59

Open dailingjun opened 4 years ago

dailingjun commented 4 years ago

In this code it is 0.7907 (ss) / 0.7963 (ss), while the paper reports 0.826 (ss). What is the difference between them?

dailingjun commented 4 years ago

I found some information in FAQ.md: https://github.com/hszhao/semseg/blob/master/FAQ.md

Q: Why does performance differ from the original papers?
A: Many details differ; some are listed below:

1. Pre-trained models: the weights used differ between this PyTorch codebase and the former PSP/ANet Caffe version.
2. Image pre-processing: this PyTorch codebase follows the official PyTorch style (normalize to 0~1, subtract the mean [0.485, 0.456, 0.406], then divide by the std [0.229, 0.224, 0.225]), while the former Caffe version does normalization simply by subtracting the image mean [123.68, 116.779, 103.939].
3. Training schedule: the Caffe version measures training in steps, while the PyTorch version uses epochs. The converted optimization steps differ slightly after conversion (e.g., on ADE20K, 150k steps with batch size 16 equals 150k*16/20210 ≈ 119 epochs).
4. SGD optimization difference: see the note in the SGD implementation; this difference may influence poly-style learning-rate decay, especially in the last steps where learning rates are very small.
5. Weight decay on biases and on the scale and shift parameters of BN differs between the two training settings; see technical reports 1, 2.
6. Label guidance: the former Caffe version mainly uses 1/8-scale label guidance (the former interp layer in Caffe has only a CPU implementation, so larger label guidance was avoided), while the segmentation models released in this repository mainly use full-scale label guidance (the final logits are interpolated to the original crop size for the loss calculation, instead of the 1/8 feature-downsampled size).
7. The performance variance of attention-based models (e.g., PSANet) is relatively high; this can also be observed in CCNet. Besides, some low-frequency classes (e.g., 'bus' in Cityscapes) may also affect the performance a lot.
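The pre-processing difference in point 2 can be sketched numerically. This is a minimal illustration, not code from the repository: the tiny 2x2 image is made up, and only the mean/std constants come from the FAQ text above.

```python
import numpy as np

# Hypothetical 2x2 RGB image with pixel values in 0..255 (made-up data).
img = np.array([[[124, 117, 104], [50, 60, 70]],
                [[200, 180, 160], [0, 0, 0]]], dtype=np.float64)

# PyTorch-official style: scale to 0..1, subtract the ImageNet mean,
# then divide by the std (constants quoted in the FAQ).
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
pytorch_style = (img / 255.0 - mean) / std

# Former Caffe style: subtract the raw-scale image mean only,
# with no scaling to 0..1 and no division by std.
caffe_mean = np.array([123.68, 116.779, 103.939])
caffe_style = img - caffe_mean
```

The same pixel therefore lands in very different numeric ranges under the two schemes, which is one reason weights pre-trained under one convention cannot simply be reused under the other.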
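The poly-style learning-rate decay mentioned in point 4 can be sketched as follows. This is an assumption-laden sketch: the power of 0.9 is taken from the PSPNet paper's poly policy, not from this FAQ, and the function name is hypothetical.

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Poly learning-rate policy: lr = base_lr * (1 - iter/max_iter)^power.

    power=0.9 is an assumed value (the one used in the PSPNet paper).
    """
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Near the end of training the rate approaches zero, which is exactly
# where the SGD implementation difference noted in point 4 matters most.
final_phase_lr = poly_lr(0.01, 990, 1000)
```

Because the rate shrinks toward zero at the end of training, even a small difference in how momentum and weight decay are applied per step can change the final few epochs noticeably.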


Can you tell us which one is the most significant?
bea-CC commented 2 years ago

Do you understand it now? I have the same question.