sacmehta / ESPNet

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
https://sacmehta.github.io/ESPNet/
MIT License
541 stars, 112 forks

how about the performance? #9

Closed wldeephi closed 6 years ago

wldeephi commented 6 years ago

@sacmehta, thanks very much for your work. What mIOU do the ESP_C and ESPNet models reach on the Cityscapes validation set? I ran your commands and got an mIOU of only 41%, which is much lower than the performance reported in the paper. Thanks very much!

sacmehta commented 6 years ago

Something is wrong with your evaluation.

We achieve mIOU of 53.3 and 61.4 for ESPNet-C and ESPNet on the Cityscapes validation set. Are you using the evaluation scripts provided by the Cityscapes dataset?

wldeephi commented 6 years ago

I am using the evaluation scripts provided by the Cityscapes dataset, and the mIOU is 43.5% with your released ESP_C model (p=2, q=8). Can you give me some advice? Thanks very much.

sacmehta commented 6 years ago

I am assuming that you are using VisualizeResults.py file in the test folder. This file generates the label images at a resolution of 1024x512.

To use the Cityscapes scripts, you need to upsample these generated images by a factor of 2 so that your label image size is the same as the input image, i.e. 2048x1024. Could you please tell me how you are upsampling your label images to get to this size?

wldeephi commented 6 years ago

I am using the VisualizeResults.py file with input_size=2048x1024. Is that the same as running at 1024x512 and then upsampling by a factor of 2?

sacmehta commented 6 years ago

We never trained or tested our models at this high resolution because it would demand enormous resources that are not available on embedded devices. Could you try generating the results at 1024x512 resolution and see whether you can reproduce the reported numbers?

Note 1: if you are upsampling segmentation masks, then please use nearest neighbor interpolation.

Note 2: if you are upsampling the feature maps of last layer, then use Bilinear interpolation and then apply softmax to get the final feature maps.
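The two notes above can be illustrated with a small NumPy sketch. It is not taken from the ESPNet repository: `upsample_labels_nearest` and `upsample_scores_bilinear` are hypothetical helper names, the example arrays are toy values, and the bilinear routine uses an align-corners-style sampling grid for brevity. In PyTorch one would more likely call `torch.nn.functional.interpolate` on the score tensor instead.

```python
import numpy as np


def upsample_labels_nearest(mask, factor=2):
    """Nearest-neighbor upsampling of an (H, W) label mask by pixel replication."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)


def upsample_scores_bilinear(scores, factor=2):
    """Bilinear upsampling of (C, H, W) class scores (align-corners-style grid)."""
    c, h, w = scores.shape
    xs = np.linspace(0, w - 1, w * factor)  # sample positions along the width
    ys = np.linspace(0, h - 1, h * factor)  # sample positions along the height
    # Bilinear interpolation is separable: interpolate rows first, then columns.
    wide = np.array([[np.interp(xs, np.arange(w), row) for row in ch] for ch in scores])
    tall = np.array([[np.interp(ys, np.arange(h), col) for col in ch.T] for ch in wide])
    return tall.transpose(0, 2, 1)  # back to (C, H * factor, W * factor)


def softmax(scores, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Note 1: a predicted label mask is only replicated, so no new class IDs appear.
mask = np.array([[0, 7], [7, 18]], dtype=np.uint8)
big_mask = upsample_labels_nearest(mask, factor=2)

# Note 2: class scores are interpolated first; softmax and argmax then give labels.
scores = np.array([[[2.0, 0.0], [2.0, 0.0]],   # class 0 scores
                   [[0.0, 2.0], [0.0, 2.0]]])  # class 1 scores
probs = softmax(upsample_scores_bilinear(scores, factor=2), axis=0)
labels = probs.argmax(axis=0).astype(np.uint8)
```

Interpolating the label mask itself bilinearly would blend class IDs (for example, neighboring pixels labeled 0 and 18 would yield values in between), which is why masks and scores are treated differently.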

P.S. You can fine-tune ESPNet models at high resolution. I believe fine-tuning at high resolutions will further improve accuracy.

wldeephi commented 6 years ago

OK, thanks very much for your real-time answers and very useful advice. I will run experiments following your suggestions and share the results later.

hsyi commented 6 years ago

May I ask about your mIOU during training? I only get 0.47 mIOU on the split validation set and 0.50 mIOU on the training set during training.

sacmehta commented 6 years ago

Which scripts are you using to evaluate mIOU?

hsyi commented 6 years ago

I have not evaluated the model on the Cityscapes test set; the mIOU numbers are from the training log. (Two images attached.)

sacmehta commented 6 years ago

This looks good. Train it for 300 epochs and you should be Okay.

Note: the official mIOU metric used for evaluation on Cityscapes is different from the one in our code. Please evaluate your best model (the one with the minimum validation loss) on the Cityscapes validation set using their scripts to compare with the numbers reported in the paper.
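For reference, one common way to compute a class-averaged IoU is from a confusion matrix accumulated over the evaluation set. The sketch below is generic: it is neither the official Cityscapes script nor the metric in this repository, and the function names, the toy arrays, and the ignore label value 255 are assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count (ground truth, prediction) pairs, skipping ignored pixels."""
    valid = gt != ignore_label
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)  # rows: gt, columns: pred


def mean_iou(conf):
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    tp = np.diag(conf)
    denom = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = denom > 0
    return (tp[present] / denom[present]).mean()


# Toy 2x2 example with two classes and one ignored pixel.
gt = np.array([[0, 1], [0, 255]])
pred = np.array([[0, 1], [1, 1]])
conf = confusion_matrix(gt, pred, num_classes=2)
miou = mean_iou(conf)
```

Implementations can differ in details such as which pixels are ignored and whether counts are accumulated per image or over the whole set, so numbers from different scripts are not directly comparable.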

hsyi commented 6 years ago

Thank you for your quick answer. I'm waiting for the 300 epochs to finish.

sacmehta commented 6 years ago

For our numbers on the Cityscapes validation set, please see Table 2(f) in the paper, which reports results for both ESPNet-C and ESPNet.

Good luck and thanks for showing interest in our work!

hsyi commented 6 years ago

Hi, this is my full training log. I used the pretrained encoder from your GitHub repository to train ESPNet, but it seems I can't reach the performance you report. Do you have any advice for training? How should I tune the hyperparameters? (Attachment: trainValLog.txt)

sacmehta commented 6 years ago

Hi,

This looks good to me. Please evaluate it using the Cityscapes dataset scripts; they use a weighted mIOU, which differs from the metric we provide. Once you evaluate with those scripts, you should see similar performance.

Please note: use your best model for generating results, that is, the model with the lowest validation loss.

hsyi commented 6 years ago

Thank you @sacmehta, I'll try tomorrow. It's really nice of you ^ _ ^

hsyi commented 6 years ago

Hello, this is my result with bilinear interpolation on the val set, and I think it's worse than yours. (Image attached.)

sacmehta commented 6 years ago

Could you please tell me whether you used ESPNet-C or ESPNet?

hsyi commented 6 years ago

ESPNet. I used the pretrained ESPNet-C that you provided in the code to train the model.

sacmehta commented 6 years ago

Which configuration?

hsyi commented 6 years ago

All hyperparameters are the default values in your code. Do you have any advice?

sacmehta commented 6 years ago

Can you share the command you used to train the model?

hsyi commented 6 years ago

python main.py (see the attached image)

sacmehta commented 6 years ago

You didn’t turn on the decoder flag, so you ended up training ESPNet-C and not ESPNet.

Please use your current model as the pretrained encoder (instead of ours) and train it with the decoder flag.

hsyi commented 6 years ago

Sorry, I don't remember whether I changed the setting during training with the command -decoder=True, but I'll figure it out.

hsyi commented 6 years ago

Yeah, in the model dict there are two up layers, and the folder containing the results is "results_enc__dec_2_8", so I think I did turn it on during training via the command line. Do you have any suggestions?

hsyi commented 6 years ago

But the number of parameters is different. I must have done something wrong. Thank you for your help. I'm going to retrain it.

hsyi commented 6 years ago

I believe this is my final result for ESPNet. I have checked the number of "parameters" in the training log, and it is right for ESPNet. May I ask for any advice on hyperparameters? Did you use the default settings in your code to train your model?

sacmehta commented 6 years ago

Try batch size of 6 or 8.

hsyi commented 6 years ago

Hi, this is the evaluation result on the Cityscapes val set (with bilinear resize to 2048*1024). I used your espnet_p2_q8 model with decoder provided in the code. Did you release your best model, or have I done something wrong? (Image attached.)

sacmehta commented 6 years ago

The provided model is our best model on the validation set.

Did you resize feature maps or segmentation masks using Bilinear interpolation?

sacmehta commented 6 years ago

Also, check the unique values in the generated segmentation masks. They should be between 0 and the number of classes. You can check this using numpy.unique.
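A minimal sketch of that check follows. The helper name `unexpected_labels`, the toy masks, and the class count of 20 are assumptions for illustration, and the valid range is read here as 0 up to (but not including) the number of classes.

```python
import numpy as np


def unexpected_labels(mask, num_classes):
    """Return the values in `mask` that fall outside [0, num_classes)."""
    values = np.unique(mask)  # sorted distinct values in the mask
    return values[(values < 0) | (values >= num_classes)]


num_classes = 20  # assumed value for illustration; use your model's class count

good = np.array([[0, 3], [19, 3]])
bad = np.array([[0, 3], [19, 27]])  # 27 is not a valid class index here

good_issues = unexpected_labels(good, num_classes)  # empty: every value is valid
bad_issues = unexpected_labels(bad, num_classes)    # holds the out-of-range value
```

Out-of-range values in a mask often indicate that the label image itself was resized with an interpolating filter rather than nearest neighbor.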

hsyi commented 6 years ago

Yeah, I found it. I resized the feature map; I should upsample the model output. It works now, thank you!

sacmehta commented 6 years ago

Are you able to attain the reported accuracy?

hsyi commented 6 years ago

Yes. (Image attached.)

hsyi commented 6 years ago

Hi @sacmehta, did you do anything to augment the original dataset? I can only get 0.59, below your 0.61, on the Cityscapes validation set.

sacmehta commented 6 years ago

No, we didn’t use any additional augmentation.

A deviation of +/- 2 points is kind of expected. We used a batch size of 12 for ESPNet-C and 6 for ESPNet. What batch size did you use?

hsyi commented 6 years ago

Same as you, 6 for ESPNet. Thank you very much!

acgtyrant commented 6 years ago

@sacmehta Is it necessary to train for 300 epochs? The val loss and mIoU barely improve after the early epochs.

acgtyrant commented 6 years ago

My result is val mIoU 0.601.