xiaoyufenfei / Efficient-Segmentation-Networks

Lightweight models for real-time semantic segmentation on PyTorch (includes SQNet, LinkNet, SegNet, UNet, ENet, ERFNet, EDANet, ESPNet, ESPNetv2, LEDNet, ESNet, FSSNet, CGNet, DABNet, Fast-SCNN, ContextNet, FPENet, etc.)
MIT License

Issues with ContextNet and FastSCNN models #10

Open ash-ashra opened 4 years ago

ash-ashra commented 4 years ago

I trained 7 of the models and in all of them I got more than 80% validation mIoU with the default settings (CamVid dataset after 1000 epochs). But when I tested the 1000th checkpoints with test.py, I got these mIoUs: CGNet 0.651, ContextNet 0.060, DABNet 0.652, EDANet 0.288, ENet 0.590, ERFNet 0.672, FastSCNN 0.011.

So I'm wondering: why is there such a significant difference between the validation and test scores? And why do the ContextNet and FastSCNN checkpoints behave as if they were never trained?

I have Python 3.7 and PyTorch 1.4. I hope we can solve the testing issues soon.
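
For anyone who wants to sanity-check the numbers test.py prints, here is a minimal, repo-independent sketch of the standard confusion-matrix mIoU computation (`pred` and `label` are flat integer arrays; the function names are illustrative, not this repo's utility code):

```python
import numpy as np

def update_confusion(conf, pred, label, num_classes):
    """Accumulate one image's flattened predictions into a
    (num_classes x num_classes) confusion matrix (rows = ground truth)."""
    mask = (label >= 0) & (label < num_classes)   # drop void/ignore pixels
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

def per_class_iou(conf):
    """Per-class IoU and mean IoU from the confusion matrix."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - tp
    iou = tp / np.maximum(union, 1)               # guard against empty classes
    return iou, iou.mean()
```

Accumulating this over the whole test split and comparing against test.py's output makes it easy to tell whether the low numbers come from the metric code or from the checkpoints themselves.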

ash-ashra commented 4 years ago

I figured out why there's a discrepancy between the validation and test scores: the default value of the train_type parameter is trainval, so training actually includes the validation set as well, and after 1000 epochs the model was overfitting to it. I changed train_type to train and the scores behave normally (I'd suggest making this the default). However, the behavior of ContextNet and FastSCNN is still unexplained.
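
To make the mechanism concrete, here is a hypothetical sketch of what the train_type switch amounts to (the function and split names are illustrative, not this repo's exact API):

```python
def select_train_splits(train_type):
    """Hypothetical illustration of the train_type switch.
    With 'trainval' the validation images are part of the training set,
    so validation mIoU is measured on already-seen data and overestimates
    generalization; 'train' keeps the validation set held out."""
    if train_type == "trainval":
        return ["train", "val"]
    return ["train"]
```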

ash-ashra commented 4 years ago

When I put these two models in eval() mode, the outputs become nonsense, but they look fine in train() mode. Perhaps there's something wrong with the layers that behave differently between the two modes (e.g. BatchNorm or Dropout)!
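
The usual suspect when outputs are fine in train() but garbage in eval() is BatchNorm: in train mode it normalizes with per-batch statistics, while in eval mode it uses running averages, which can be badly estimated. A minimal diagnostic sketch, assuming a generic model and a single input batch (both placeholders):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def compare_modes(model, sample):
    """Compare outputs in eval() vs train() mode; a large gap usually
    points at BatchNorm running statistics that are badly estimated."""
    model.eval()                       # uses running mean/var
    out_eval = model(sample)
    model.train()                      # uses per-batch statistics
    out_train = model(sample)          # note: this forward also updates the running stats
    print("max |train - eval| =", (out_train - out_eval).abs().max().item())

    # Look for obviously broken running statistics.
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            print(name, "running_var min/max:",
                  m.running_var.min().item(), m.running_var.max().item())
```

If the gap is huge and some running_var values are extreme, the checkpoints are fine and only the normalization statistics are off.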

weizhou1001 commented 4 years ago

Have you figured out the problem? I ran into similar issues. I used 'train' for train_type; the training IoU and loss looked good, but when I used test.py, the results were very bad for ENet and bad for FSSNet. test.py works fine for LEDNet, though. Not sure if some setting is missing in the setup?
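
If the cause is indeed stale BatchNorm statistics, as the earlier comments suggest, one common workaround is to recalibrate the running stats by streaming training data through the network in train() mode before evaluating. A hedged sketch; the loader and device arguments are placeholders, not this repo's API:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recalibrate_batchnorm(model, loader, num_batches=100, device="cuda"):
    """Reset BatchNorm running stats and re-estimate them from data,
    then evaluate in eval() mode as usual."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.reset_running_stats()
            m.momentum = None          # None = cumulative moving average
    model.train()                      # BN only updates running stats in train mode
    for i, (images, _) in enumerate(loader):
        if i >= num_batches:
            break
        model(images.to(device))
    model.eval()
    return model
```

Running this on the affected checkpoints before test.py would show whether the weights themselves are sound and only the normalization statistics need fixing.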