hellochick / ICNet-tensorflow

TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

Own dataset training produces low loss, but unsatisfying results #39

Closed Tamme closed 6 years ago

Tamme commented 6 years ago

Hi.

I got training working using icnet_cityscapes_trainval_90k.npy with --filter-scale=1. The loss drops quite quickly to 0.08 within ~2000 epochs, which I presumed was quite good. But when running evaluate or inference, the output is garbage, as seen in the example.

[Attached: two images, b113-994_clipped (ground truth and model output)]

I had to change the number of classes; perhaps that is the error, or I may have labeled them incorrectly?

Currently my image shape is (480, 870, 3), my ground-truth shape is (480, 870), and the ground-truth values are in [2, 5, 8, 10, etc.], corresponding to the matching Cityscapes classes.
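(For reference: if the classification head is reduced to 6 output channels, the ground-truth IDs generally need to be remapped to contiguous indices in [0, num_classes). A minimal sketch, assuming NumPy/PIL; the class IDs and ignore label here are placeholders, not the actual forest-dataset mapping:)

```python
import numpy as np
from PIL import Image

# Hypothetical subset of Cityscapes IDs actually used by the dataset (6 classes).
CITYSCAPES_IDS_USED = [2, 5, 8, 10, 11, 13]
ID_TO_NEW = {old: new for new, old in enumerate(CITYSCAPES_IDS_USED)}

def remap_gt(gt_path, ignore_label=255):
    """Map sparse Cityscapes IDs to contiguous indices 0..5 for a 6-class head."""
    gt = np.array(Image.open(gt_path))            # shape (480, 870)
    remapped = np.full_like(gt, ignore_label)     # anything unmapped is ignored
    for old_id, new_id in ID_TO_NEW.items():
        remapped[gt == old_id] = new_id
    return remapped
```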

My own theory is that the pretrained model is slowly transitioning to my classes, which is why the output is mixed, but that would not explain the consistently low loss.

Do you have some other ideas perhaps?

Regards, Tamme

hellochick commented 6 years ago

Hey @Tamme, where did these images come from? Are they output results generated by the model? It seems the first one is much better than the second one. So is your dataset a compressed version of the Cityscapes dataset?

Tamme commented 6 years ago

Sorry, I didn't mention: it is a forest dataset. The first image is the ground truth and the second is the output of ICNet. The dataset's classes are a subset of Cityscapes, with 6 classes.

hellochick commented 6 years ago

I got it; I think the problem is just as you mentioned above. You can train from the very beginning without loading the pre-trained weights, since the two datasets are very different.

ogail commented 6 years ago

@Tamme, can you elaborate on the steps you took to update the number of classes so that training works? Also, did you ever get past the inference/loss contradiction you mentioned above?

Tamme commented 6 years ago

@ogail, to adjust the number of classes, I modified the load function in network.py. Specifically, if the op name is 'conv6_cls', I picked only the classes I wanted.
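(A rough sketch of the kind of change meant here, assuming load() in network.py follows the usual Caffe-to-TensorFlow pattern of iterating over (op_name, param_name) pairs stored in the .npy file; the exact implementation in the repo may differ, and KEEP_CLASSES is a placeholder list:)

```python
# Would live in network.py (TensorFlow 1.x style variable scopes).
import numpy as np
import tensorflow as tf

KEEP_CLASSES = [2, 5, 8, 10, 11, 13]   # Cityscapes class channels to keep (example)

def load(self, data_path, session, ignore_missing=False):
    data_dict = np.load(data_path, allow_pickle=True, encoding='latin1').item()
    for op_name in data_dict:
        with tf.variable_scope(op_name, reuse=True):
            for param_name, data in data_dict[op_name].items():
                if op_name == 'conv6_cls':
                    # Slice the classification layer along its last axis so that
                    # weights (h, w, in, num_classes) and biases (num_classes,)
                    # keep only the desired class channels.
                    data = data[..., KEEP_CLASSES]
                try:
                    var = tf.get_variable(param_name)
                    session.run(var.assign(data))
                except ValueError:
                    if not ignore_missing:
                        raise
```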

The contradiction issue was just a bug I had in the training data. I guess you just have to go over everything and check that it all makes sense.