poornimajd closed this issue 3 years ago
When you get NaN, something is clearly going badly wrong. I suggest checking your dataloader, class definitions, etc. first. This does not look like a problem with the network; my guess is that it is a problem with the data (e.g. the dataloader). Even a very simple FCN should not produce results like this. I'm sorry, I might not be able to provide further help, since you are dealing with a private dataset and I stopped working on segmentation projects a long time ago.
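One quick way to act on the "check your dataloader and class definitions" advice is to look at which ids actually appear in the label masks the loader produces. The snippet below is only a sketch under stated assumptions: masks are plain 2D lists of integer ids, valid train ids run from 0 to n_classes - 1, and 255 is the ignore id. The helper names `label_histogram` and `unexpected_ids` are made up for illustration and are not part of the ShelfNet code.

```python
from collections import Counter

IGNORE_ID = 255  # assumption: 255 marks pixels excluded from loss and evaluation


def label_histogram(mask):
    """Count how often each id occurs in a 2D label mask (list of rows)."""
    return Counter(value for row in mask for value in row)


def unexpected_ids(mask, n_classes, ignore_id=IGNORE_ID):
    """Return ids that are neither a valid train id nor the ignore id."""
    hist = label_histogram(mask)
    return sorted(v for v in hist if v != ignore_id and not (0 <= v < n_classes))


def is_all_ignored(mask, ignore_id=IGNORE_ID):
    """True if the mask has no labelled pixel at all."""
    return all(value == ignore_id for row in mask for value in row)


# Toy mask: id 7 is outside the 4-class range and should be flagged.
toy_mask = [[0, 1, 3],
            [7, 255, 2]]
bad = unexpected_ids(toy_mask, n_classes=4)
empty = is_all_ignored([[255, 255], [255, 255]])
```

If a check like this flags out-of-range ids, or finds masks where every pixel is ignored, the label conversion is a likely suspect before the network is.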
Thank you for the reply. I did not change the dataloader; I just used the Indian Driving Dataset images in place of the Cityscapes dataset. I will look into it. My only remaining question is this: in my case I have only 4 classes, so either I change the number of classes in the network, but then I doubt I can use the pretrained weights (because they were trained on 19 classes), or I use the other hack I tried, which was to set the id to 255 for all the classes I need to ignore. Basically, I am not sure how to handle a different number of classes, or classes the model was not trained on. Any suggestion is appreciated.
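For reference, the "put 255 for everything else" idea usually amounts to a lookup from raw dataset ids to contiguous train ids. The following is a minimal sketch, not the repository's actual conversion: the raw ids in `KEPT_RAW_IDS` are hypothetical placeholders (not the real Indian Driving Dataset ids), and `build_lookup` and `remap_mask` are illustrative names.

```python
IGNORE_ID = 255  # assumption: the loss and the evaluation both skip this id


def build_lookup(kept_raw_ids, ignore_id=IGNORE_ID):
    """Map each kept raw id to a train id 0..N-1; every other id maps to ignore_id."""
    lookup = [ignore_id] * 256
    for train_id, raw_id in enumerate(kept_raw_ids):
        lookup[raw_id] = train_id
    return lookup


def remap_mask(mask, lookup):
    """Apply the lookup to every pixel of a 2D mask (list of rows)."""
    return [[lookup[value] for value in row] for row in mask]


# Hypothetical raw ids for: road, curb, sidewalk, driveable-fallback.
KEPT_RAW_IDS = [0, 5, 2, 1]
lookup = build_lookup(KEPT_RAW_IDS)

raw_mask = [[0, 5, 9],
            [2, 1, 255]]
train_mask = remap_mask(raw_mask, lookup)
```

With this kind of mapping the network output would need 4 channels, one per train id. Keeping a 19-channel head while only 4 ids ever appear in the labels leaves the other 15 classes with no ground-truth pixels at all.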
I'm sorry, I might not be able to provide further help, since you are dealing with a private dataset and I stopped working on segmentation projects a long time ago.
I get it. Thank you
You cannot simply use the pretrained weights, because the class definitions are totally different for your dataset. The dataloader and dataset class are highly customized for each dataset, so you'll definitely need to rewrite your own. Simply changing the number of classes and putting 255 for the rest will probably fail.
Thanks for the reply! I have arranged my data in Cityscapes format, so the dataloader should remain the same, right? But I changed cityscape_config.json to the ids expected by the Indian Driving Dataset. I used only 19 classes this time, and with the pretrained weights I still get NaN. I expected that after training for a few iterations the model should at least be able to adapt to the new dataset. Most of the classes in the dataset I am using are the same as in Cityscapes, except for 2 new classes, but the total number of classes is still 19.
I trained the ShelfNet real-time model on a custom dataset with 4 classes (road, curb, sidewalk, driveable-fallback), using 4.5k training images and 80k training iterations. For the other classes I kept the id as 255. I used the following command for training:
CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=2 train.py
I got the following result:
I am wondering why the result is so bad. The val mIoU is NaN. Any suggestion is greatly appreciated. Thank you.
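One common way a validation mIoU ends up as NaN, independent of any particular repository, is that some class has no pixels in either the predictions or the ground truth, so its IoU is 0/0 and a plain mean over classes propagates the NaN. The sketch below illustrates only that mechanism on toy data. It is not taken from the ShelfNet evaluation code, and a NaN loss during training would be a different cause. All function names here are illustrative.

```python
import math

IGNORE_ID = 255  # assumption: pixels with this target id are skipped


def confusion_matrix(pred, target, n_classes, ignore_id=IGNORE_ID):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(pred, target):
        if t == ignore_id:
            continue  # ignored pixels contribute nothing
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU per class; NaN when the class is absent from both pred and target."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        ious.append(tp / union if union else float("nan"))
    return ious


def plain_mean(values):
    """Ordinary mean: a single NaN makes the whole result NaN."""
    return sum(values) / len(values)


def nan_mean(values):
    """Mean over the classes that actually have a defined IoU."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


# Flattened toy labels: class 3 never occurs outside the ignored pixel.
target = [0, 0, 1, 1, 255, 2]
pred = [0, 1, 1, 1, 3, 2]

ious = per_class_iou(confusion_matrix(pred, target, n_classes=4))
miou_plain = plain_mean(ious)
miou_valid = nan_mean(ious)
```

Here the first three classes have well-defined IoUs, yet `miou_plain` is NaN because class 3 is undefined, while `miou_valid` averages only the defined ones. Printing the per-class values is a cheap way to tell this case apart from a model that really diverged.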