FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.
http://wuhuikai.me/FastFCNProject

shape of the output #88

Closed anita94 closed 3 years ago

anita94 commented 4 years ago

I am training the network on the 2D-3D-S dataset from Stanford University and I am getting pretty good accuracy and IoU (81.5 and 43.6). However, the predictions do not look good visually.

How should I prepare the labels? Do they need to be one-hot encoded? Currently each label is an array with values from 0 to 12, since the dataset has 13 classes.
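For what it's worth, one-hot encoding shouldn't be needed: PyTorch's standard cross-entropy loss, which segmentation training loops like this one typically build on, takes an integer index map as the target directly. A minimal sketch (the shapes here are arbitrary, not the actual training configuration):

```python
import torch
import torch.nn as nn

# Arbitrary small shapes for a quick check: batch of 2, 13 classes.
logits = torch.randn(2, 13, 64, 64)           # raw network output, (N, C, H, W)
target = torch.randint(0, 13, (2, 64, 64))    # class indices 0..12, (N, H, W) -- no one-hot

loss = nn.CrossEntropyLoss()(logits, target.long())
print(loss.item())
```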

wuhuikai commented 4 years ago

Can you show some examples?

anita94 commented 4 years ago

Here's an example of an image and its semantic counterpart from the dataset (images are equirectangular, 2048×4096×3):

[image: input RGB] [image: semantic labels]

I decoded the colors in the semantic image and created a label array for each image: an array of size (2048, 4096) with values from 0 to 12 (since there are 13 object classes). I then saved these label arrays as images and used them as masks in the code.
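As an illustration of that decoding step, a minimal sketch (the `COLOR_TO_INDEX` mapping and file names are hypothetical, not the actual 2D-3D-S color scheme):

```python
import numpy as np
from PIL import Image

# Hypothetical mapping from the dataset's RGB label colors to class indices 0..12.
COLOR_TO_INDEX = {
    (0, 0, 0): 0,
    (241, 255, 82): 1,
    # ... remaining 11 classes
}

def decode_semantic(path):
    """Turn an RGB-coded semantic image into an (H, W) uint8 label map."""
    rgb = np.array(Image.open(path).convert('RGB'))
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, idx in COLOR_TO_INDEX.items():
        labels[np.all(rgb == color, axis=-1)] = idx
    return labels

labels = decode_semantic('semantic.png')
# Save as a single-channel PNG so the values stay exact
# (a lossy format like JPEG would silently corrupt the class indices).
Image.fromarray(labels).save('mask.png')
```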

Here's the ground truth and the prediction (I enhanced the contrast 10× to make them easier to see):

[image: ground truth] [image: prediction]
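A side note on the contrast trick: a label map with values 0–12 is nearly black in an 8-bit viewer, which is presumably why the 10× boost was needed. An alternative is to map the indices through a color palette; a small sketch with an arbitrary random palette:

```python
import numpy as np
from PIL import Image

# Arbitrary example palette: one RGB triple per class index 0..12.
palette = np.random.default_rng(0).integers(0, 256, size=(13, 3), dtype=np.uint8)

labels = np.array(Image.open('mask.png'))  # (H, W) array with values 0..12
colored = palette[labels]                  # (H, W, 3) color image via fancy indexing
Image.fromarray(colored).save('mask_color.png')
```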

I tried to write the dataset code similar to the one you have for ade20k.
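For reference, a minimal custom dataset along those lines might look like the sketch below. This is a generic `torch.utils.data.Dataset`, not FastFCN's actual `BaseDataset` API; the directory layout, class name, and transform hook are placeholders:

```python
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class Stanford2D3DS(Dataset):
    """Hypothetical loader: RGB images in `images/`, label maps in `masks/`."""
    NUM_CLASS = 13

    def __init__(self, root, transform=None):
        self.transform = transform
        img_dir, mask_dir = os.path.join(root, 'images'), os.path.join(root, 'masks')
        self.images = sorted(os.path.join(img_dir, f) for f in os.listdir(img_dir))
        self.masks = sorted(os.path.join(mask_dir, f) for f in os.listdir(mask_dir))
        assert len(self.images) == len(self.masks)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = Image.open(self.images[idx]).convert('RGB')
        mask = Image.open(self.masks[idx])    # single channel, values 0..12
        if self.transform is not None:
            img = self.transform(img)         # e.g. ToTensor + Normalize
        target = torch.from_numpy(np.array(mask)).long()
        return img, target
```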

wuhuikai commented 4 years ago

  1. Are all the predictions like this?
  2. What are the training resolution and test resolution?
anita94 commented 4 years ago

  1. Yes, all the predictions are like this (I only ran the test to check predictions on my validation data).
  2. All images in the dataset are stored in full high definition at 1080 × 1080 resolution, and I used the original images for training (the dataset contains 6 areas; I used 5 of them for training and one for validation). I trained the network for 80 epochs.
wuhuikai commented 4 years ago

Then I do not have any idea about this result. It's unusual :(