skokec / segdec-net-jim2019

Surface Defect Detection with Segmentation-Decision Network on KolektorSDD

Dilation for label images in 3D #11

Open sinangokce opened 4 years ago

sinangokce commented 4 years ago

This is great work! I'm working on a dataset with multiple defects on the same object, so I annotate the defects with different colors, which makes my label images 3D, in contrast to your case, where the labels are 2D. I tried to adapt your dilation code by changing the dimensions of the structure, but as expected it doesn't dilate as I want. Have you ever tested your code with colored labels? Even if I don't need the dilation, do you think your algorithm could work with multiple outputs instead of a single binary decision output? Best!

skokec commented 4 years ago

Hi @sinangokce,

I haven't yet tried segmentation with multiple classes (i.e., the colored labels). In principle, it should be possible, but I guess there will be a few tricky parts of the code that will need to be adjusted. In particular, the dilation/resizing of the labels and the loss function will need to be handled correctly.

I guess there are different ways of doing multiclass prediction:

  1. You can do multiclass classification with softmax (tf.losses.categorical_crossentropy) instead of binary cross-entropy, i.e., having [WxHx1] labels, where different integer values represent different classes. In that case, you will need to separate each class into its own single-class [WxHx1] label, do the dilation and the binarization of labels, and then combine them back into a multi-class label. The label is also resized using avg_pool2d with stride>1 before the loss is applied, so you will need to be careful how multi-class labels are resized.

  2. Another way is to use a separate channel for each class, so the label becomes [WxHxCLS], where CLS is the number of classes. If I understand correctly, this is what you are doing? In this case you also need to perform the dilation separately for each channel, but you do not need to binarize the channels and combine them. Labels resized using avg_pool2d should also work out-of-the-box. You can then use the same loss (binary cross-entropy) as in the single-class case.

In both versions, the segmentation output needs to return/predict multiple channels, with an appropriate loss function applied afterwards. You also need to decide how to handle the loss for the classification output: either you add multiple classification neurons and predict independently for each class, or you combine them into one class, e.g., predicting whether any of the classes is present.
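A minimal sketch of the per-channel dilation from option 2, assuming [H, W, CLS] labels with one binary channel per class and a simple 4-connected (cross-shaped) structuring element. This is illustrative NumPy, not the repository's TensorFlow code, and the function names are hypothetical:

```python
import numpy as np

def dilate_binary_2d(mask, iterations=1):
    # One iteration grows the mask by a 4-connected (cross-shaped) neighborhood.
    m = mask.astype(bool)
    for _ in range(iterations):
        grown = m.copy()
        grown[1:, :] |= m[:-1, :]   # shift down
        grown[:-1, :] |= m[1:, :]   # shift up
        grown[:, 1:] |= m[:, :-1]   # shift right
        grown[:, :-1] |= m[:, 1:]   # shift left
        m = grown
    return m

def dilate_multiclass(label, iterations=5):
    # label: [H, W, CLS] with one binary channel per class (option 2 above);
    # each channel is dilated independently, so no binarize/combine step is needed.
    out = np.zeros(label.shape, dtype=label.dtype)
    for c in range(label.shape[-1]):
        out[..., c] = dilate_binary_2d(label[..., c] > 0, iterations)
    return out
```

With option 1 you would instead split the integer mask into such per-class channels first, dilate each one, and then merge them back into a single-channel label.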

Hope this helps.

Best, Domen

sinangokce commented 4 years ago

Thank you for your very detailed reply @skokec !

What I meant by 3D labeling is that I have [WxHx3] (RGB) label images, while your label images (Part7_label.bmp in the kos folders, for example) are in P mode (8-bit pixels, mapped to any other mode using a colour palette).

If I take only the red channel and label different defects with different integers (255 for the darkest red, 127 for a lighter red), I think I would be in the first case you explained above. Please correct me if I'm wrong.

Although I haven't thought about the second case, it may be very interesting to consider. But I don't know how to create such an image format. For example, if we take RGBA into consideration, it has 4 channels [WxHx4]. Could we therefore use it for labelling 4 different classes?

skokec commented 4 years ago

For example, if we take RGBA into consideration, it has 4 channels [WxHx4]. Could we therefore use it for labelling 4 different classes?

You are correct. Using an image format would only allow you a maximum of 4 classes. You would have to encode the data in some other format, but this would be too cumbersome, so I think option 1 would be better.

.. I have [WxHx3] formatted (RGB) ...

I see, your original data is basically RGB, and each color represents a different class. I think it would be much easier to handle if the labels were 8-bit pixels with [0 = bg, 1 = cls1, 2 = cls2, etc.] (a maximum of 255 different classes; for more you would need 32-bit grayscale pixels). If you can convert your data into this format, then this is basically case 1 from my previous post.
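Converting the RGB labels into that 8-bit index format could look roughly like this (a sketch; the function name and the color-to-class mapping are hypothetical, and in practice the rgb array would come from loading the label image):

```python
import numpy as np

def rgb_to_class_index(rgb, class_colors):
    # rgb: [H, W, 3] uint8 label image; class_colors: list of (R, G, B) tuples.
    # Pixels matching class_colors[i] get index i + 1; everything else stays 0 (bg).
    idx = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for i, color in enumerate(class_colors, start=1):
        idx[np.all(rgb == np.asarray(color, dtype=rgb.dtype), axis=-1)] = i
    return idx
```

The result is a single-channel [H, W] mask in exactly the [0 = bg, 1 = cls1, 2 = cls2, ...] convention described above.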

Best, Domen

sinangokce commented 4 years ago

Thank you for your reply @skokec ! For the binarization of labels you take all pixel values greater than 0 and multiply them by 255 (the white-pixel constant for P format). In my case, I will multiply each dilated label by its own pixel value, i.e., 255 for the darkest red and 127 for a lighter red on the red channel in RGB format. Thus, I will end up with different values greater than 0 (127, 255, etc.). Would that still be a valid binarized label for your algorithm?

skokec commented 4 years ago

If you do the binarization on each individual class before you combine them by multiplication with 255 or 127, then this is ok.

However, by default the algorithm will not work out-of-the-box with such mixed labels combined into one channel. You will need to rewrite the get_loss() function in segdec_model.py to compute a multi-class loss, and you will need to convert the label/mask into the format required by your multi-class loss function.
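As a rough illustration of what a rewritten multi-class get_loss() has to compute, here is a NumPy stand-in for per-pixel softmax cross-entropy over [H, W, CLS] logits and an [H, W] integer mask. In the actual code this would be done with TensorFlow ops (e.g., tf.nn.sparse_softmax_cross_entropy_with_logits); the function name here is hypothetical:

```python
import numpy as np

def multiclass_pixel_loss(logits, class_idx):
    # logits: [H, W, CLS] raw scores; class_idx: [H, W] integer labels (0 = bg).
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = class_idx.shape
    # Pick the log-probability of the true class at every pixel, average over pixels.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], class_idx]
    return -picked.mean()
```

Perfectly confident correct predictions drive this loss toward 0, while uniform logits give log(CLS), which is a quick sanity check for any reimplementation.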

Best, Domen