mhamilton723 / STEGO

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
MIT License
711 stars 142 forks

Why label_arr = (label+1) in crop_datasets.py #61

Closed Eric-L-Manibardo closed 1 year ago

Eric-L-Manibardo commented 1 year ago

I cannot understand why 1 is added to the image labels: https://github.com/mhamilton723/STEGO/blob/e20df22cf17c41ac78e3c8c75a3118ea87ff0a4c/src/crop_datasets.py#L121 For example, if the 'building' class of Cityscapes is (70,70,70) in RGB, the cropped label becomes (71,71,71).

Is this a typo? Or is there a later step when training STEGO from scratch that requires every label to differ from the original?

tanveer6715 commented 1 year ago

Hi, I have the same question. Did you find a reason for this?

Eric-L-Manibardo commented 1 year ago

Hello, @tanveer6715. Yes. The reason is that the model is not trained on RGB masks but on what are called "train IDs". Here, every pixel is assigned an embedding of the form (label_id, label_id, label_id), where the 3 channels have the same value. Since Cityscapes has 27 classes, these label IDs range from 0 to 26. However, there is a special class for "do not evaluate" regions, such as the ego vehicle or other dynamic objects. This special class appears in pure white: (255,255,255). Adding 1 to all labels shifts the real classes to 1 through 27, and the special value 255 wraps around to 0 once the result is stored as 8-bit, so the saved labels range from 0 to n_classes and the model can be trained properly.

TL;DR: the mistake was using RGB embeddings instead of 'label_id' embeddings.
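
For anyone who wants to check the arithmetic in the answer above, here is a minimal NumPy sketch. It is not the code from crop_datasets.py: `shift_labels`, `toy_mask`, and the two constants are hypothetical names chosen for illustration, and it assumes (as described in this thread) that the ignore value is 255 and that the shifted labels are written out as 8-bit values.

```python
import numpy as np

N_CLASSES = 27      # Cityscapes class count quoted in this thread
IGNORE_VALUE = 255  # "do not evaluate" value quoted in this thread


def shift_labels(label_ids: np.ndarray) -> np.ndarray:
    """Add 1 to every label ID and store the result as 8-bit (hypothetical helper)."""
    shifted = label_ids.astype(np.int64) + 1
    # Casting 256 to uint8 wraps around to 0, so the ignore value lands on 0,
    # while the real classes 0..26 become 1..27.
    return shifted.astype(np.uint8)


# Toy 2x3 mask: three real classes (0, 5, 26) and three ignored pixels.
toy_mask = np.array([[0, 5, 26],
                     [IGNORE_VALUE, IGNORE_VALUE, IGNORE_VALUE]])
shifted_mask = shift_labels(toy_mask)
print(shifted_mask)
```

Under these assumptions the real classes occupy 1 through `N_CLASSES` and the ignored pixels sit at 0. This only works on label-ID masks: applying the same shift to an RGB color mask just turns (70,70,70) into (71,71,71), which is the confusing result from the original question.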