pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

num_classes in segmentation example #5822

Open chris-tkinter opened 2 years ago

chris-tkinter commented 2 years ago

Hi, I have some questions about num_classes in the segmentation example.

Since the pretrained models have 21 classes, should the target mask contain integers from 0 to 21, where 0 is the background?

If so, my next question is: do we need to compute loss over 0 (background)? Background pixels are usually the majority, so could that bias the model towards predicting 0 (background) almost everywhere? Or should we ignore the background label by passing ignore_index=0 to the cross entropy loss? I see that here we are ignoring index 255, but I am not sure where 255 comes from.

Thanks! Any input is highly appreciated!

NicolasHug commented 2 years ago

@datumbox will be able to correct me if I'm wrong here:

Since the pretrained models have 21 classes, should the target mask contain integers from 0 to 21, where 0 is the background?

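Yes, 0 is the background. One correction on the range: with 21 classes the valid labels run from 0 to 20, not 0 to 21.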
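To make the expected shapes concrete, here is a minimal sketch of what the loss sees. It assumes a model that outputs one score per class per pixel. The random tensors are stand-ins for real model output and real masks, not anything from the reference scripts. Note that cross entropy with 21 classes accepts class indices 0 through 20.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 21  # background (index 0) + 20 object classes

# Stand-in for model output: (batch, num_classes, height, width)
logits = torch.randn(2, num_classes, 4, 4)

# Stand-in for the target mask: one int64 class index per pixel,
# with values in [0, num_classes - 1]; randint's upper bound is exclusive
target = torch.randint(0, num_classes, (2, 4, 4))

# Mean cross entropy over all pixels, background included
loss = F.cross_entropy(logits, target)
print(target.min().item(), target.max().item(), loss.item())
```

A target value of 21 would be out of range here, unless it happens to equal the ignore_index passed to the loss.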

do we need to compute loss over 0 (background)?

I believe so - the ability to detect background over other classes is a required feature of the models, so it has to be explicitly encoded in some way.
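As a rough illustration of the difference between "background counts" and "index 255 is skipped", the sketch below builds a tiny hypothetical mask that is mostly background, marks one pixel as 255, and checks that ignore_index=255 averages the loss over every other pixel, background included. The tensors are made up for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 21

# Stand-in for model output: (batch, num_classes, height, width)
logits = torch.randn(1, num_classes, 4, 4)

# Hypothetical mask: mostly background (0), a small object of class 5,
# and one pixel marked 255 that should not contribute to the loss
target = torch.zeros(1, 4, 4, dtype=torch.long)
target[0, 1:3, 1:3] = 5
target[0, 0, 0] = 255

# Pixels equal to ignore_index are excluded from the mean
loss = F.cross_entropy(logits, target, ignore_index=255)

# Manual check: per-pixel loss averaged over the valid pixels only.
# clamp() just keeps the 255 entry in range; that pixel is masked out below.
valid = target != 255
per_pixel = F.cross_entropy(
    logits, target.clamp(max=num_classes - 1), reduction="none"
)
manual = per_pixel[valid].mean()

print(torch.allclose(loss, manual))
```

Passing ignore_index=0 instead would drop every background pixel from the loss, so the model would get no training signal for telling background apart from objects.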

I see that here we are ignoring index 255, but I am not sure where 255 comes from.

I believe it comes from here: https://github.com/pytorch/vision/blob/6d85d74be6ac72b2ac3057d85d8ae0004fa0de3f/references/segmentation/coco_utils.py#L55-L56
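As far as I recall, the linked lines merge the per-instance COCO masks into one segmentation map and set pixels covered by more than one instance to 255, so that ambiguous pixels are skipped by the loss. The sketch below illustrates that idea with made-up masks and category ids. It is not a copy of coco_utils.py. Pascal VOC masks, I believe, also use 255 for the "void" border around objects, which fits the same ignore_index.

```python
import torch

# Two hypothetical binary instance masks, shape (instances, H, W)
masks = torch.zeros(2, 3, 3, dtype=torch.long)
masks[0, 0:2, 0:2] = 1  # instance 0 covers the top-left 2x2 block
masks[1, 1:3, 1:3] = 1  # instance 1 covers the bottom-right 2x2 block

# Hypothetical category id for each instance
cats = torch.tensor([3, 7], dtype=torch.long)

# Merge instances into a single map: each pixel takes the category id
# of the instance covering it, or 0 (background) if none does
target, _ = (masks * cats[:, None, None]).max(dim=0)

# Pixels covered by more than one instance are ambiguous: mark them 255
target[masks.sum(0) > 1] = 255

print(target)
```

Here only the centre pixel belongs to both instances, so it is the only one set to 255.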