meetps / pytorch-semseg

Semantic Segmentation Architectures Implemented in PyTorch
https://meetshah.dev/semantic-segmentation/deep-learning/pytorch/visdom/2017/06/01/semantic-segmentation-over-the-years.html
MIT License

"pre_encoded" directory in Pascal dataset #26

Closed monaj07 closed 7 years ago

monaj07 commented 7 years ago

Could you please give me a hint as to what the pre_encoded folder/option in pascal_voc_loader.py refers to? I do not have that folder or any .mat file that the loader wants to read from it. Thanks.

jetxa commented 7 years ago

The Pascal VOC dataset provides RGB labels; the pre_encoded folder will be created for saving the corresponding label-ID (single-channel) labels.

4F2E4A2E commented 6 years ago

@zhxt0 I still don't get it. ELI5? :see_no_evil:

There are already class-segmented files in the VOC from 2012. They were produced correctly by hand (pixel-wise) and sit inside the folder defined for them (VOC2012/SegmentationClass/).

Do I assume correctly that the SBD code and config in here, as well as pre_encoded, are only needed if one wants to create the segmented images using a model?

Maybe I just got lost :ghost: Anyway, I would appreciate your answer!

meetps commented 6 years ago

The files in VOC2012/SegmentationClass/ are RGB images, with each color (an R, G, B triplet) corresponding to a class. They are not single-channel, class-wise ground-truth images in which each pixel's value is the index of the class present at that pixel.

The idea behind having a pre_encoded directory is to avoid this mapping from RGB to class-wise (single-channel ground-truth) images during training, as it is computationally expensive.
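
Roughly, the encoding step looks like this (a minimal sketch using the standard 21-color VOC palette; the function and variable names here are illustrative, not necessarily the loader's exact API):

    import numpy as np

    # Standard Pascal VOC color map: index i holds the RGB triplet of class i
    # (0 = background, 1..20 = the object classes).
    VOC_PALETTE = np.array([
        [0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
        [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
        [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
        [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
        [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
        [0, 64, 128],
    ])

    def encode_segmap(rgb_mask):
        """Convert an (H, W, 3) RGB ground-truth image into an (H, W)
        array of class indices. Doing this per-pixel comparison for
        every image on every epoch is what pre_encoded avoids."""
        rgb_mask = rgb_mask.astype(int)
        label_mask = np.zeros(rgb_mask.shape[:2], dtype=np.int16)
        for class_idx, rgb in enumerate(VOC_PALETTE):
            label_mask[np.all(rgb_mask == rgb, axis=-1)] = class_idx
        return label_mask

Running this once per image and caching the result as a PNG in pre_encoded means training only ever reads the cheap single-channel masks.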

4F2E4A2E commented 6 years ago

@meetshah1995 Thank you for your answer! But I already have the images as single-channel files containing only the values which represent each class. Can I just skip the VOC loader (the conversion part) and move on to train.py?

ksnzh commented 6 years ago

@meetshah1995 I found that train_aug = pascal_train_list + sbd_train_list and len(train_aug) == 9962. But when I looked in the pre_encoded directory, there were only 9733 PNG files. What have I missed?

albanie commented 6 years ago

The splits across the pascal VOC 2012 and SBD datasets are:

    voc_train: 1464 images
    voc_val: 1449 images
    sbd_train: 8498 images
    sbd_val: 2857 images

It can be a bit confusing because both voc_train and voc_val have some overlap with sbd_train. It is made more confusing by the fact that different research papers use different combinations of the data. For example, CRF-as-RNN uses 11,685 training images (the images from voc_train + sbd_train + sbd_val), and only uses the images from voc_val that do not occur in either sbd_train or sbd_val (leaving 346 images in total) for validation.

The original FCN paper used the VOC 2011 training and validation data (without SBD). However, the updated FCN PAMI version included experiments that trained on voc_train + sbd_train and validated on the images from voc_val that were not in SBD train (736 in total, since they were using VOC 2011; if VOC 2012 is used, as it is in this repo, this split has 904 images). See the footnote on page 7 of the paper for more details.
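
To make the set arithmetic concrete, here is a sketch of building both setups (the arguments are assumed to be sets of image-ID strings read from the corresponding VOC/SBD split files; make_splits is an illustrative helper, not code from this repo):

    def make_splits(voc_train, voc_val, sbd_train, sbd_val, style="fcn"):
        """Combine VOC 2012 and SBD image-ID sets into leakage-free splits.

        style="fcn":      train on voc_train | sbd_train, validate on the
                          voc_val images absent from sbd_train (904 IDs
                          with VOC 2012).
        style="crfasrnn": train on voc_train | sbd_train | sbd_val
                          (11,685 IDs), validate on the voc_val images
                          absent from all of SBD (346 IDs).
        """
        if style == "fcn":
            train = voc_train | sbd_train
            val = voc_val - sbd_train
        elif style == "crfasrnn":
            train = voc_train | sbd_train | sbd_val
            val = voc_val - (sbd_train | sbd_val)
        else:
            raise ValueError("unknown style: %s" % style)
        return sorted(train), sorted(val)

The set unions remove the images duplicated across VOC and SBD, and the set differences guarantee no validation image was seen during training.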

As a result of these differences, there are several ways to select the data splits (see e.g. this implementation for some commonly used setups).

How much difference does the extra data make? There is a helpful ablation study in the FCN PAMI paper that shows that moving from voc_train (2011) to voc_train + SBD_train with FCN-32s improved the validation score from 57.7 mIoU to 63.6 mIoU.

The number of masks (9733 in total) in the pre_encoded directory is the number of unique images across both the train_aug and val sets. I.e. inside the pascalVOCLoader class you should find that:

len(np.unique(self.files['train_aug'] + self.files['val'])) # gives 9733

This repo follows the data splits described in the FCN PAMI paper. However, in the current implementation (as far as I understand it), some of the training images are repeated in the train_aug split. I'll submit a PR for that.
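
For reference, the fix amounts to something like this (a sketch only, reusing the list names from the question above):

    # Merge the VOC and SBD train lists, collapse the duplicated IDs with a
    # set, and sort so the resulting split order is reproducible.
    train_aug = sorted(set(pascal_train_list) | set(sbd_train_list))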