Closed by monaj07 7 years ago
The PASCAL VOC dataset provides RGB label images. A pre_encoded folder will be created for saving the label-id (single-channel) label images.
@zhxt0 I still don't get it. ELI5? :see_no_evil:
There are already class-segmented files inside VOC 2012. They were correctly produced by hand (pixel-wise) and live in the folder defined for them (VOC2012/SegmentationClass/).
Do I assume correctly that the SBD code and config here, as well as pre_encoded, are only needed if one wants to create the segmented images using a model?
Maybe I just got lost :ghost: Anyway, I would appreciate your answer!
The files in VOC2012/SegmentationClass/ are RGB images, with each color (R,G,B triplet) corresponding to a class. They are not single-channel class-wise ground-truth images in which each pixel's value is the index of the class present at that pixel.
The idea behind having a pre_encoded directory is to avoid this mapping from RGB to class-wise (single-channel ground truth) images during training, as it is computationally expensive.
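The RGB-to-classwise encoding described above can be sketched as follows. This is a minimal illustration, not the repo's actual implementation; the three-entry palette is a made-up subset (the real VOC palette has 21 classes):

```python
import numpy as np

# Hypothetical palette: row index = class index (VOC itself has 21 classes).
PALETTE = np.array([
    [0, 0, 0],      # 0: background
    [128, 0, 0],    # 1: aeroplane
    [0, 128, 0],    # 2: bicycle
], dtype=np.uint8)

def encode_segmap(rgb_mask: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) RGB label image to an (H, W) class-index mask."""
    label = np.zeros(rgb_mask.shape[:2], dtype=np.uint8)
    for idx, color in enumerate(PALETTE):
        # Pixels matching this palette color get this class index.
        label[np.all(rgb_mask == color, axis=-1)] = idx
    return label

# Tiny synthetic 2x2 RGB mask.
rgb = np.array([[[0, 0, 0], [128, 0, 0]],
                [[0, 128, 0], [0, 0, 0]]], dtype=np.uint8)
print(encode_segmap(rgb))  # [[0 1] [2 0]]
```

Doing this once and caching the result in pre_encoded is much cheaper than repeating the per-pixel color lookup for every image on every epoch.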
@meetshah1995 thank you for your answer! But I already have the images as single-channel, containing only the values that represent a class. Can I just skip the VOC loader (convert part) and move on to train.py?
@meetshah1995 I found that train_aug = pascal_train_list + sbd_train_list and len(train_aug) == 9962. But when I looked in the pre_encoded directory, there were 9733 png files. What have I missed?
The splits across the pascal VOC 2012 and SBD datasets are:
voc_train: 1464 images
voc_val: 1449 images
sbd_train: 8498 images
sbd_val: 2857 images
It can be a bit confusing because both voc_train and voc_val have some overlap with sbd_train. It is made more confusing by the fact that different research papers use different combinations of the data. For example, CRF-as-RNN uses 11,685 training images (the images from voc_train + sbd_train + sbd_val), and only uses the images from voc_val that do not occur in either sbd_train or sbd_val (leaving 346 images in total) for validation.
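The split construction above is just set arithmetic. Here is a sketch with toy image IDs standing in for the real VOC/SBD file lists (the IDs are made up; only the set operations match the description):

```python
# Toy image IDs, NOT the real file lists.
voc_train = {"a", "b", "c"}
voc_val   = {"c", "d", "e", "g"}
sbd_train = {"b", "c", "d"}
sbd_val   = {"e", "f"}

# CRF-as-RNN style split: train on everything, validate only on the
# VOC val images that never appear anywhere in SBD.
train = voc_train | sbd_train | sbd_val
val   = voc_val - (sbd_train | sbd_val)

print(sorted(train))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(val))    # ['g']
```

With the real file lists, `val` would be the 346-image set mentioned above.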
The original FCN paper used the 2011 VOC training and validation data (without SBD). However, the updated FCN PAMI version included experiments that used voc_train + sbd_train for training, and used the images from voc_val that were not in SBD train for validation (736 in total, since they were using VOC 2011; if VOC 2012 is used, as it is in this repo, this split has 904 images). See the footnote on page 7 of the paper for more details.
As a result of these differences, there are several ways to select the data splits (see e.g. this implementation for some commonly used setups).
How much difference does the extra data make? There is a helpful ablation study in the FCN PAMI paper that shows that moving from voc_train (2011) to voc_train + sbd_train with FCN-32s improved the validation score from 57.7 mIoU to 63.6 mIoU.
The number of masks (9733 in total) in the pre_encoded directory is the number of unique images across both the train_aug and val sets. I.e., inside the pascalVOCLoader class you should find that:
len(np.unique(self.files['train_aug'] + self.files['val'])) # gives 9733
This repo follows the data splits described in the FCN PAMI paper. However, in the current implementation (as far as I understand it), some of the training images are repeated in the train_aug split. I'll submit a PR for that.
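Removing those repeats boils down to deduplicating the split while keeping the original order. A minimal sketch (toy IDs, not the actual loader code):

```python
def dedupe(split):
    """Drop repeated image IDs from a split list, preserving first-seen order."""
    seen = set()
    out = []
    for image_id in split:
        if image_id not in seen:
            seen.add(image_id)
            out.append(image_id)
    return out

# Toy train_aug list with repeats.
train_aug = ["a", "b", "a", "c", "b"]
print(dedupe(train_aug))  # ['a', 'b', 'c']
```

Applied to the real lists, this would shrink the 9962-entry train_aug down so that each image appears exactly once.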
Could you please give me a hint what the pre_encoded folder/option in pascal_voc_loader.py refers to? I do not have that folder or any .mat file that the loader wants to read from it. Thanks.