xu-ji / IIC

Invariant Information Clustering for Unsupervised Image Classification and Segmentation
MIT License

About data splits/partitions (train/test) #46

Closed kanthagirish-rit closed 4 years ago

kanthagirish-rit commented 4 years ago

Hi Xu,

A well-written paper! Thanks for the code as well.

I am trying to perform timing analysis by loading a pretrained segmentation model. I have the following question with respect to the dataloader.

In the function segmentation_create_dataloaders(config), the train and test partitions use all the data (train/test/validation) for mode == 'IID'. IID seems to be the mode the code requires. Does this mean the entire dataset was used for training?

Thanks, Kantha Girish

xu-ji commented 4 years ago

Yes. Page 6 of paper:

For unsupervised clustering, following previous work [8, 51, 52], we train on the full dataset and test on the labelled part; for the semi-supervised settings, train and test sets are separate.
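The rule quoted above can be sketched as a small partition selector. This is only an illustration, not the repository's code: the function name choose_partitions, the setting strings, the partition names, and the choice of "test" as the held-out partition in the semi-supervised case are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def choose_partitions(setting: str, has_labels: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Return (train_partitions, test_partitions) for a dataset.

    `has_labels` maps each partition name to whether it carries labels.
    Hypothetical helper: names and settings are illustrative only.
    """
    names = list(has_labels)
    if setting == "unsupervised":
        # Train on everything; evaluate only where labels exist.
        train = names
        test = [n for n in names if has_labels[n]]
    elif setting == "semi-supervised":
        # Assumption: a partition called "test" is held out entirely.
        train = [n for n in names if n != "test"]
        test = [n for n in names if n == "test"]
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return train, test


# A hypothetical dataset with one unlabelled partition.
partitions = {"train": True, "test": True, "unlabelled": False}
unsup_train, unsup_test = choose_partitions("unsupervised", partitions)
semi_train, semi_test = choose_partitions("semi-supervised", partitions)
print(unsup_train, unsup_test)
print(semi_train, semi_test)
```

In the unsupervised case the test partitions are a subset of the train partitions, which is why accuracy there is reported on data the model has already seen without labels.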

primecai commented 4 years ago

Hello Xu,

If I understood correctly, for datasets like CIFAR10 the model will both train and test on the full dataset (because they are all labelled)? I noticed that all partitions were set to load [True, False].

Many thanks for the work again!

xu-ji commented 4 years ago

Yes, for fully unsupervised learning the model is trained on the full dataset, and tested on the part that has labels (otherwise it is unknown whether a prediction is correct), which is the full dataset in the case of CIFAR10.
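For CIFAR10 this works out as in the rough sketch below. It assumes that the [True, False] values mentioned above are passed as the train flag of a CIFAR10-style dataset class (True for the 50,000-image training split, False for the 10,000-image test split); the variable and function names are hypothetical and no data is actually loaded.

```python
# Standard CIFAR10 split sizes, keyed by an assumed `train` flag.
CIFAR10_SPLIT_SIZES = {True: 50000, False: 10000}

# Assumed configuration: both lists name both splits.
train_partitions = [True, False]
test_partitions = [True, False]


def num_images(partitions):
    """Total images when the listed splits are concatenated (hypothetical helper)."""
    return sum(CIFAR10_SPLIT_SIZES[flag] for flag in partitions)


n_train = num_images(train_partitions)
n_test = num_images(test_partitions)
print(n_train, n_test)  # 60000 60000
```

Because every CIFAR10 image is labelled, the labelled part used for evaluation coincides with the full training set.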