xu-ji / IIC

Invariant Information Clustering for Unsupervised Image Classification and Segmentation

Evaluation for STL10 #7

Closed: vipinpillai closed this issue 5 years ago

vipinpillai commented 5 years ago

Hi, thanks for sharing your work.

I had a question regarding the input size used for evaluation on STL10. Looking at the code in IID_semisup_STL10.py (line 698), the test data uses TenCrop evaluation with input image size = old_config.input_sz, which is 64x64 from the command at line 650. Could you please confirm whether the numbers reported in Table 3 all use 64x64 input images with TenCrop evaluation, including the supervised baseline for the Cutout networks?
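Concretely, I believe this corresponds to the standard torchvision TenCrop pattern, roughly like the sketch below (illustrative, not the repo's exact code; `input_sz = 64` per the command above):

```python
import torch
import torchvision.transforms as T

input_sz = 64  # old_config.input_sz, per the command at line 650

# TenCrop yields 10 crops (4 corners + centre, plus horizontal flips);
# stack them into a single (10, C, H, W) tensor per image.
test_transform = T.Compose([
    T.TenCrop(input_sz),  # STL10 images are 96x96, so 64x64 crops fit directly
    T.Lambda(lambda crops: torch.stack([T.ToTensor()(c) for c in crops])),
])

# At test time, fuse the batch and crop dims, then average logits over crops.
# imgs: (bs, 10, c, h, w)
def tencrop_logits(model, imgs):
    bs, ncrops, c, h, w = imgs.shape
    logits = model(imgs.view(-1, c, h, w))
    return logits.view(bs, ncrops, -1).mean(dim=1)
```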

xu-ji commented 5 years ago

The results reported from other works in table 3 vary in the specific details of their evaluations (as with the architectures, learning rates, etc.). There are no material differences except that some use multiple folds, which is detailed in the table. From the Cutout paper, it seems their inputs are 96x96, with effective size 84x84 since they use size 12 padding, and TenCrop was not used.
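For concreteness, my reading of that setup corresponds to a standard pad-and-crop augmentation along these lines (a sketch under that reading, not the Cutout authors' code):

```python
import torchvision.transforms as T

# 96x96 inputs, zero-padded by 12 on each side, then randomly cropped
# back to 96x96. Crop shifts of up to 12 pixels mean at least 84x84
# real pixels are guaranteed in every crop, hence "effective size 84x84".
cutout_style_transform = T.Compose([
    T.Pad(12),
    T.RandomCrop(96),
    T.ToTensor(),
])
```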

edouardoyallon commented 5 years ago

Hi! Thanks for sharing this great work! I'm interested in the limited labelled data applications, and I was wondering what the (best) performance of this algorithm is when running the standard evaluation with 10 predefined folds of size 1000 for training, rather than the full dataset of size 5000 (which, to my knowledge, is not standard)? Thank you very much!

xu-ji commented 5 years ago

Hi, the full dataset is a standard setting. All 15 baseline methods in table 1 use it, and 13 of these are other people's experiments. Note in particular DAC (table 3 in this, see their code for STL10 here) and ADC (section 3.5 in this). For semi- and fully-supervised, the convolutional clustering and Cutout papers also use the full dataset setting (table 3). Also, it's interesting that the PyTorch interface for STL10 doesn't support the multi-fold setting.

For applications with low amounts of labelled data, recall that unsupervised IIC needs 0 labels, so you would definitely try this. In addition, we explicitly test how semi-supervised learning with small numbers of labels (50, 500, 1250, etc.) works with STL10. The result is that with just 500 labels, ~90% of the accuracy of using all 5000 labels is achieved. This is expected, since the unsupervised learning does most of the work. See the left graph in fig. 6, "Semi-supervised learning analysis" in section 4.1, and the top of table 4 in the supplementary material.
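As a hypothetical illustration of that protocol (not the repo's code), the sweep amounts to finetuning the semi-supervised head on random labelled subsets of increasing size:

```python
import numpy as np

# Hypothetical sketch of the label-budget sweep: for each budget, pick a
# random subset of the 5000 labelled STL10 training images and finetune
# the semi-supervised head on it; the unsupervised features are fixed.
for num_labels in [50, 500, 1250, 5000]:
    rng = np.random.RandomState(0)
    labelled_idx = rng.choice(5000, size=num_labels, replace=False)
    # ... finetune on labelled_idx, evaluate on the test set ...
```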

edouardoyallon commented 5 years ago

Hi! Thanks for your answer.

I should have specified that I'm interested in the setting of Table 3, not the clustering evaluation. The two papers you mention do use this protocol, but it is not the standard for this setting (https://cs.stanford.edu/~acoates/stl10/), and the methods SWWAE and Dosovitskiy that you compare against in Table 3 use 1k labelled samples, not 5k. The difference can be quite large: our own work (https://arxiv.org/abs/1703.08961) reports both setups, and the difference is around 11%, with the full-data setting reaching 87.6%.

> Also, it's interesting that the PyTorch interface for STL10 doesn't support the multi-fold setting.

Not anymore as of today! This was an old known issue.
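For anyone finding this later, current torchvision exposes the predefined folds via the `folds` argument of `torchvision.datasets.STL10` (a minimal sketch; training and evaluation details omitted):

```python
import torchvision

# Each fold is 1000 labelled training images; `folds` takes an int in [0, 9].
fold_accs = []
for k in range(10):
    train_fold = torchvision.datasets.STL10(
        root="./data", split="train", folds=k, download=True)
    # ... finetune on train_fold, evaluate on split="test", append accuracy ...
```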

xu-ji commented 5 years ago

Ok, cool. I'll try to run some more experiments and get back to you.

xu-ji commented 5 years ago

@edouardoyallon I've run it on the first 6 folds and the average so far is 79.1%.

Will give the full number when done, but expect it to be similar.

edouardoyallon commented 5 years ago

Hi @xu-ji, this is great. What's the final accuracy?! Thanks so much, I'll use the final number.

xu-ji commented 5 years ago

@edouardoyallon the results:

Average: 0.7921625
All: [0.795375, 0.792750, 0.789500, 0.787875, 0.794750, 0.790625, 0.796125, 0.790000, 0.799000, 0.785625]
Std dev: 0.0039378174475209
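For reference, the average and std dev above follow from the per-fold accuracies using numpy's default population (ddof=0) standard deviation:

```python
import numpy as np

# Per-fold test accuracies reported above.
accs = np.array([0.795375, 0.792750, 0.789500, 0.787875, 0.794750,
                 0.790625, 0.796125, 0.790000, 0.799000, 0.785625])
print(accs.mean())  # 0.7921625
print(accs.std())   # 0.0039378174475209 (population std, ddof=0)
```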