rgeirhos / generalisation-humans-DNNs

Data, code & materials from the paper "Generalisation in humans and deep neural networks" (NeurIPS 2018)
http://papers.nips.cc/paper/7982-generalisation-in-humans-and-deep-neural-networks.pdf

Train vs Validation sets #1

Closed josueortc closed 5 years ago

josueortc commented 5 years ago

Hi, I was looking through the repo, specifically 16-class-ImageNet, but I couldn't find how the dataset is split into training and validation sets. Could you point me to the location? Or are you just randomly splitting the data by a certain ratio?

rgeirhos commented 5 years ago

Dear @josueortc, I am posting my email response here so that others might see it as well:

Depending on your use case for 16-class-ImageNet, different approaches to train-test splits make sense. For the generalisation results reported in Figure 4, we wanted to make sure that CNNs (given their poor generalisation performance) were not at a disadvantage, e.g. because they did not see enough samples. Since 16-class-ImageNet is already a subset of ImageNet, we decided not to further reduce the training set by holding out a test set (which would generally be the ideal approach, but then one might argue that low CNN accuracies were caused by having fewer training samples, which we wanted to avoid here). Therefore, our generalisation results for Figure 4 can be seen as an upper limit on CNN generalisation performance (they might be even a bit worse if tested on an independent test set). If you are interested in something other than this upper limit, splitting the data would certainly make sense, as long as you can make sure that you do not run into trouble with too little training data.

In a bit more detail: for Figure 4, CNNs were trained on the entire 16-class-ImageNet using sample weighting as explained in section 2.4, and then tested on a random subset of this dataset (test set balanced w.r.t. the 16 classes). If you would like to reproduce our exact test results, the image names are contained in the raw-data/fine-tuning/ directory, such as in this exemplary file for the highpass experiment: https://github.com/rgeirhos/generalisation-humans-DNNs/blob/master/raw-data/fine-tuning/highpass-experiment/highpass-experiment_all-noise_session_1.csv . The suffix of the data column titled 'imagename', such as n03041632_8618.png, indicates the ImageNet image used for testing.

I am closing this issue now; please feel free to re-open if there is still anything unclear! ;)