Does ImageDataGenerator randomly select validation images?

ZheMann commented 4 years ago

When the validation_split parameter is set to a value larger than 0.0, how does the Keras ImageDataGenerator select the validation images? Are they randomly selected from the input directory, or are the last n samples used, similar to the validation_split parameter for model.fit? More specifically, I'm primarily interested in the following situation: the flow_from_directory method has a shuffle parameter to randomize the data. Is that shuffle applied before the ImageDataGenerator splits the input directory into a train and a validation set, or after?

I went through the official Keras and TF pages but they both show the same explanation of validation_split, namely:

validation_split: Float. Fraction of images reserved for validation (strictly between 0 and 1).

I also went through the source code (both Keras and TF) without finding any additional information.

Dref360 commented 4 years ago

This is quite hidden in the code base, but in the case of flow_from_directory, the split is a percentage per directory.

https://github.com/keras-team/keras-preprocessing/blob/daadd519238b19cdd8ffc962a99b25ee40368462/keras_preprocessing/image/utils.py#L195

QoT commented 2 years ago

Not sure why it is not visible in @Dref360's answer, but the important part is the last sentence of the docstring:

split: tuple of floats (e.g. (0.2, 0.6)) to only take into account a certain fraction of files in each directory. E.g.: segment=(0.6, 1.0) would only account for last 40 percent of images in each directory.
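To make the docstring concrete, here is a minimal sketch of how such a split tuple can pick a contiguous slice of the sorted files in one directory. The helper name select_split is hypothetical and this is not the library's code, only my reading of the behaviour described above, so please check it against the linked utils.py.

```python
def select_split(filenames, split):
    """Return the slice of sorted filenames covered by split=(start, stop).

    Hypothetical helper for illustration: it mimics the documented
    "fraction of files in each directory" behaviour for one directory.
    """
    ordered = sorted(filenames)  # plain lexicographic ordering
    n = len(ordered)
    start, stop = int(split[0] * n), int(split[1] * n)
    return ordered[start:stop]


# Ten zero-padded names standing in for one class directory.
files = [f"img_{i:03d}.jpg" for i in range(10)]

last_40 = select_split(files, (0.6, 1.0))   # last 40 percent of the files
first_20 = select_split(files, (0.0, 0.2))  # first 20 percent of the files

print(last_40)
print(first_20)
```

Note that nothing here is random: the slice depends only on the sort order, and any shuffling would happen on the already selected files. If I read the iterator code correctly, subset='validation' corresponds to split=(0, validation_split) and subset='training' to (validation_split, 1), but treat that as an assumption and verify it against your installed version.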

The files are ordered with Python's sorted(), so if you format the image names properly (for example with zero-padded numbers), you can use this feature quite easily. Otherwise you might get an ordering like this:

image_0.jpg image_1.jpg image_10.jpg image_100.jpg image_1000.jpg image_1001.jpg image_1002.jpg image_1003.jpg image_1004.jpg image_1005.jpg image_1006.jpg image_1007.jpg image_1008.jpg image_1009.jpg image_101.jpg image_1010.jpg image_1011.jpg image_1012.jpg image_1013.jpg image_1014.jpg image_1015.jpg image_1016.jpg image_1017.jpg image_1018.jpg image_1019.jpg
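A small sketch of why this happens and how zero-padding avoids it. The file names are made up for illustration; only the built-in sorted() is used.

```python
# Unpadded names: sorted() compares strings character by character,
# so "image_10.jpg" comes before "image_2.jpg".
names = [f"image_{i}.jpg" for i in range(1020)]
lexicographic = sorted(names)
print(lexicographic[:5])

# Zero-padded names: string order now matches numeric order.
padded_names = [f"image_{i:04d}.jpg" for i in range(1020)]
padded_sorted = sorted(padded_names)
print(padded_sorted[:5])
```

With padded names, a fractional split over the sorted list corresponds to a contiguous numeric range of images, which makes it predictable which files end up in the validation part.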

keras-team / keras-preprocessing

Does ImageDataGenerator randomly select validation images? #290