NOTE: The way to do it with Keras, in the case of the training subset, would be:
import tensorflow as tf

# each call returns a tf.data.Dataset object holding the training split of one source
training_dataset_color = tf.keras.utils.image_dataset_from_directory(
    "PATH_TO/initial_dataset/color",
    seed=1234,
    validation_split=0.2,
    subset="training",
    batch_size=32)
training_dataset_segmented = tf.keras.utils.image_dataset_from_directory(
    "PATH_TO/initial_dataset/segmented",
    seed=1234,
    validation_split=0.2,
    subset="training",
    batch_size=32)
training_dataset_augmented_backgrounds = tf.keras.utils.image_dataset_from_directory(
    "PATH_TO/initial_dataset/augmented_backgrounds",
    seed=1234,
    validation_split=0.2,
    subset="training",
    batch_size=32)
# concatenates the batches of the three sources into a single dataset
training_dataset = training_dataset_color.concatenate(training_dataset_segmented).concatenate(training_dataset_augmented_backgrounds)
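Since concatenate simply appends one dataset after the other, the combined batches come out grouped by source. If that ordering matters for training, the result could optionally be reshuffled and prefetched; a minimal sketch (the buffer size is an arbitrary illustrative value, not something the issue specifies):

# reorders the already-formed batches (not individual images) and overlaps loading with training
training_dataset = (
    training_dataset
    .shuffle(buffer_size=100)
    .prefetch(tf.data.AUTOTUNE)
)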
Task description
It is necessary to create a script that divides the initial dataset into training and testing subsets before the training process. The current initial dataset has the following structure:
The target structure of the initial dataset is:
With such a preprocessed dataset, it is no longer necessary to divide and reorder the data before each training run. It is true that Keras has fairly efficient dataset-loading functionality, but working with an already divided dataset is much easier. The most important requirement is that the ratio of color/segmented/augmented_backgrounds images is maintained in every class of both the training and testing subsets.
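One possible shape of such a script, as a minimal sketch only: it assumes the source layout implied by the Keras snippet above (initial_dataset/<source>/<class>/<image files>) and a hypothetical output layout dataset/{train,test}/<source>/<class>/; the function name split_dataset, the output paths, and the 80/20 ratio are illustrative assumptions, not something fixed by this issue. Splitting every (source, class) folder separately with the same ratio and seed keeps the color/segmented/augmented_backgrounds ratio in each class of both subsets by construction.

import random
import shutil
from pathlib import Path

def split_dataset(src_root, dst_root, test_ratio=0.2, seed=1234):
    """Split <src_root>/<source>/<class>/* into train/test subsets, per (source, class) folder."""
    rng = random.Random(seed)
    src_root, dst_root = Path(src_root), Path(dst_root)
    for source_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
            images = sorted(p for p in class_dir.iterdir() if p.is_file())
            rng.shuffle(images)
            # the same ratio is applied in every (source, class) folder,
            # so the source mix inside each class is preserved in both subsets
            n_test = int(len(images) * test_ratio)
            splits = {"test": images[:n_test], "train": images[n_test:]}
            for subset, files in splits.items():
                target = dst_root / subset / source_dir.name / class_dir.name
                target.mkdir(parents=True, exist_ok=True)
                for f in files:
                    shutil.copy2(f, target / f.name)

if __name__ == "__main__":
    # hypothetical paths; adjust to the real dataset location
    split_dataset("PATH_TO/initial_dataset", "PATH_TO/dataset", test_ratio=0.2, seed=1234)

The resulting dataset/train and dataset/test directories could then be loaded directly with tf.keras.utils.image_dataset_from_directory, without validation_split.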