tensorflow / similarity

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
Apache License 2.0

[FEATURE REQUEST] use of dataset in tfsim.callbacks.EvalCallback #293

Open Lunatik00 opened 2 years ago

Lunatik00 commented 2 years ago

Hi, I have a relatively big dataset compared to the available RAM. I currently have access to machines that can handle it, so this is not a problem for me, but since the RAM use is high I checked whether EvalCallback could take a dataset as input (a tf.data.Dataset, the same way it can be an input to model.fit()), and it cannot. Supporting this would help people with fewer compute resources use the callback with their datasets. I read the dataset using tf.keras.utils.image_dataset_from_directory(); it can be batched or unbatched.

owenvallis commented 2 years ago

So we do provide the TFRecord sampler for handling datasets that are too large to fit in memory. There are some quirks to setting up the TFRecords: the sampler requires that each TFRecord file contain contiguous blocks of classes, where the size of each block is a multiple of example_per_class.
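
To make the block constraint concrete, here is a minimal sketch in plain Python. It does not use the TensorFlow Similarity API; the helper names blocks_are_valid and arrange_into_blocks are hypothetical, and the check encodes one reading of the requirement above (every contiguous run of a single class has a length that is a multiple of example_per_class). It only works on the label order of a single file, before any records are written.

```python
from itertools import groupby


def blocks_are_valid(labels, example_per_class):
    """Return True if every contiguous run of one class has a length
    that is a multiple of example_per_class (hypothetical helper)."""
    if example_per_class <= 0:
        raise ValueError("example_per_class must be positive")
    return all(
        len(list(run)) % example_per_class == 0
        for _, run in groupby(labels)
    )


def arrange_into_blocks(labels, example_per_class):
    """Return example indices reordered into contiguous class blocks.

    Leftover examples that do not fill a whole multiple of
    example_per_class for their class are dropped (an assumption made
    for this sketch, not a documented library behavior).
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    order = []
    for idxs in by_class.values():
        usable = len(idxs) - len(idxs) % example_per_class
        order.extend(idxs[:usable])
    return order


labels = [0, 1, 0, 1, 0, 2, 1, 2]
order = arrange_into_blocks(labels, example_per_class=2)
arranged = [labels[i] for i in order]
```

The returned indices give the order in which examples could be serialized into one file; the interleaved input order itself would not pass the check.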

Regarding the EvalCallback: it was meant to hold a smaller subset of the data in memory, as we need to rebuild the index every time the callback is called. Since this is pretty expensive, the expectation is that this is a small eval set.
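
One way to get such a small eval set out of a larger dataset is to stream through it once and keep only a capped number of examples per class. The sketch below is a plain-Python illustration, not part of the library; take_per_class is a hypothetical helper, and it assumes the input yields (example, label) pairs with integer-like labels.

```python
from collections import defaultdict


def take_per_class(pairs, max_per_class):
    """Keep at most max_per_class examples of each class.

    `pairs` is any iterable of (example, label) tuples. It is consumed
    lazily, so only the kept examples are held in memory.
    """
    counts = defaultdict(int)
    examples, labels = [], []
    for example, label in pairs:
        label = int(label)  # also accepts NumPy integer scalars
        if counts[label] < max_per_class:
            counts[label] += 1
            examples.append(example)
            labels.append(label)
    return examples, labels


# Toy stand-in for a streamed dataset of (example, label) pairs.
stream = iter([("a", 0), ("b", 1), ("c", 0), ("d", 0), ("e", 1)])
kept_examples, kept_labels = take_per_class(stream, max_per_class=2)
```

Assuming an unbatched tf.data.Dataset that yields (image, label) tuples, its as_numpy_iterator() method should produce pairs of this shape; the kept examples could then be stacked into arrays and split into the query and target sets that the callback holds in memory.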