tensorflow / similarity

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
Apache License 2.0

Request for Sampler analogue like `tf.keras.utils.image_dataset_from_directory` #302

Closed: HandcartCactus closed this issue 1 year ago

HandcartCactus commented 2 years ago

`tf.keras.utils.image_dataset_from_directory` takes a directory structured like this:

main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg

and returns a `tf.data.Dataset` object. I have a similar dataset and I'd love to be able to use a Sampler on it.

I've already got most of my own implementation, but I'm still optimizing some of the code because I'm running into a tradeoff between memory usage and disk read speed. So with permission, I'd love to also claim this issue so I can contribute what I've written, and maybe get someone more experienced to look it over.

owenvallis commented 1 year ago

Hi Ejjaffe,

PR #307 from AminHP just added something similar to this as a MultiShotFileSampler. It uses the in-memory MultiShotMemorySampler as its base but accepts a custom function for loading the images. Here we assume that your x examples are paths to the images and that the loading function will read and prepare the images.
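
For a directory laid out like the one in the original request, the paths and integer labels could be collected with a small helper along these lines. This is only a sketch using the standard library: the function name `list_labeled_image_paths` and the `IMAGE_EXTENSIONS` set are hypothetical and not part of TensorFlow Similarity, and it assumes exactly one subdirectory per class.

```python
from pathlib import Path

# Assumption: these are the file types to treat as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def list_labeled_image_paths(main_directory):
    """Return (paths, labels, class_names) for a class-per-subdirectory layout.

    Hypothetical helper, not a TensorFlow Similarity API.
    """
    root = Path(main_directory)
    # Sort class directories so the label indices are deterministic.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())

    paths, labels = [], []
    for label, class_name in enumerate(class_names):
        for file_path in sorted((root / class_name).iterdir()):
            # Skip anything that is not a regular file with an image extension.
            if file_path.is_file() and file_path.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(str(file_path))
                labels.append(label)
    return paths, labels, class_names
```

The resulting `paths` and `labels` lists would then play the role of the x and y examples described above. Check the sampler's own documentation for the exact constructor arguments.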

NOTE: this may slow down batch generation, depending on how much preprocessing you do within each call to the loading function.
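
One possible way to trade some memory for fewer disk reads is to wrap the loading function in a bounded cache. The sketch below uses only the standard library. `make_cached_loader` and `read_raw_bytes` are hypothetical names, and it assumes the sampler passes each path as a hashable value such as a plain `str`, which may not hold in practice.

```python
from functools import lru_cache


def make_cached_loader(load_fn, max_items=1024):
    """Wrap a path -> example loader with a bounded LRU cache.

    Hypothetical helper: max_items caps how many loaded examples stay in
    memory, so it is the knob for the memory vs. disk-read tradeoff.
    """
    @lru_cache(maxsize=max_items)
    def cached_load(path):
        return load_fn(path)

    return cached_load


def read_raw_bytes(path):
    # Stand-in for the real decode / resize / normalize work.
    with open(path, "rb") as f:
        return f.read()


# Example: keep at most 256 loaded files in memory.
cached_reader = make_cached_loader(read_raw_bytes, max_items=256)
```

Cached results are shared between calls, so this only makes sense if the loaded examples are not modified in place afterwards, for example by an augmenter.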

See here for the new sampler