Closed daavoo closed 2 years ago
@daavoo does it help to see outliers or there are other cases?
also, does it cluster all the images in the dataset?
> @daavoo does it help to see outliers or there are other cases?
It all depends on:
The use cases, and the clusters that appear, will vary depending on that.
I have used embeddings from a pre-trained model (CLIP) in this example.
This is useful to get an intuition of whether the task is feasible to address by fine-tuning, by checking whether clusters appear easily and clearly separate the labels, as they do in this case (with the option to color by label enabled):
It can also help detect potential clusters of interest (or even outliers). In the example above, one of the small clusters contains only blueberry muffins (plus an outlier with only blueberries and no muffins):
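To put a rough number on "clusters clearly separate labels", one option is to check how often each point's nearest neighbor in embedding space shares its label. This is only a minimal sketch and is not part of this PR: the function names and the toy 2-D vectors are hypothetical stand-ins for real CLIP embeddings, and cosine similarity is an assumed choice of metric.

```python
import math


def cosine_similarity(a, b):
    # Assumes neither vector is all zeros (that would divide by zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor_label_agreement(embeddings, labels):
    """Fraction of points whose closest *other* point has the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Leave the query itself out, then pick the most similar point.
        best_j = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine_similarity(query, embeddings[j]),
        )
        hits += labels[best_j] == labels[i]
    return hits / len(embeddings)


# Hypothetical toy data: two well-separated groups of 2-D vectors.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_labels = ["label_a", "label_a", "label_b", "label_b"]
agreement = nearest_neighbor_label_agreement(toy_embeddings, toy_labels)
```

A value close to 1.0 suggests the pre-trained embeddings already separate the labels well. A value near chance level suggests the projector plot would probably look mixed too. The loop is quadratic in the number of points, so it only suits small datasets.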
> also, does it cluster all the images in the dataset?
The utility script I added takes the whole instantiated ldb dataset and adds all of its images to the viewer, so yes, all images in the folder will be used.
I would say that the viewer starts to become impractical to navigate when there are more than a thousand images.
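If a dataset is larger than that, one possible workaround is to sample it down before exporting. The sketch below is an assumption on my side and not something the utility script does: `subsample` and the file names are hypothetical, and the 1000 default just mirrors the rough limit mentioned above.

```python
import random


def subsample(items, max_items=1000, seed=0):
    """Return at most max_items items, drawn uniformly without replacement."""
    items = list(items)
    if len(items) <= max_items:
        return items
    # A fixed seed keeps the selection reproducible between runs.
    return random.Random(seed).sample(items, max_items)


# Hypothetical file names standing in for the images of a dataset.
image_paths = [f"img_{i:05d}.jpg" for i in range(2500)]
subset = subsample(image_paths)
```

Uniform sampling can drop rare outliers or small clusters, so it is a trade-off against the outlier detection use case.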
Isolated version of https://github.com/tensorflow/embedding-projector-standalone. It includes a utility script to convert an ldb dataset in pairs format into the files expected by the projector.
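For context, the projector reads a TSV file of vectors (one row per point), a TSV file of metadata (one row per point, with no header when there is a single column) and a JSON config pointing at both. The sketch below shows what producing those files could look like. It is not the utility script from this PR: `to_projector_files` and the `oss_data/...` paths are hypothetical, and the config field names are written as I recall them from the standalone projector's demo config, so they should be checked against that repository.

```python
import csv
import io
import json


def to_projector_files(vectors, labels, name="embeddings"):
    """Build the text of the tensors TSV, metadata TSV and JSON config."""
    tensors = io.StringIO()
    writer = csv.writer(tensors, delimiter="\t", lineterminator="\n")
    writer.writerows(vectors)  # one tab-separated row of floats per point

    # Single-column metadata: one label per line, no header row.
    # Assumes labels contain no tabs or newlines.
    metadata = "\n".join(labels) + "\n"

    config = {
        "embeddings": [
            {
                "tensorName": name,
                "tensorShape": [len(vectors), len(vectors[0])],
                # Hypothetical paths; adjust to where the files are served.
                "tensorPath": f"oss_data/{name}_tensors.tsv",
                "metadataPath": f"oss_data/{name}_metadata.tsv",
            }
        ]
    }
    return tensors.getvalue(), metadata, json.dumps(config, indent=2)


tensors_tsv, metadata_tsv, config_json = to_projector_files(
    [[0.1, 0.2], [0.3, 0.4]], ["a", "b"]
)
```

Returning strings instead of writing files keeps the sketch easy to inspect. A real script would write each string to the corresponding path.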
The usual workflow should be to add some embedding with --apply and then run the dataset_to_projector script: