Open TilakD opened 5 years ago
Hi @TilakD
It's pretty weird indeed. Maybe this is because of your data? For instance, maybe your data comes from three different sources (e.g. grayscale images, RGB images and another type), so the embeddings naturally cluster by source type before they cluster by class.
It's also possible that t-SNE is not representing the clusters faithfully. See this article for more on t-SNE: https://distill.pub/2016/misread-tsne/
I would plot the different images in each cluster for a single class to understand what differentiates them.
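One way to check whether the split is a t-SNE artifact is to project the embeddings of a single class at several perplexities and see whether the same sub-clusters keep showing up. Below is a minimal sketch, assuming scikit-learn is installed; the helper name `tsne_at_perplexities` and the perplexity values are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_at_perplexities(embeddings, perplexities=(5, 30, 50), seed=0):
    """Return {perplexity: (n, 2) array} of t-SNE projections.

    Hypothetical helper: `embeddings` is an (n, d) array, e.g. the
    128-D embeddings of the images belonging to one class.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    n_samples = embeddings.shape[0]
    projections = {}
    for perplexity in perplexities:
        if perplexity >= n_samples:
            # scikit-learn requires perplexity < n_samples, so skip these.
            continue
        tsne = TSNE(
            n_components=2,
            perplexity=perplexity,
            init="pca",
            random_state=seed,
        )
        projections[perplexity] = tsne.fit_transform(embeddings)
    return projections
```

Usage would look like `tsne_at_perplexities(embeddings[labels == some_class])`, followed by a scatter plot of each projection. Sub-clusters that persist across perplexities are more likely to reflect real structure in the data than ones that appear at a single setting.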
Hi @omoindrot Thanks for the reply.
All the data come from the same source (RGB images). Each of the 7 classes contains images of 3 different subjects, so the 3 clusters in each class correspond to the 3 subjects.
When I check the intra-cluster distance in 128 dimensions, I get a very low value for each class. When I do the same in 2D/3D after t-SNE, the intra-cluster distance is huge. I'm confused as to why t-SNE is picking up features of the subjects along with features of the classes.
Please let me know your thoughts.
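A note on the distance comparison: t-SNE does not preserve distances, so an absolute intra-cluster distance in 2D/3D cannot be compared directly with one measured in 128 dimensions. A more informative check is to compare intra-class and inter-class distances within the original embedding space. Here is a rough sketch with NumPy only; the function names are hypothetical, and `embeddings` / `labels` stand for whatever arrays your pipeline produces.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(d2, 0.0))


def intra_inter_distances(embeddings, labels):
    """Mean distance over same-class pairs and over different-class pairs."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    dist = pairwise_distances(embeddings)
    same_class = labels[:, None] == labels[None, :]
    # Upper triangle only: count each pair once and drop self-pairs.
    upper = np.triu(np.ones(dist.shape, dtype=bool), k=1)
    intra = dist[same_class & upper].mean()
    inter = dist[~same_class & upper].mean()
    return intra, inter
```

If `intra` is small relative to `inter` in the 128-D space, the classes are well separated there, whatever the t-SNE plot looks like. Running the same function on the t-SNE output gives a like-for-like ratio rather than two absolute numbers on different scales.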
I'm not sure what your exact data is, but consider this (related?) example: you have 3 people, and you ask them to take 7 different poses (standing up, sitting...).
Now you train embeddings with triplet loss according to the 7 poses.
Of course the embeddings will also reflect the 3 different people you use, because by default their embeddings will be different. So even if you train perfectly with triplet loss, each cluster will likely contain 3 different sub-clusters.
Even in face recognition, the cluster of a person can contain clusters (one where the person wears glasses, one where the person is older...).
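To illustrate the point about sub-clusters, here is a small synthetic sketch (not the real data): each embedding is a class center plus a smaller per-subject offset plus noise. All scales, sizes, and names below are assumptions chosen for illustration. Within one class, pairs from the same subject end up much closer than pairs from different subjects, which is exactly what shows up as sub-clusters in a t-SNE plot.

```python
import numpy as np


def subject_gap_within_class(embeddings, labels, subjects, target_class):
    """For one class, return (same-subject, cross-subject) mean distances."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    subjects = np.asarray(subjects)
    mask = labels == target_class
    x = embeddings[mask]
    s = subjects[mask]
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Upper triangle only: count each pair once and drop self-pairs.
    upper = np.triu(np.ones(dist.shape, dtype=bool), k=1)
    same_subject = s[:, None] == s[None, :]
    return dist[same_subject & upper].mean(), dist[~same_subject & upper].mean()


# Synthetic setup mirroring the thread: 7 classes, 3 subjects, 128-D embeddings.
rng = np.random.default_rng(0)
n_classes, n_subjects, per_group, dim = 7, 3, 20, 128
class_centers = rng.normal(scale=5.0, size=(n_classes, dim))
subject_offsets = rng.normal(scale=1.0, size=(n_subjects, dim))

# One group of `per_group` samples for every (class, subject) combination.
labels = np.repeat(np.arange(n_classes), n_subjects * per_group)
subjects = np.tile(np.repeat(np.arange(n_subjects), per_group), n_classes)
noise = rng.normal(scale=0.1, size=(labels.size, dim))
embeddings = class_centers[labels] + subject_offsets[subjects] + noise

same_subj, cross_subj = subject_gap_within_class(embeddings, labels, subjects, 0)
print(f"class 0: same-subject {same_subj:.2f}, cross-subject {cross_subj:.2f}")
```

With real data, passing the actual subject IDs as `subjects` gives a quick way to confirm that the sub-clusters line up with subjects rather than with something else.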
Thanks a lot @omoindrot. Got my doubts clarified!
Hi @omoindrot, I am using your code as a foundation on a custom dataset, and I'm getting multiple clusters for the same class when I use t-SNE to visualize the embeddings. My embeddings have 128 dimensions. Am I doing something wrong, or could there be a single cluster for each class that splits into 3 different clusters when the dimension is reduced?