Hi,

I had a question about the application: if I have a set of images, can I provide the verbs (for example, "this image is of a person surfing" or "this image is of a person holding a surfboard") and have the images clustered based on those actions? (I.e., I provide the verbs.) Also, how many images are required? Can I provide 1000 images and expect them to cluster well?

A follow-up: if I provide the verbs, can I use vectors (embeddings) as input instead of images?

Thank you.

IC|TC can accommodate any criterion expressible in text, so clustering by verbs is possible. As for the amount of data, I cannot guarantee a specific number, but I believe 1000 images should be sufficient.

Using embeddings directly would require fine-tuning the VLM and LLM to understand them. Alternatively, converting the text criterion (i.e., the verbs) into embeddings and then having those interact with the image embeddings for clustering is also possible. This could be an interesting direction for future work.
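To make the embedding-based idea concrete, here is a minimal NumPy sketch (not the IC|TC pipeline itself): it assumes you already have image embeddings and verb embeddings in a shared space from some encoder (e.g., a CLIP-style model), and simply assigns each image to the nearest verb by cosine similarity. The function name and the synthetic data below are purely illustrative.

```python
import numpy as np

def cluster_by_verbs(image_embs: np.ndarray, verb_embs: np.ndarray) -> np.ndarray:
    """Assign each image embedding to the nearest verb embedding
    by cosine similarity. Returns one cluster index per image."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = verb_embs / np.linalg.norm(verb_embs, axis=1, keepdims=True)
    sims = img @ txt.T           # (n_images, n_verbs) cosine similarities
    return sims.argmax(axis=1)   # index of the best-matching verb per image

# Synthetic demo: two verb "directions" and images clustered near each.
rng = np.random.default_rng(0)
verbs = np.array([[1.0, 0.0],          # e.g. "surfing"
                  [0.0, 1.0]])         # e.g. "holding a surfboard"
images = np.vstack([
    rng.normal([5.0, 0.0], 0.1, (3, 2)),  # three images aligned with verb 0
    rng.normal([0.0, 5.0], 0.1, (3, 2)),  # three images aligned with verb 1
])
labels = cluster_by_verbs(images, verbs)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

In practice you would replace the synthetic arrays with real encoder outputs; the point is only that once both modalities live in one space, the "clustering" reduces to nearest-criterion assignment.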