Hi,
I'm facing a composed image retrieval challenge over a large (~4M) image dataset.
Its distribution differs from the data CLIP was trained on (it is domain-specific), so the first step presented in your paper is needed even more.
There is no public dataset for my data (specific tech gadgets), so I need to generate one. That is possible, but expensive.
Approximately how much data do you think is needed? Should it be in the triplet format presented in your paper (image, relative prompt, target image)?
Thank you!