Closed animemes-bot closed 2 years ago
Which dataset are you running this on? Keep in mind that we only trained on 15M image-text pairs, which is far below the scale at which OpenAI trained their CLIP models.
Since there has been no further response, I am closing this issue.
I implemented a text-to-image search: I pass the query text through the text encoder, pass the images through the image encoder, and retrieve the top images for the query. However, it doesn't work well with CLOOB.
What is the main reason for this?
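For reference, the retrieval step I am describing looks roughly like the sketch below. It assumes the text and image embeddings have already been computed by the respective encoders and are plain arrays; `l2_normalize` and `top_k_images` are hypothetical helper names for illustration, not part of the CLOOB code. Ranking is by cosine similarity, which is an assumption about how the embeddings should be compared.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def top_k_images(text_emb, image_embs, k=5):
    """Rank images by cosine similarity to one text embedding.

    text_emb:   shape (d,), output of the text encoder (assumed precomputed)
    image_embs: shape (n, d), outputs of the image encoder (assumed precomputed)
    Returns (indices, scores) of the k most similar images, best first.
    """
    text_emb = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    image_embs = l2_normalize(np.asarray(image_embs, dtype=np.float64))
    # After normalization the dot product equals the cosine similarity.
    scores = image_embs @ text_emb
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]
    return order, scores[order]
```

Skipping the normalization, or comparing embeddings taken from different layers of the two encoders, would be one thing to rule out before blaming the model itself.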