Closed — Sparkle-Q closed this issue 10 months ago
Hi, @Sparkle-Q ,
Sorry for the late response. We use the CLIP cosine similarity score between the input image and the training sentence pool to retrieve the semantic prior. You can further check this repo for more reference.
Best, Jianjie
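For anyone landing here, the retrieval Jianjie describes can be sketched roughly as below. This is a minimal illustration, not code from this repo: it assumes the image embedding and the sentence-pool embeddings have already been extracted with CLIP's image and text encoders, and the function name `retrieve_top_k` and the toy data are made up for the example.

```python
import numpy as np

def retrieve_top_k(image_emb, text_embs, sentences, k=3):
    """Return the k sentences whose CLIP embeddings have the highest
    cosine similarity with the query image embedding."""
    # L2-normalize both sides so a dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                      # (num_sentences,) similarity scores
    top = np.argsort(-sims)[:k]           # indices of the k best matches
    return [(sentences[i], float(sims[i])) for i in top]

# Toy example with hand-made 2-D "embeddings" standing in for CLIP features
pool = ["a dog on the grass", "a red car", "a dog playing fetch"]
text_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
image_emb = np.array([1.0, 0.0])
print(retrieve_top_k(image_emb, text_embs, pool, k=2))
```

In practice the embeddings would come from a pretrained CLIP model (e.g. `get_image_features` / `get_text_features` in the HuggingFace `transformers` CLIP API), and the retrieved sentences serve as the semantic prior.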
Hello, may I ask how the training sentence pool is obtained? Where can I find the code? Thank you very much.
I couldn't find the step that searches for semantically relevant sentences in the training sentence pool using an off-the-shelf cross-modal retrieval model, which is mentioned in the paper. Could you please show me where this is done in the code?