navervision / lincir

Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

Regarding model checkpoint #16

Closed: vkrishnamurthy11 closed this issue 3 months ago

vkrishnamurthy11 commented 3 months ago

Which CLIP variant is the "lincir_large.pt" model trained with? I assume it is the ViT-L-14 variant, but I'm not sure whether the pretrained weights are from OpenAI or DataComp.

SanghyukChun commented 3 months ago

Hi, we used the official OpenAI ViT-L. OpenCLIP is used only for the H and G variants (the LAION version, not the DataComp version). Please check the following code for the detailed model versions: https://github.com/navervision/lincir/blob/250dba25b634c3a5311c2a8bc302d63e71ccd607/models.py#L13-L16

You can check the model details in the HuggingFace model hub (e.g., https://huggingface.co/openai/clip-vit-large-patch14).
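
For reference, here is a minimal sketch of loading the OpenAI ViT-L/14 backbone from the HuggingFace hub with `transformers`. The model ID follows the linked `models.py`; the variable names are illustrative, not the repo's actual code:

```python
# Minimal sketch: load the OpenAI CLIP ViT-L/14 backbone referenced for lincir_large.pt.
# The model ID follows models.py in the repo; everything else here is illustrative.
from transformers import CLIPTextModelWithProjection, CLIPVisionModelWithProjection, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # OpenAI ViT-L, not the LAION/DataComp OpenCLIP variants

tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModelWithProjection.from_pretrained(model_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(model_id)

print(text_encoder.config.projection_dim)  # 768 for ViT-L/14
```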

vkrishnamurthy11 commented 3 months ago

ok thanks!

Also, do you have an updated phi checkpoint trained using the CLIP Large model? Or is "lincir_large.pt" the best checkpoint?

SanghyukChun commented 3 months ago

The lincir_large.pt and pic2word_large.pt checkpoints are for the retrieval demo. As far as I know, they do not use the same setting as the models reported in the paper. If you need a phi model for a fair comparison, I would recommend training your own phi model.
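
If it helps, one quick way to sanity-check which backbone a downloaded phi checkpoint matches is to inspect its tensor shapes. This is a rough sketch; the file name and key handling are assumptions, not the repo's actual checkpoint layout:

```python
import torch

# Rough sketch: list a checkpoint's tensor shapes to infer which CLIP backbone it targets.
# The "Phi" key handling is an assumption; adapt to the checkpoint's actual layout.
ckpt = torch.load("lincir_large.pt", map_location="cpu")
state_dict = ckpt["Phi"] if isinstance(ckpt, dict) and "Phi" in ckpt else ckpt

for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# 768-dim input/output layers would be consistent with OpenAI CLIP ViT-L/14's token embedding width.
```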

vkrishnamurthy11 commented 3 months ago

Can we get links to the models reported in the paper, especially for the CLIP Large variant?

geonm commented 3 months ago

At this time, we do not have any plans to release the model.

Thank you.