Hi, we used the official OpenAI ViT-L. OpenCLIP is used only for H and G (the LAION version, not the DataComp version). Please check the following code for the detailed model versions: https://github.com/navervision/lincir/blob/250dba25b634c3a5311c2a8bc302d63e71ccd607/models.py#L13-L16
You can check the model details on the Hugging Face model hub (e.g., https://huggingface.co/openai/clip-vit-large-patch14).
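For reference, here is a minimal sketch of loading the backbones from the Hugging Face hub with `transformers`. The ViT-L identifier comes from the link above; the LAION H and bigG identifiers are my assumption and should be checked against the `models.py` lines linked above.

```python
# Minimal sketch: loading CLIP text/vision towers from the Hugging Face hub.
# "openai/clip-vit-large-patch14" is the ViT-L model referenced above;
# the LAION H and bigG identifiers are assumptions -- confirm the exact
# names in models.py (lines linked above).
from transformers import CLIPTextModelWithProjection, CLIPVisionModelWithProjection

MODEL_IDS = {
    "large": "openai/clip-vit-large-patch14",            # OpenAI ViT-L/14
    "huge": "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",     # assumed LAION ViT-H/14
    "giga": "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",  # assumed LAION ViT-bigG/14
}

def load_clip(variant: str = "large"):
    """Load the text and vision encoders (with projection heads) for a CLIP variant."""
    model_id = MODEL_IDS[variant]
    text_encoder = CLIPTextModelWithProjection.from_pretrained(model_id)
    vision_encoder = CLIPVisionModelWithProjection.from_pretrained(model_id)
    return text_encoder, vision_encoder
```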
ok thanks!
Also, do you have an updated phi checkpoint trained using the CLIP Large model? Or is "lincir_large.pt" the best checkpoint?
The `lincir_large.pt` and `pic2word_large.pt` checkpoints are for the retrieval demo. As far as I know, they were not trained with the same settings as the models reported in the paper.
If you need a phi model for a fair comparison, I would recommend training your own.
Can we get a link to the models reported in the paper, especially the CLIP Large variant?
At this time, we do not have any plans to release the model.
Thank you.
Which CLIP variant is the model "lincir_large.pt" trained on? I assume it is ViT-L-14, but I am not sure whether the pretrained weights are from OpenAI or DataComp.