forg77 opened this issue 2 weeks ago
We will respond to you after the CVPR deadline. Thanks for your attention~
I had the same question. I was wondering if access to the LLM text encoder would be possible. Great work!
@Divyanshupy @forg77 We have updated the caption contrastive fine-tuned version of Llama3-8B-CC (https://huggingface.co/microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned) to assist with your retrieval experiments and training of your own CLIP models. Additionally, the parameters for our adapter and projector have been made available in our OpenAI ViT-L repository (https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336). The retrieval testing methods are documented in the model card for reference.
Our tests show retrieval performance exceeding the results reported in the paper, and we encourage you to try it out.
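To give a rough idea of the retrieval flow before the detailed instructions are out, here is a minimal sketch along the lines of the model card example. The method names (`get_image_features` / `get_text_features`), the `llm2vec` wrapper settings, and the image path are assumptions on my part, so please verify them against the model card:

```python
import torch
from PIL import Image
from transformers import AutoConfig, AutoModel, AutoTokenizer, CLIPImageProcessor
from llm2vec import LLM2Vec

# Vision side: the LLM2CLIP OpenAI ViT-L/14-336 checkpoint (adapter/projector included).
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
clip_model = AutoModel.from_pretrained(
    "microsoft/LLM2CLIP-Openai-L-14-336",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

# Text side: the caption-contrastive fine-tuned Llama3-8B-CC, wrapped with llm2vec
# for pooled sentence embeddings (the pooling settings here are an assumption).
llm_name = "microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned"
llm_config = AutoConfig.from_pretrained(llm_name, trust_remote_code=True)
llm_model = AutoModel.from_pretrained(
    llm_name, config=llm_config, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(llm_name)
l2v = LLM2Vec(llm_model, tokenizer, pooling_mode="mean", max_length=512)

captions = ["a diagram", "a dog", "a cat"]
image = Image.open("CLIP.png")  # placeholder image path
pixels = processor(images=image, return_tensors="pt").pixel_values.to("cuda")

with torch.no_grad(), torch.autocast("cuda"):
    # Feature-extraction method names follow the remote code in the HF repo;
    # double-check them in the model card if they differ in your version.
    image_features = clip_model.get_image_features(pixels)
    text_embs = l2v.encode(captions, convert_to_tensor=True).to("cuda")
    text_features = clip_model.get_text_features(text_embs)

    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", probs)
```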
Regarding the EVA series of models, there have been precision mismatches during the conversion to Hugging Face, which are currently being fixed. Updates will be released progressively.
Furthermore, we will provide detailed instructions on how to use LLM2CLIP to fine-tune your own CLIP models in about a week—please stay tuned!
Thank you for the updates and for making the fine-tuned Llama3-8B-CC model available! I’m really looking forward to trying it out and exploring the improvements in retrieval performance.
I was wondering, do you have any plans to release a fine-tuned version of a smaller text encoder, such as Llama 1B? It would be incredibly helpful for experimentation in environments with limited computational resources.
Thanks again for your great work and ongoing support!
Thanks for your support. We will try to release all the text models we experimented with, including Llama 3.2 1B, within this week.
Hello!
I am very interested in your work, and I encountered some issues during the reproduction process.
How can I replace the original text encoder with the tuned Llama 3 model? I checked the config file LLM2CLIP-EVA02-L-14-336/configuration_evaclip.py, and I noticed that the model parameters for the text encoder remain the same as those in the original CLIP model. This is a bit confusing to me.
If I’m correct, is the run.sh script provided for training CLIP with a frozen Llama 3 encoder?
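To make my question concrete, here is roughly the setup I have in mind (all module and variable names are my own placeholders, not your actual code):

```python
import torch
import torch.nn as nn

# My (possibly wrong) mental model: the caption-contrastive Llama 3 encoder stays frozen,
# and only a small adapter/projector is trained so its text embeddings land in the same
# space as the CLIP vision features.
class TextAdapter(nn.Module):  # placeholder name
    def __init__(self, llm_dim: int = 4096, clip_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(llm_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, llm_embeddings: torch.Tensor) -> torch.Tensor:
        return self.proj(llm_embeddings)

# llm_encoder = ...  # frozen Llama 3 text encoder
# for p in llm_encoder.parameters():
#     p.requires_grad = False
adapter = TextAdapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)  # only adapter params update
```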
Looking forward to your reply!