kohjingyu / fromage

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
https://jykoh.com/fromage
Apache License 2.0

What is CC3M Embeddings #27

Closed ziqipang closed 10 months ago

ziqipang commented 10 months ago

Thank you for your great work and help previously!

I am trying to run the evaluation of evals/eval_vist_retrieval.py and noticed that its prerequisite is having the cc3m embeddings prepared. I am wondering: (1) what's the function of this embedding file, and, (2) if I finetune the Fromage model differently, do I need to generate a new embedding file, and if so, could you please release the code for this part?

Thank you again for your time and attention! Greatly appreciate it!

kohjingyu commented 10 months ago

> what's the function of this embedding file

These are precomputed embeddings of the images at a list of URLs from CC3M.

> if I finetune the Fromage model differently, do I need to generate a new embedding file, and if so, could you please release the code for this part

You don't necessarily need to if you just want to run the evals, since retrieval there is done over VIST images, not CC3M. You can bypass this by using the same cc3m file we provide; it should not affect the eval scripts.

This is mostly just for qualitative results (in the inference notebook, the output images are retrieved from the candidate images in the cc3m_embeddings.pkl file). If you want to do so for your new model, I've uploaded the (untested) script here. You will need to replace the list of URLs in that file with the URLs that you want to retrieve images from.
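For readers who want a rough picture of what such a file does, here is a minimal sketch of precomputing embeddings for a list of URLs and retrieving the nearest candidate by cosine similarity. It is not the uploaded script. The `embed_image` function is a hypothetical stand-in for the model's visual encoder, the `paths` / `embeddings` dictionary keys and the embedding size are assumptions, and the example URLs are placeholders.

```python
import hashlib
import pickle

import numpy as np

EMB_DIM = 16  # assumed size for illustration; the real one depends on the model


def embed_image(url: str) -> np.ndarray:
    """Hypothetical stand-in for the visual encoder.

    Returns a deterministic pseudo-random vector per URL so the sketch runs
    without downloading images or loading a model.
    """
    seed = int.from_bytes(hashlib.sha256(url.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(EMB_DIM).astype(np.float32)


def build_index(urls):
    """Embed every URL once and L2-normalize the rows."""
    embs = np.stack([embed_image(u) for u in urls])
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)
    # Assumed layout: candidate URLs plus a matrix with one row per URL.
    return {"paths": list(urls), "embeddings": embs}


def retrieve(index, query, k=1):
    """Return the k candidate URLs most similar to the query embedding."""
    q = query / np.linalg.norm(query)
    scores = index["embeddings"] @ q  # cosine similarity, rows are unit norm
    top = np.argsort(-scores)[:k]
    return [index["paths"][i] for i in top]


# Placeholder URLs; replace with the URLs you want to retrieve images from.
urls = [f"https://example.com/img_{i}.jpg" for i in range(5)]
index = build_index(urls)

# Round-trip through pickle, mimicking saving and loading a .pkl file.
restored = pickle.loads(pickle.dumps(index))
print(retrieve(restored, embed_image(urls[2]), k=1))
```

Because the candidate set is fixed by the URL list, a differently finetuned model would only need this step rerun with its own encoder in place of `embed_image`.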

ziqipang commented 10 months ago

@kohjingyu Thank you! It is nice to know that these are similar to placeholders and won't affect the evaluation. I will check out the code and update you if any issues occur.