Clarification on precomputing the visual embeddings

kohjingyu / gill

🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".

https://jykoh.com/gill

Apache License 2.0


Clarification on precomputing the visual embeddings #25

Closed MiladMt11 closed 1 year ago

MiladMt11 commented 1 year ago

I'd like to know more about the precomputed visual embeddings that you provide here. How exactly did you compute them? Was it just by running the same SD model on the CC3M data?

Thanks in advance

kohjingyu commented 1 year ago

Hi, I realized this script wasn't added before, so here it is now: https://github.com/kohjingyu/gill/blob/main/scripts/extract_img_embs.py

The script does not use SD, since these embeddings are for retrieval. Hope it helps!
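
For readers who want a rough picture of what "precomputed embeddings for retrieval" means in general, here is a minimal sketch. It is not the linked script: `encode_fn` is a hypothetical stand-in for whatever visual encoder is actually used, the pickle layout is an assumption, and all function names are made up for illustration. The idea is to embed each image once, L2-normalize the vectors, store them, and later rank images by cosine similarity to a query vector.

```python
import math
import pickle
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def precompute_embeddings(
    image_ids: Sequence[str],
    encode_fn: Callable[[str], Sequence[float]],
) -> Dict[str, Vector]:
    """Embed every image once. `encode_fn` is a hypothetical encoder hook."""
    return {image_id: l2_normalize(encode_fn(image_id)) for image_id in image_ids}


def save_embeddings(embs: Dict[str, Vector], path: str) -> None:
    """Write the embedding table to disk (assumed format: a pickled dict)."""
    with open(path, "wb") as f:
        pickle.dump(embs, f)


def load_embeddings(path: str) -> Dict[str, Vector]:
    """Read back a table written by `save_embeddings`."""
    with open(path, "rb") as f:
        return pickle.load(f)


def retrieve(query: Sequence[float], embs: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the ids of the k stored images most similar to the query vector."""
    q = l2_normalize(query)
    ranked = sorted(
        embs.items(),
        key=lambda item: sum(a * b for a, b in zip(q, item[1])),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]
```

In a real pipeline `encode_fn` would load an image and run it through a pretrained visual encoder; check `extract_img_embs.py` for the model and file format the repository actually uses.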