zehanwang01 / OmniBind


embedding arithmetic #3

Open bakachan19 opened 1 month ago

bakachan19 commented 1 month ago

Hi, thank you for this great work! I was wondering if you could share the code (or provide some guidance) to reproduce the composable understanding results in Figure 4.c, and explain how to perform the embedding arithmetic.

Thanks.

zhang-ziang commented 1 month ago

@bakachan19 Thank you for your interest in our work. The embedding composition in Figure 4.C is simple arithmetic (addition and subtraction) in the representation space. We encode all the raw data into a unified space using OmniBind, then add and subtract the embeddings directly. The result of the operation (also an embedding) is used as a query against a larger gallery, and the gallery items with the highest cosine similarity are retrieved.
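The steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: random NumPy vectors stand in for embeddings that would come from OmniBind, and `cosine_retrieve` is a hypothetical helper name. One gallery row is planted in the same direction as the composed query so the expected match is known.

```python
import numpy as np

def cosine_retrieve(query, gallery, top_k=3):
    """Return (indices, scores) of the top_k gallery rows by cosine similarity."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                      # cosine similarity per gallery row
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]

rng = np.random.default_rng(0)
dim = 8  # illustrative size only, not the real embedding dimension

# Stand-ins for embeddings already encoded into the unified space (assumption).
gallery = rng.normal(size=(100, dim))
image_embedding = rng.normal(size=dim)
text_embedding = rng.normal(size=dim)

# Embedding arithmetic: plain subtraction in the representation space.
result = image_embedding - text_embedding

# Plant a gallery item pointing in the same direction but with a different length.
gallery[42] = 2.5 * result

indices, scores = cosine_retrieve(result, gallery)
print(indices[0])  # index of the best match
```

Because cosine similarity ignores vector length, the planted row is ranked first even though it is 2.5 times longer than the query.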

bakachan19 commented 1 month ago

Dear @zhang-ziang, thank you so much for your reply. Just a quick question: suppose I have image_embedding and text_embedding extracted with OmniBind. After I perform the arithmetic, i.e. result = image_embedding - text_embedding, do I need to normalize the result before using it as a query for image retrieval?

zhang-ziang commented 1 week ago

Yes, sure. All the representations we discuss should lie on the hypersphere, so the result should be normalized before it is used as a query. :)
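A minimal sketch of that normalization step, assuming "on the hypersphere" means unit L2 norm. `l2_normalize` is a hypothetical helper, and the random vectors are stand-ins for OmniBind outputs:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale x to unit L2 norm along the last axis (eps guards against zero vectors)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

rng = np.random.default_rng(0)

# Stand-ins for unit-norm embeddings (assumption: inputs are already normalized).
image_embedding = l2_normalize(rng.normal(size=16))
text_embedding = l2_normalize(rng.normal(size=16))

# The difference of two unit vectors is generally not unit length...
result = image_embedding - text_embedding
raw_norm = np.linalg.norm(result)

# ...so project it back onto the hypersphere before using it as a query.
query = l2_normalize(result)
query_norm = np.linalg.norm(query)

print(raw_norm, query_norm)
```

Once both the query and the gallery embeddings have unit norm, a plain dot product between them equals their cosine similarity.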