Open bakachan19 opened 1 month ago
@bakachan19 Thank you for your interest in our work. The embedding composite operation in Figure 4.C is a simple arithmetic addition and subtraction in the representation space. We encode all the raw data into a unified space using OmniBind, then directly perform the addition and subtraction. The result of the operation (also an embedding) is used as a query in a larger gallery for cosine similarity matching, ultimately retrieving the result.
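For anyone following along, here is a minimal sketch of the procedure described above. It is not the code used for Figure 4.C: random NumPy vectors stand in for OmniBind embeddings, and the names l2_normalize and retrieve are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis` (hypothetical helper)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def retrieve(query, gallery, k=5):
    """Return indices and cosine scores of the k gallery rows closest to `query`."""
    q = l2_normalize(query)
    g = l2_normalize(gallery)
    scores = g @ q                      # cosine similarity, since both sides are unit norm
    top = np.argsort(-scores)[:k]       # highest similarity first
    return top, scores[top]

# Assumption: random vectors replace real embeddings from the model.
rng = np.random.default_rng(0)
dim = 16
gallery = l2_normalize(rng.normal(size=(100, dim)))
image_embedding = l2_normalize(rng.normal(size=dim))
text_embedding = l2_normalize(rng.normal(size=dim))

# Composite query: plain arithmetic in the shared embedding space.
query = image_embedding - text_embedding

# Plant a known match so the retrieval can be sanity-checked.
gallery[7] = l2_normalize(query)

indices, scores = retrieve(query, gallery, k=5)
print(indices, scores)
```

With real data, gallery would hold the embeddings of the candidate items and the planted row would be dropped.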
Dear @zhang-ziang, Thank you so much for your reply. Just a quick question: suppose you have image_embedding and text_embedding extracted with OmniBind. After I perform the arithmetic, i.e. result = image_embedding - text_embedding, do I need to normalize the result before using it as a query for image retrieval?
Yes, sure, all the representations we discuss should be on the hypersphere. :)
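A short sketch of what that normalization could look like, assuming L2 normalization is what is meant by "on the hypersphere". The vectors are random stand-ins rather than real OmniBind outputs, and l2_normalize is a hypothetical helper name.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project a single vector onto the unit hypersphere (hypothetical helper)."""
    return x / max(np.linalg.norm(x), eps)

# Assumption: random unit vectors stand in for the two embeddings.
rng = np.random.default_rng(0)
image_embedding = l2_normalize(rng.normal(size=8))
text_embedding = l2_normalize(rng.normal(size=8))

# The difference of two unit vectors is generally not unit norm...
result = image_embedding - text_embedding
raw_norm = np.linalg.norm(result)

# ...so it is rescaled before being used as a retrieval query.
result_unit = l2_normalize(result)
unit_norm = np.linalg.norm(result_unit)

print(raw_norm, unit_norm)
```

Normalizing changes only the length of the query, not its direction, so the dot product with unit-norm gallery embeddings becomes a true cosine similarity.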
Hi, thank you for this great work! I was wondering if you could share the code (or provide some guidance) to reproduce the results in Figure 4.C on composable understanding, and explain how to perform the embedding arithmetic.
Thanks.