laihaoran / CARZero

Apache License 2.0

Zero-shot inference #1

Open Shawie66 opened 7 months ago

Shawie66 commented 7 months ago

Hello! During training, the number of images (I) and the number of texts (T) are usually the same, but during zero-shot inference the two are likely to differ. How did you handle this? Thanks!

laihaoran commented 7 months ago

Yes, during zero-shot inference the number of input images and the number of texts typically differ. This is not a problem: we compute the similarity for each image-text pair individually (one-to-one), so the two counts do not need to match.
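To make the pairwise idea concrete, here is a minimal sketch, not CARZero's actual code. It assumes each image and each text prompt has already been encoded into a fixed-length vector. Cosine similarity is used only as a stand-in for whatever pairwise scoring the model applies. The names `cosine_similarity`, `pairwise_scores`, and the toy embeddings are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in pairwise score (assumes non-zero vectors of equal length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_scores(
    image_embs: Sequence[Vector],
    text_embs: Sequence[Vector],
    score_fn: Callable[[Vector, Vector], float] = cosine_similarity,
) -> List[List[float]]:
    """Score every image against every text prompt, one pair at a time.

    Returns N rows (one per image) with M scores each (one per text),
    so N and M are free to differ.
    """
    return [[score_fn(img, txt) for txt in text_embs] for img in image_embs]


# Hypothetical toy embeddings: 3 images, 2 text prompts.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 3.0]]
text_embs = [[1.0, 0.0], [0.0, 2.0]]

scores = pairwise_scores(image_embs, text_embs)

# For each image, pick the index of the best-scoring text prompt.
predicted = [max(range(len(row)), key=row.__getitem__) for row in scores]
```

Because each score depends on only one image and one text, the result is an N x M grid, and nothing requires N to equal M. In a classification setting, each row can be read as that image's scores over the candidate prompts.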