Comparing image similarity between ImageBind and CLIP
ImageBind | CLIP
ImageBind yields higher cosine similarity scores than CLIP.
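For reference, cosine similarity between two embedding vectors can be sketched as follows. This is a minimal illustration rather than the demo code itself: the vectors are placeholders standing in for image embeddings from either model, and `cosine_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors (assumption): real embeddings would come from
# the ImageBind or CLIP image encoder and have many more dimensions.
emb_a = [1.0, 0.0, 1.0]
emb_b = [1.0, 1.0, 0.0]
score = cosine_similarity(emb_a, emb_b)
```

A score near 1 means the two embeddings point in almost the same direction; a score near 0 means they are close to orthogonal.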
Comparing zero-shot classification results between ImageBind and CLIP
ImageBind | CLIP
In an image containing both a horse and a bus, ImageBind, unlike CLIP, assigns a significantly higher probability to 'bus'.
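As a rough sketch of where such probabilities come from, zero-shot classification typically applies a softmax over the image-text similarities, one per candidate label. The similarity values below are made up for illustration and are not measured results; the `scale` factor of 100 is an assumption, and `zero_shot_probs` is a hypothetical helper name.

```python
import math

def zero_shot_probs(similarities, scale=100.0):
    """Softmax over scaled image-text similarities (one score per label)."""
    logits = [scale * s for s in similarities]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["horse", "bus", "cat"]
sims = [0.25, 0.27, 0.10]  # made-up similarities, not results from either model
probs = zero_shot_probs(sims)
best = labels[probs.index(max(probs))]
```

Because of the scaling, small gaps in similarity turn into large gaps in probability, which is why two models with similar embeddings can still report quite different class probabilities.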
ImageBind classifies the cat image correctly, but with a low probability. However, this is due to the use of CenterCrop in CLIP's preprocessing step (ImageBind just uses Resize).
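To make the preprocessing difference concrete, here is a small sketch that estimates how much of an image survives a center crop. It assumes the crop pipeline first resizes the shorter side to the target size and then cuts out a centered square; the 224-pixel target and the function name `center_crop_kept_fraction` are assumptions for illustration.

```python
def center_crop_kept_fraction(width, height, size=224):
    """Fraction of the image area kept by resize-shorter-side + center crop."""
    scale = size / min(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    return (size * size) / (new_w * new_h)

# A 640x480 image loses roughly a quarter of its area at the left and right edges.
landscape = center_crop_kept_fraction(640, 480)

# A square image keeps everything.
square = center_crop_kept_fraction(500, 500)
```

A plain resize to a square keeps the whole image but distorts its aspect ratio, whereas a center crop preserves the aspect ratio and discards the edges, so an object near the border can affect the two models differently.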
I made image similarity demo code with ImageBind.
ImageBind similarity demo code: demo.py