I'm exploring CLIP for similar-product retrieval, using a product's description and image together as the query. As I understand it, CLIP excels at image-to-text and text-to-image retrieval, but I'm curious whether it can handle a combined text-and-image input. Is this possible with CLIP, and does anyone have examples?
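To make the question concrete, here is a rough sketch of one thing I could imagine trying. It is not something I know to be the recommended approach. Since CLIP's image and text encoders project into the same embedding space, one option might be "late fusion": normalize each embedding, take a weighted average, and use the result as a single product vector for cosine-similarity search. The sketch assumes the embeddings were already computed elsewhere; the 3-dimensional vectors, the SKU names, and the helpers `fuse_embeddings` and `rank_products` are all hypothetical placeholders, not real CLIP output or API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_embeddings(image_emb, text_emb, image_weight=0.5):
    """Hypothetical late fusion: weighted average of two unit vectors.

    Assumes both embeddings live in the same space and have the same
    dimension, as CLIP's image and text embeddings do.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    mixed = [image_weight * i + (1.0 - image_weight) * t
             for i, t in zip(img, txt)]
    # Re-normalize so a dot product between fused vectors is a cosine.
    return l2_normalize(mixed)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_products(query_emb, catalog):
    """Return (product_id, score) pairs sorted by descending similarity."""
    scored = [(pid, cosine_similarity(query_emb, emb))
              for pid, emb in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder vectors standing in for real CLIP embeddings (assumption).
query = fuse_embeddings([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
catalog = {
    "sku_a": fuse_embeddings([0.9, 0.1, 0.0], [0.1, 0.9, 0.0]),
    "sku_b": fuse_embeddings([0.0, 0.0, 1.0], [0.0, 0.2, 1.0]),
}
print(rank_products(query, catalog))
```

In this sketch `image_weight` would control how much the picture counts relative to the description. Whether a simple average like this preserves enough of each modality for product retrieval is exactly what I'm unsure about.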