Is there demo code for using the model via the Hugging Face API?

Hello!
Thanks for asking! Yes, the model is a Hugging Face model, so you can use the standard CLIP API (here's the HF documentation).
```python
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

image = Image.open("images/image1.jpg")

# The processor resizes and normalizes the image itself, so the raw PIL image
# can be passed in directly.
inputs = processor(text=["a photo of a red shoe", "a photo of a black shoe"],
                   images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # probabilities over the candidate texts
print(probs)
```
Let me know if this makes sense!