patrickjohncyh / fashion-clip

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
MIT License

Output embeddings dimensions #34

Closed: alvaro-stylesage closed this issue 3 months ago

alvaro-stylesage commented 3 months ago

When loading the FashionCLIP model from HF using only the image encoder, like this:

from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPVisionModel

fashion_clip = 'patrickjohncyh/fashion-clip'

# Load only the vision tower and the matching preprocessor
model = CLIPVisionModel.from_pretrained(fashion_clip)
processor = CLIPProcessor.from_pretrained(fashion_clip)

image = Image.open("image1.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
numpy_outputs = outputs.last_hidden_state.numpy()

This outputs an array of shape (1, 50, 768).
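
For reference, that shape seems to be the vision tower's full last hidden state: one CLS token plus 49 image patches (a 7x7 grid for ViT-B/32 at 224x224), each with hidden size 768. To get a single pooled embedding in CLIP's 512-dimensional image-text space, I would normally go through the projection head via CLIPModel.get_image_features, roughly like this (a sketch, same test image as above):

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Sketch: pooled, projected image embedding instead of per-patch hidden states
model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

inputs = processor(images=Image.open("image1.jpg"), return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)

print(image_features.shape)  # expected: torch.Size([1, 512])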

Two concerns here:

Thanks

vinid commented 3 months ago

Hello! Sorry for the late reply, I was just able to test this now.

I get the same result from laion/CLIP-ViT-B-32-laion2B-s34B-b79K, so you might want to ask the LAION folks about this!
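
For completeness, a minimal sketch of the same check against that LAION checkpoint (assuming its Hugging Face repo ships the standard transformers CLIP files, and reusing the same test image as above):

from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPVisionModel

# Sketch: the last_hidden_state shape matches what the original post reports
base = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
model = CLIPVisionModel.from_pretrained(base)
processor = CLIPProcessor.from_pretrained(base)

inputs = processor(images=Image.open("image1.jpg"), return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # expected: torch.Size([1, 50, 768])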