microsoft / LLM2CLIP

LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA.
https://aka.ms/llm2clip
MIT License

How to run a text-image pair inference demo? #8

Open WinstonDeng opened 1 day ago

WinstonDeng commented 1 day ago

Using the official OpenAI text model, the text embedding dimension is 768, which does not match the LLM2CLIP image embedding dimension of 1280.

from transformers import CLIPModel, AutoTokenizer  # imports needed for this snippet

text_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336").to(device)
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14-336")
inputs = tokenizer(text=texts, padding=True, return_tensors="pt").to(device)
text_features = text_model.get_text_features(**inputs)  # shape [len(texts), 768]
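For context, here is a minimal sketch of how the two towers are meant to pair up. The names `llm_text_encoder` and `adapter` are hypothetical placeholders for the (not yet released) weights, not the actual API; only the shapes follow from the mismatch above. The point is that LLM2CLIP replaces CLIP's 768-d text tower with an LLM-based encoder projected into the image encoder's 1280-d space, so both sides match:

import torch

def score_pairs(llm_text_encoder, adapter, image_encoder, texts, pixel_values):
    # Hypothetical placeholders: the LLM text encoder plus a learned adapter
    # project text into the same 1280-d space as the LLM2CLIP image encoder.
    text_emb = adapter(llm_text_encoder(texts))    # [B, 1280], not [B, 768]
    image_emb = image_encoder(pixel_values)        # [B, 1280]
    # L2-normalize, then cosine similarity via a dot product
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return text_emb @ image_emb.T                  # [B, B] similarity matrix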
Yif-Yang commented 1 day ago

We will update the README about this today and will let you know once it's added. Thanks for the reminder.

BIGBALLON commented 1 day ago

Same question here. BTW, do you have a plan to release the CC fine-tuned LLM?

Yif-Yang commented 1 day ago

> Same question here. BTW, do you have a plan to release the CC fine-tuned LLM?

We’ll do our best to release it within 24 hours. Thank you for the reminder. If you have any other requests, feel free to let us know. We’re happy to release whatever we can, as long as it complies with safety regulations.