unum-cloud / uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
https://unum-cloud.github.io/uform/
Apache License 2.0

CoreML Model #7

Closed sandkoan closed 1 year ago

sandkoan commented 1 year ago

Have y'all experimented with exporting the uform model (which is fantastic, by the way) as a CoreML model, so it can be run on-device more efficiently?

kimihailv commented 1 year ago

Hello. We haven't tried exporting the uform model as a CoreML model, but if you do it, we would love to see the result.
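
For anyone who wants to attempt it, a rough sketch of such an export via coremltools might look like the following. It assumes the image encoder is exposed as model.image_encoder, can be traced with torch.jit.trace, and expects 224x224 RGB input; none of that is guaranteed by the thread, so treat it as a starting point rather than a recipe.

import torch
import coremltools as ct
import uform

# Load the PyTorch checkpoint (model name assumed for illustration).
model = uform.get_model('unum-cloud/uform-vl-english')
model.eval()

# Assumption: the visual tower is exposed as `model.image_encoder`;
# if not, wrap model.encode_image in a small nn.Module instead.
image_encoder = model.image_encoder

# Trace with a dummy input shaped like the preprocessing output (224x224 RGB assumed).
example_input = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(image_encoder, example_input)

# Convert the traced graph to a CoreML package and save it.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name='image', shape=example_input.shape)],
    convert_to='mlprogram',
)
mlmodel.save('uform_image_encoder.mlpackage')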

sandkoan commented 1 year ago

Ahh, okay. Is there currently a way to batch-embed these images/texts at once, as opposed to making individual function calls?

kimihailv commented 1 year ago

Yes. First, you need to build a batch of preprocessed texts and images:

import torch
from PIL import Image

# Texts can be preprocessed as a list in a single call.
batch_texts = model.preprocess_text(['cat', 'dog'])

# Images are preprocessed one at a time and then stacked into a single batch tensor.
images = [Image.open('cat.jpg'), Image.open('dog.jpg')]
batch_images = []

for image in images:
    batch_images.append(model.preprocess_image(image))

batch_images = torch.stack(batch_images, dim=0)

Then, you can use encode_text and encode_image:

text_embeddings = model.encode_text(batch_texts) # 2 x embedding_dim
image_embeddings = model.encode_image(batch_images) # 2 x embedding_dim
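
As a follow-up, here is one way the batched embeddings could be compared. The L2 normalization and matrix product below are a generic cosine-similarity sketch, not something the thread specifies.

import torch.nn.functional as F

# Cosine similarity between every text and every image (2 x 2 matrix).
text_norm = F.normalize(text_embeddings, dim=-1)
image_norm = F.normalize(image_embeddings, dim=-1)
similarity = text_norm @ image_norm.T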

ashvardanian commented 1 year ago

So preprocess_text can accept a list, but preprocess_image can't? If so, @kimihailv, should we change that?
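
One hypothetical shape for such a change, purely for illustration (the helper name preprocess_images is invented, not part of the library): accept either a single image or a list and always return a batched tensor.

import torch
from PIL import Image
from typing import List, Union

def preprocess_images(model, images: Union[Image.Image, List[Image.Image]]) -> torch.Tensor:
    # Hypothetical helper: accept one image or a list, always return a batched tensor.
    if not isinstance(images, list):
        images = [images]
    return torch.stack([model.preprocess_image(image) for image in images], dim=0)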