do-me opened this issue 3 weeks ago · Open
You can specify `model_file_name` as one of the options in `.from_pretrained(model_id, { model_file_name: 'model' })` :)
Although, do note that the weights I uploaded only work for Transformers.js v3 (unless you manually override the onnxruntime-web/node version to >= 1.16.0).
See the README for example Transformers.js code:
```js
import { AutoTokenizer, CLIPTextModelWithProjection, AutoProcessor, CLIPVisionModelWithProjection, RawImage, cos_sim } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('jinaai/jina-clip-v1');
const text_model = await CLIPTextModelWithProjection.from_pretrained('jinaai/jina-clip-v1');

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch32');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('jinaai/jina-clip-v1');

// Run tokenization
const texts = ['A blue cat', 'A red cat'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute text embeddings
const { text_embeds } = await text_model(text_inputs);

// Read images and run processor
const urls = [
  'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
  'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg',
];
const images = await Promise.all(urls.map(url => RawImage.read(url)));
const image_inputs = await processor(images);

// Compute vision embeddings
const { image_embeds } = await vision_model(image_inputs);

// Compute similarities
console.log(cos_sim(text_embeds[0].data, text_embeds[1].data)); // text-text similarity
console.log(cos_sim(text_embeds[0].data, image_embeds[0].data)); // text-image cross-modal similarity
console.log(cos_sim(text_embeds[0].data, image_embeds[1].data)); // text-image cross-modal similarity
console.log(cos_sim(text_embeds[1].data, image_embeds[0].data)); // text-image cross-modal similarity
console.log(cos_sim(text_embeds[1].data, image_embeds[1].data)); // text-image cross-modal similarity
```
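For reference, `cos_sim` simply computes the cosine similarity of two embedding vectors (dot product divided by the product of their norms). A minimal plain-JavaScript re-implementation, purely for illustration (the real helper ships with `@xenova/transformers`):

```js
// Illustrative re-implementation of cosine similarity; not the
// library's actual code. Works on plain arrays or typed arrays.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```

Values close to 1 mean the two embeddings point in nearly the same direction, which is why the text-text and text-image scores above can be compared directly.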
Model description
jinaai/jina-clip-v1
Prerequisites
Additional information
You just added the onnx files to their HF repo, that's great! 🥳
Now that model files are getting more complex and have a prefix like `text_` or `vision_` (or even `audio_` in the future), Transformers.js needs an update, as it doesn't support loading files other than `model.onnx` or `model_quantized.onnx`, if I see it correctly. At the moment, with 17.2, you'll get this kind of error because it cannot locate the files with the above prefixes:

You're probably already working on this, but I still thought it might be useful to have it documented here for anyone else looking for support.
Or is there already another way to specify the name?
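To make the naming scheme concrete, here is a hypothetical helper (the function name and signature are mine, not part of Transformers.js) that builds the file names described above: an optional modality prefix such as `text_` or `vision_` in front of `model`, with an optional `_quantized` suffix.

```js
// Hypothetical sketch of the ONNX file-naming scheme discussed above;
// the real file-resolution logic lives inside Transformers.js.
function onnxFileName(prefix = '', quantized = false) {
  return `${prefix}model${quantized ? '_quantized' : ''}.onnx`;
}

console.log(onnxFileName());                // 'model.onnx'
console.log(onnxFileName('text_'));         // 'text_model.onnx'
console.log(onnxFileName('vision_', true)); // 'vision_model_quantized.onnx'
```

Only the unprefixed first two names are what the library looks for by default, which is why the prefixed files in the repo aren't found without an override.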
Your contribution
I can gladly test!