xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0
11.18k stars · 693 forks

ColBERT Wasm Demo #656

Open loretoparisi opened 6 months ago

loretoparisi commented 6 months ago

It would be worthwhile to provide an example of using ColBERT for passage retrieval from a user query, executed in the browser. A good example has been provided here for query-passage score interpretability:

[Screenshot (2024-03-21): ColBERT demo showing query-passage contextualised highlights]

Specifically, the Contextualised Highlights give an important overview of the inner scoring at the token level, as well as the resulting MaxSim score.

Motivation: I was recently able to benchmark ColBERT WASM CPU execution vs. WebGPU thanks to Xenova's playground here. ColBERT performance in the browser (eventually with quantization) is efficient enough to perform passage retrieval locally with WebGPU support, falling back to CPU where necessary.

xenova commented 5 months ago

I agree! :) Perhaps a community member is interested in creating one?

Madd0g commented 3 months ago

hmm I searched the list of supported models and didn't see ColBERT - is there a code example?

xenova commented 3 months ago

We have exported ColBERTv2 to ONNX at https://huggingface.co/Xenova/colbertv2.0, and you can see the model card for example usage:

```js
import { pipeline } from '@xenova/transformers';

// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/colbertv2.0');

// Compute sentence embeddings
const sentences = ['Hello world', 'This is a sentence'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ -0.008133978582918644, 0.00663341861218214, ... ],
//   size: 1536
// }
```

You can convert this Tensor to a nested JavaScript array using .tolist():

```js
console.log(output.tolist());
// [
//   [ -0.008133978582918644, 0.00663341861218214, 0.06555338203907013, ... ],
//   [ -0.02630571834743023, 0.011146597564220428, 0.008737687021493912, ... ]
// ]
```

Madd0g commented 3 months ago

oh thank you.

I've used feature-extraction before. If it just outputs an array of numbers - how do they do the highlights?

xenova commented 3 months ago

I believe that's with their MaxSim operator, and you can find more information about it in their paper.

[Figure: ColBERT late-interaction diagram from the paper, showing the query encoder f_Q, the document encoder f_D, and the MaxSim scoring step]

Transformers.js only handles the first part (generating the query and document token embeddings, i.e., the boxes coming out of f_Q and f_D).
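For reference, the late-interaction step itself is simple to implement on top of the token embeddings. Here is a minimal sketch of a MaxSim scorer: for each query token, take the maximum dot product against all document tokens, then sum over query tokens. The function names and input shapes (arrays of per-token vectors, e.g. from `.tolist()` without pooling) are illustrative assumptions, not part of the Transformers.js API:

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: sum over query tokens of the max similarity to any document token.
// `queryEmbeds` and `docEmbeds` are arrays of per-token embedding vectors.
function maxSim(queryEmbeds, docEmbeds) {
  let score = 0;
  for (const q of queryEmbeds) {
    let best = -Infinity;
    for (const d of docEmbeds) {
      best = Math.max(best, dot(q, d));
    }
    score += best;
  }
  return score;
}
```

The per-query-token maxima are also what drive the contextualised highlights in the demo above: each query token's best-matching document token can be highlighted with an intensity proportional to its similarity.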

Madd0g commented 3 months ago

Oh, I see. I misunderstood the OP; I thought Transformers.js already had this functionality and was just missing the nice UI.

I'll look into how they accomplish it, thank you

philnash commented 1 month ago

Apologies if this is a silly question, but when I run the feature extractor with no pooling I get 768 dimensions per token. I thought that ColBERTv2 only produced 128 dimensions per token.

Is there a parameter I am missing or something else I don't understand?