loretoparisi opened 8 months ago

It would be worthwhile to provide an example of using ColBERT for passage retrieval from a user query, executed in the browser. A good example of query-passage score interpretability has been provided here. Specifically, the Contextualised Highlights give a useful overview of the inner scoring at the token level, as well as the resulting MaxSim score.

Motivation. Recently I was able to benchmark ColBERT on WASM CPU execution vs. WebGPU thanks to the Xenova playground here; ColBERT's performance in the browser (eventually with quantization) is efficient enough to perform passage retrieval locally with WebGPU support, falling back to the CPU when needed.
I agree! :) Perhaps a community member is interested in creating one?
hmm I searched the list of supported models and didn't see ColBERT - is there a code example?
We have exported colbert-v2 to ONNX at https://huggingface.co/Xenova/colbertv2.0, and you can see the model card for example usage:
```js
import { pipeline } from '@xenova/transformers';

// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/colbertv2.0');

// Compute sentence embeddings
const sentences = ['Hello world', 'This is a sentence'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ -0.008133978582918644, 0.00663341861218214, ... ],
//   size: 1536
// }
```
You can convert this Tensor to a nested JavaScript array using `.tolist()`:
```js
console.log(output.tolist());
// [
//   [ -0.008133978582918644, 0.00663341861218214, 0.06555338203907013, ... ],
//   [ -0.02630571834743023, 0.011146597564220428, 0.008737687021493912, ... ]
// ]
```
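Since these embeddings were computed with `normalize: true`, the dot product of the two rows equals their cosine similarity. A quick sketch comparing the two sentences above:

```js
// Dot product of two L2-normalized embeddings = cosine similarity
const [a, b] = output.tolist();
const cosine = a.reduce((sum, x, i) => sum + x * b[i], 0);
console.log(cosine); // in [-1, 1]; higher means more similar
```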
oh thank you. I've used `feature-extraction` before. If it just outputs an array of numbers, how do they do the highlights?
I believe that's done with their MaxSim operator, and you can find more information about it in their paper. Transformers.js only handles the first part (generating the query + document token embeddings, i.e., the boxes coming out of `f_Q` and `f_D`).
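If it helps, here is a rough sketch of that second part (my own illustration, not a Transformers.js API). It assumes un-pooled, L2-normalized per-token embeddings, and it omits ColBERT details like query augmentation and punctuation masking:

```js
import { pipeline } from '@xenova/transformers';

// Hypothetical helper: late-interaction MaxSim score.
// For each query token, take the maximum dot product against all
// document tokens, then sum those maxima over the query tokens.
function maxSim(queryTokens, docTokens) {
  let score = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) {
      let dot = 0;
      for (let k = 0; k < q.length; ++k) dot += q[k] * d[k];
      best = Math.max(best, dot);
    }
    score += best;
  }
  return score;
}

const extractor = await pipeline('feature-extraction', 'Xenova/colbertv2.0');

// Default pooling is 'none', so we keep one embedding per token
const query = await extractor('Hello world', { normalize: true });
const doc = await extractor('This is a sentence', { normalize: true });

// .tolist() gives [batch, numTokens, dim]; take the first batch element
console.log(maxSim(query.tolist()[0], doc.tolist()[0]));
```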
oh I see, I misunderstood the OP; I thought transformers.js had this functionality and was just missing the nice UI. I'll look into how they accomplish it, thank you
Apologies if this is a silly question, but when I run the feature extractor with no pooling I get 768 dimensions per token. I thought that ColBERTv2 only produced 128 dimensions per token.
Is there a parameter I am missing or something else I don't understand?
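For reference, here is roughly what I'm running (per-token output, no pooling):

```js
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/colbertv2.0');

// No pooling option, so the output keeps one embedding per token
const output = await extractor('Hello world');
console.log(output.dims); // [1, numTokens, 768] -- not the 128 I expected
```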