huggingface / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

[Question] How to use transformers.js like the python sentence_transformers library? #150

Closed: davidtbo closed this issue 1 year ago

davidtbo commented 1 year ago

Hello all,

Thanks for this great library. I've just discovered it and I'm familiar with the python sentence_transformers module. I know from experience that sentence_transformers wraps a lot of the complexity compared to using transformers directly.

Can you point to an example of using this to replace python's sentence_transformers for semantic-search document and question embedding? Does this solution handle the tokenization and attention windows automatically like sentence_transformers does, or do I need to break my inputs into chunks, process them separately, and then mean-pool them back together or something?

Thanks, Dave

xenova commented 1 year ago

Hi there! 👋

So, currently, we allow users to calculate embeddings with sentence-transformers models as follows:

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
//     dims: [1, 384]
// }

See the docs (https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.FeatureExtractionPipeline) for more info. We are currently working on improving how you can filter Transformers.js models on the Hub, but for now, you'll need to visit https://huggingface.co/models?library=transformers.js and look for the sentence-transformers model you want to use. You should be able to find it quite easily, since the name is the same (just with Xenova/ instead of sentence-transformers/).

The above code handles the tokenization, attention windows, mean pooling, and normalization for you, but you'll need to implement the additional utility functions that sentence-transformers provides (like community detection and similarity computation) yourself. We provide some helper functions (https://huggingface.co/docs/transformers.js/api/utils/maths#module_utils/maths.cos_sim) for things like cosine similarity, but you'll need to implement the rest.
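For a search-style use case specifically, a rough sketch of ranking documents against a query with the cos_sim helper could look like the following (the query, documents, and ranking logic here are purely illustrative, not from the docs):

import { pipeline, cos_sim } from '@xenova/transformers';

// Illustrative query and documents
const query = 'How do I run transformers in the browser?';
const docs = [
    'Transformers.js lets you run 🤗 Transformers models directly in the browser.',
    'The quick brown fox jumps over the lazy dog.',
];

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Embed the query and documents in one batch (mean pooling + normalization, as above)
const [queryEmb, ...docEmbs] = (await extractor([query, ...docs], { pooling: 'mean', normalize: true })).tolist();

// Rank documents by cosine similarity to the query
const ranked = docEmbs
    .map((emb, i) => ({ doc: docs[i], score: cos_sim(queryEmb, emb) }))
    .sort((a, b) => b.score - a.score);
console.log(ranked);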

davidtbo commented 1 year ago

I'm planning on using it with LanceDB, which handles the search functions. I just need to calculate the embeddings and stuff them in the database. Thanks for the quick response!
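Roughly, what I have in mind is something like this (a sketch only; I'm assuming the LanceDB vectordb npm client and its connect/createTable calls from the quick-start, so adjust to whatever the client actually exposes):

import { pipeline } from '@xenova/transformers';
import * as lancedb from 'vectordb'; // LanceDB JS client -- API names assumed from its quick-start

const docs = [
    { id: 1, text: 'Transformers.js runs models directly in the browser.' },
    { id: 2, text: 'LanceDB is an embedded vector database.' },
];

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const embeddings = (await extractor(docs.map(d => d.text), { pooling: 'mean', normalize: true })).tolist();

// One row per document, with the embedding in a `vector` column (assumed schema)
const db = await lancedb.connect('./lancedb');
const table = await db.createTable('pages', docs.map((d, i) => ({ id: d.id, text: d.text, vector: embeddings[i] })));

// Later: embed the query the same way and let LanceDB handle the search, e.g.
// const hits = await table.search(queryEmbedding).limit(5).execute();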

xenova commented 1 year ago

If doing something as simple as similarity detection, you can do something like:

import { pipeline, cos_sim } from '@xenova/transformers';

// Provide sentences
let sentences = [
    'This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.',
    'The quick brown fox jumps over the lazy dog.'
]

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let output = await extractor(sentences, { pooling: 'mean', normalize: true });

// Convert Tensor to JS list
output = output.tolist();

let pairwiseScores = [[output[0], output[1]], [output[0], output[2]], [output[1], output[2]]].map(x => cos_sim(...x));
// [0.502872309810269, 0.11088411026413121, 0.09602621986931259]

xenova commented 1 year ago

> I'm planning on using it with LanceDB, which handles the search functions. I just need to calculate the embeddings and stuff them in the database. Thanks for the quick response!

Okay great! Then that simplifies the rest :)

One last thing: if you're running server-side, you can probably benefit from using the unquantized models (for higher accuracy). You can do that when loading the model:

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { quantized: false });

Feel free to ask any other questions, or close the issue if that's all you needed.

davidtbo commented 1 year ago

I'm talking about semantic search against web pages using short search queries with ranked results, not whole-doc-to-whole-doc or sentence-to-sentence similarity.

I've found that msmarco-bert-base-dot-v5 (and the other msmarco models) are tuned for short queries against large docs, so that part isn't my concern. I just wanted to make sure I could use this the way I use python's sentence_transformers without missing something that would break my results, since sentence_transformers does so much for you compared to using transformers directly. Because this is closer to using transformers directly, I wanted to be sure I was finding code examples that handle attention windows and chunking and so on, which (as I understand it) sentence_transformers does automatically.

I didn't want to unknowingly apply incomplete example code that did not fully mimic sentence_transformers, get sub-par results, and wonder why.
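If manual chunking does turn out to be necessary, I'd expect to do something roughly like this (my own sketch; the word-count chunk size and the averaging are guesses on my part, not anything the library prescribes):

import { pipeline } from '@xenova/transformers';

// Naive word-count chunking for a long web page; the chunk size is a guess,
// chosen to stay well under typical 256-512 token model limits
function chunkText(text, wordsPerChunk = 200) {
    const words = text.split(/\s+/);
    const chunks = [];
    for (let i = 0; i < words.length; i += wordsPerChunk) {
        chunks.push(words.slice(i, i + wordsPerChunk).join(' '));
    }
    return chunks;
}

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const longPageText = '...the extracted text of a long web page...';
const chunks = chunkText(longPageText);
const chunkEmbeddings = (await extractor(chunks, { pooling: 'mean', normalize: true })).tolist();

// Either index each chunk embedding separately (usually better for search),
// or average them into a single page-level vector:
const dim = chunkEmbeddings[0].length;
const pageEmbedding = new Array(dim).fill(0);
for (const emb of chunkEmbeddings) {
    for (let d = 0; d < dim; d++) pageEmbedding[d] += emb[d] / chunkEmbeddings.length;
}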

lsb commented 1 year ago

I've got a React app using sentence-transformers for realtime offline embedding-based search against every article in Wikipedia, if that helps: https://github.com/lsb/sqlite-vector-search/blob/trunk/vector-search-sqlite-react/src/App.js#L104

I'm compressing the embeddings with basically a PQ48x7 FAISS index (1M pages encode to <50MB of Arrow buffers), using my new library pq.js, which I've been extracting from this app: https://github.com/lsb/pq.js. Its distance and top-k computations in ONNX support interactive on-device usage, and I use a few tricks to decrease visual latency as you type.
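In case the "PQ48x7" shorthand is opaque: each 384-dim embedding is split into 48 sub-vectors of 8 dims, and every sub-vector is stored as a 7-bit index (128 centroids per sub-space) into a learned codebook. A toy sketch of the resulting asymmetric distance computation (generic product quantization, not pq.js's actual API) is:

const DIMS = 384, SUBVECTORS = 48, SUBDIM = DIMS / SUBVECTORS;

// codebooks[s][c] is the c-th centroid (an array of length SUBDIM) for sub-space s;
// codes is one array of SUBVECTORS centroid indices per encoded page.
function adcDistance(query, codes, codebooks) {
    // Compare the full-precision query against the quantized page, one sub-space at a time
    let dist = 0;
    for (let s = 0; s < SUBVECTORS; s++) {
        const centroid = codebooks[s][codes[s]];
        for (let d = 0; d < SUBDIM; d++) {
            const diff = query[s * SUBDIM + d] - centroid[d];
            dist += diff * diff;
        }
    }
    return dist; // squared L2: smaller = closer
}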

On the encoding side, I use stock sentence-transformers, and then the nanopq python library.

The full app is at https://leebutterman.com/wikipedia-search-by-vibes/

Hope this helps!

xenova commented 1 year ago

@lsb Wow that is amazing! 🤯 Do you have a tweet about it? I'd love to re-share!

Edit: Found the tweet!

xenova commented 1 year ago

Closing the issue for now, but feel free to reopen or create a new issue if you have any further questions 👍