xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

[Question] New demo type/use case: semantic search (SemanticFinder) #84

Open do-me opened 1 year ago

do-me commented 1 year ago

Hi @xenova, first of all thanks for the amazing library - it's awesome to be able to play around with the models without a backend!

I just created SemanticFinder, a semantic search engine in the browser with the help of transformers.js and sentence-transformers/all-MiniLM-L6-v2.

You can find some technical details in the blog post.

I was wondering whether you'd be interested in showcasing semantic search as a new demo type. Technically it's not a new model, but it is a new use case for an existing model, so I don't know whether it's out of scope.

Anyway, just wanted to let you know that your work is very much appreciated!

xenova commented 1 year ago

This is so cool! I plan to completely rewrite the demo application, which, as you can tell, is extremely simple... so this definitely sounds like something I can add!

~PS: Do you have a Twitter post I can retweet? I'd love to share it!~ Edit: Found it!

xenova commented 1 year ago

@do-me Just a heads up that I updated the feature-extraction API to support other models (not just sentence-transformers). To use the updated API, you just need to add { pooling: 'mean', normalize: true } to the pipeline call. Your demo site seems unaffected (as it is still using the previous version), but if you'd like to add support for other models, you can make the following changes:

For example:

Before:

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.');
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
//     dims: [1, 384]
// }

After:

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
//     dims: [1, 384]
// }

And if you don't want to do pooling/normalization, you can leave it out. You will then get the embeddings for each token in the sequence.
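
For example, calling the extractor without any options under the updated API returns one embedding per token (a minimal sketch; the exact token count depends on the tokenizer):

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.'); // no pooling/normalization
console.log(result.dims);
// e.g. [1, 8, 384], i.e. one 384-dimensional embedding per token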

xenova commented 1 year ago

Also - we're planning on releasing a semantic search demo next week 🥳 (so, watch this space!)

do-me commented 1 year ago

This is awesome, thanks for pinging me!

I'm very interested in this feature, mainly for speed improvements. Do you have any benchmarks at hand on how the new pooling approach compares to sequential processing?

Also, I'd be curious to know if there's a sweet spot for how many elements could/should be passed to the model at once.

And one more detail, though it's probably also model-dependent: can you track the progress of a batch/pool that has been passed to the model? E.g. if I pass 1000 elements at once, is there any theoretical way to return the progress so I can update the progress bar in the frontend meanwhile?
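
For illustration, this is roughly what I have in mind with manual chunking (just a sketch; texts is an array of input strings and updateProgressBar is a hypothetical UI helper):

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const batchSize = 50; // the "sweet spot" I'm asking about
const embeddings = [];
for (let i = 0; i < texts.length; i += batchSize) {
  // The pipeline accepts an array of strings and embeds them together
  const output = await extractor(texts.slice(i, i + batchSize), { pooling: 'mean', normalize: true });
  embeddings.push(output);
  updateProgressBar(Math.min(i + batchSize, texts.length) / texts.length);
}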

do-me commented 1 year ago

FYI, SemanticFinder just received a great contribution from @VarunNSrivastava, improving the UI significantly with new features. The transformers.js version was also updated: New Demo

lizozom commented 1 year ago

Hey, joining the semantic search on the FE party 🥳.

I'm wondering if we can leverage the power of threads in this scenario by setting env.backends.onnx.wasm.numThreads = 4. I don't see any errors thrown, but also no drastic performance improvements.
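
For context, I'm setting it before creating the pipeline, roughly like this (the model id is just an example):

import { env, pipeline } from '@xenova/transformers';

// Must run before the first model/pipeline is created
env.backends.onnx.wasm.numThreads = 4;

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');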

xenova commented 1 year ago

@lizozom Hi there! 👋

So, the most likely reason for this is that SharedArrayBuffer is not available because COOP/COEP headers are not set for the hosted files. You can check your network tab when running the model and you should see ort-wasm-simd.wasm loaded instead of ort-wasm-simd-threaded.wasm. For more information, check out this related open issue: https://github.com/xenova/transformers.js/issues/161

To fix this, it depends on where you are hosting the website, as these headers must be set by the server. At the moment, GitHub Pages does not offer this (https://github.com/orgs/community/discussions/13309), but there are some workarounds (cc @josephrocca). On the other hand, we are actively working to support this feature in Hugging Face Spaces (https://github.com/huggingface/huggingface_hub/issues/1525), which should hopefully be ready soon!

do-me commented 1 year ago

Seems like Netlify offers a little more flexibility. I'm a very happy Netlify user (I've been hosting my blog there since 2019 without any trouble) and it's pretty easy to link a GitHub repo to it. @lizozom, if needed, we might consider switching from GitHub Pages to Netlify.

lizozom commented 1 year ago

Cool! I'll check and let you know.

josephrocca commented 1 year ago

The current workaround is to put this file beside your HTML file, and then import it with a script tag in your document's <head>. The GitHub Pages engineering lead said a few days ago that they are working on custom headers, but there's no ETA.

I personally wouldn't go with Netlify, since their pricing is a bit too aggressive for my use cases, but it depends on what you're doing. Netlify's free 100 GB could be used up very quickly if you have a few assets like ML models or videos (even with just a few thousand visitors, e.g. due to being shared on Twitter or HN). Cloudflare Pages is much better IMO (unlimited bandwidth and requests for free), but again it depends on your use case; Netlify may suffice.

do-me commented 1 year ago

Thanks for the hint! Does Cloudflare Pages offer custom headers?
Unlimited bandwidth does indeed sound great! Will check it out.
Luckily we don't need to host the models, only the static page with the framework (currently everything bundled is ~2 MB), so it's not that bad, but still something to keep in mind.

josephrocca commented 1 year ago

I haven't actually had to do that with Cloudflare Pages yet, but here are their docs for custom headers: https://developers.cloudflare.com/pages/platform/headers/
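
Going by those docs, a _headers file at the root of the build output should be all you need (untested on my side; the same file format also works on Netlify, if I recall correctly):

/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp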

lizozom commented 1 year ago

I tested this out on a local webpack project, serving files with these headers:

  // webpack.config.js
  devServer: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },

And indeed, this causes the threaded version (ort-wasm-simd-threaded.wasm) to be loaded. I'm not seeing much of a performance difference right away, but I'll tinker with it some more.

@xenova In your opinion, should I expect to see performance improvements when running a large batch of embedding pipelines single- vs. multi-threaded?

xenova commented 1 year ago

@lizozom yes, we should be seeing improvements, but I believe there is a bug in ORT which is not correctly allocating work among the threads. There is an ongoing discussion about this here: https://github.com/xenova/transformers.js/issues/161

lizozom commented 1 year ago

Sweet, I'll keep track. Let me know if I can help there in any way!

do-me commented 12 months ago

@VarunNSrivastava built a really nice Chrome extension for SemanticFinder. You can already install it locally, as explained here.

We submitted it for review, so it should be a matter of days (hopefully) or a few weeks in the worst case.

It's working very well for many different types of pages (even PDFs, if they end with .pdf!). There is a settings page too, where it's highly recommended to raise the minimum segment length if there is lots of text on a page (more than 10 pages' worth, for example). You can also choose a different model if you're working with non-English content.

I spotted the gap in the HF docs about developing a browser extension and was wondering whether we could give a hand in filling it? In the end, our application isn't too complex in terms of "moving" parts, so it might make for a good example. Also, we already learnt about some caveats that might be good to write down.

xenova commented 12 months ago

That would be amazing! 🤯 Yes please! You could even strip down the tutorial quite a bit if you want (the simpler, the better).

do-me commented 12 months ago

We're using Vue components in the extension, which might already be slightly too complex for a beginner's tutorial (this would be more of an intermediate/slightly advanced version, I guess). However, I have plans to write yet another extension with similar functionality and really keep it super simple. Will keep you posted, but probably better in a new issue.

I just have one question, relevant to both the extension and SemanticFinder, that I couldn't quite figure out from the HF docs:

When using text2text-generation, like Xenova/LaMini-Flan-T5-783M:

var outputElement = document.getElementById("output");

async function allocatePipeline(instruction) {
  let classifier = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
  let output = await classifier(instruction, {
    max_new_tokens: 100
  });

  outputElement.innerHTML = output[0];
}

allocatePipeline("some test instruction");
or summarization, like Xenova/distilbart-cnn-6-6:

var outputElement = document.getElementById("output");

async function allocatePipeline(inText) {
  let generator = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
  let out = await generator(inText, {
    max_new_tokens: 100,
  });

  outputElement.innerHTML = out[0].summary_text;
}

allocatePipeline("some test text to summarize");

How can I add a callback so that my HTML component is updated each time a new token is created? I tried different kinds of callbacks and searched through the API, but I have the impression that I'm missing something quite obvious.

xenova commented 12 months ago

The callback functionality is not very well documented (perhaps for good reason), since it's non-standard and, at the time of its creation, didn't have an equivalent mechanism in transformers.

For now, you can replicate what I did here using the callback_function generation parameter:

https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/worker.js#L191-L194

xenova commented 11 months ago

> We're using Vue components in the extension, which might already be slightly too complex for a beginner's tutorial (this would be more of an intermediate/slightly advanced version, I guess). However, I have plans to write yet another extension with similar functionality and really keep it super simple. Will keep you posted, but probably better in a new issue.

PS: please check out this PR; it removes the redundant CustomCache class. Let me know if that helps!

do-me commented 11 months ago

> For now, you can replicate what I did here using the callback_function generation parameter

Thanks a lot, this pointed me in the right direction! However, I needed to import AutoTokenizer and use it like this:

import { AutoTokenizer } from '@xenova/transformers';

let tokenizer = await AutoTokenizer.from_pretrained(model);

I noticed that without a worker.js you cannot update the DOM for each generated token/beam, as the event loop is blocked; this might be something for the docs. Making the callback async and using await inside it doesn't help. It's probably in the nature of the package architecture that it cannot work differently.

However, for a minimal example demonstrating e.g. the speed of token generation, you can still log it to the console and watch it live:

let out = await generator(inText, {
  max_new_tokens: 100,
  callback_function: function (beams) {
    // Decode the tokens generated so far and log them as they stream in
    const decodedText = tokenizer.decode(beams[0].output_token_ids, {
      skip_special_tokens: true,
    });
    console.log(decodedText);
  },
});

Demo here.


xenova commented 11 months ago

Yes, that's correct. The best way I have found around this is to use the Web Worker API and post messages back to the main thread in the callback_function:

https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/worker.js#L189-L202

and you initialize the worker like this: https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/main.js#L16-L19
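
Stripped down, the pattern looks something like this (a sketch with illustrative names and file layout, not the demo's exact code):

// worker.js
import { pipeline } from '@xenova/transformers';

const generator = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');

self.addEventListener('message', async (event) => {
  const out = await generator(event.data.text, {
    max_new_tokens: 100,
    callback_function: (beams) => {
      // Stream each partial decoding back to the main thread
      const partial = generator.tokenizer.decode(beams[0].output_token_ids, {
        skip_special_tokens: true,
      });
      self.postMessage({ status: 'update', output: partial });
    },
  });
  self.postMessage({ status: 'complete', output: out[0].summary_text });
});

// main.js
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.addEventListener('message', (event) => {
  // DOM updates happen here, on the (unblocked) main thread
  document.getElementById('output').innerHTML = event.data.output;
});
worker.postMessage({ text: 'some test text to summarize' });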

Fhrozen commented 6 months ago

@xenova thank you for your extraordinary work. @do-me I would like to know how you connected to transformers.js using Vue. I am currently working on a project with Vue 3, in TS, and keep getting SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON when trying to load a config or pipeline.

The code is simple. In a Vue component:

<script setup lang="ts">
import { env, pipeline, AutoConfig } from '@xenova/transformers'
// repoid is a model id string defined elsewhere
await AutoConfig.from_pretrained(repoid)
</script>

or in a ts file:

import { env, pipeline, AutoConfig  } from '@xenova/transformers'
import { defineStore } from 'pinia'

export const TransformerJs = defineStore('transformers', () => {
  function setupOnnx() {
    // env.localModelPath = '@/assets/models/'
    env.allowRemoteModels = true
    env.allowLocalModels = false
  }
  async function downloadModel(repoid:string, taskid:any) {
    await AutoConfig.from_pretrained(repoid)
  }
  return { env, setupOnnx, downloadModel }
})

Did you change anything directly in transformers.js to support Vue, or is nothing special needed?

xenova commented 6 months ago

@Fhrozen As long as you:

  1. Set env.allowLocalModels = false, and
  2. Delete cached files from devtools' Application tab

It should work. This will be fixed in Transformers.js v3, where allowLocalModels will default to false when running in the browser.
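
In other words, somewhere before any model is loaded (a minimal sketch; the model id is just an example):

import { env, AutoConfig } from '@xenova/transformers';

// Skip the local-model check so files are fetched from the Hugging Face Hub
env.allowLocalModels = false;

const config = await AutoConfig.from_pretrained('Xenova/all-MiniLM-L6-v2');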

do-me commented 6 months ago

@Fhrozen, I'm pinging @VarunNSrivastava, who created the entire Vue-based browser plugin. Feel free to ask any questions!

Fhrozen commented 6 months ago

@xenova, thank you very much for the details. As you mentioned, the issue was indeed fixed by changing allowLocalModels from true to false. @do-me, thank you very much; I will submit any questions I have. However, I think I will open a separate issue dedicated to Vue + Transformers.js.