do-me opened this issue 1 year ago
This is so cool! I plan to completely rewrite the demo application which, as you can tell, is extremely simple... so this definitely sounds like something I can add!
~PS: Do you have a Twitter post I can retweet? I'd love to share it!~ Edit: Found it!
@do-me Just a heads up that I updated the feature-extraction API to support other models (not just sentence-transformers). To use the updated API, you just need to add { pooling: 'mean', normalize: true } to the pipeline call. Your demo site seems unaffected (as it is still using the previous version), but if you'd like to add support for other models, you can make the following changes:
For example:
Before:
let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.');
console.log(result);
// Tensor {
// type: 'float32',
// data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
// dims: [1, 384]
// }
After:
let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
// type: 'float32',
// data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
// dims: [1, 384]
// }
And if you don't want to do pooling/normalization, you can leave it out. You will then get the embeddings for each token in the sequence.
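For illustration, on the updated API the un-pooled output would look something like this (a sketch; the token count in the dims below is just an example):
let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.');
console.log(result.dims);
// e.g. [1, 8, 384] -> one 384-dimensional embedding per token in the sequence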
Also - we're planning on releasing a semantic search demo next week 🥳 (so, watch this space!)
This is awesome, thanks for pinging me!
I'm very interested in this feature, mainly for speed improvements. Do you have any benchmarks at hand for how the new pooling approach compares to sequential processing?
I'd also be curious to know whether there's a sweet spot for how many elements can/should be passed to the model at once.
And one more detail, though it's probably also model-dependent: can you track the progress of a batch/pool that has been passed to the model? E.g. if I pass 1000 elements at once, is there any theoretical way to return the progress so I can update the progress bar in the frontend in the meantime?
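For context, by "passing elements at once" I mean something like the sketch below (whether this actually gets processed as a single batch internally is exactly what I'm unsure about):
// using the extractor from the snippets above
const texts = ["first sentence", "second sentence" /* , ... many more */];
const embeddings = await extractor(texts, { pooling: 'mean', normalize: true });
console.log(embeddings.dims); // e.g. [texts.length, 384]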
FYI: SemanticFinder just had a great contribution from @VarunNSrivastava, improving the UI significantly with new features. Also updated the transformers.js version: New Demo
Hey, joining the semantic-search-on-the-FE party 🥳.
I'm wondering if we can leverage the power of threads in this scenario by setting env.backends.onnx.wasm.numThreads = 4.
I don't see any errors thrown, but also no drastic performance improvements.
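For reference, here is roughly what I'm doing (a sketch; setting numThreads before creating the pipeline is my assumption of the right place):
import { env, pipeline } from '@xenova/transformers';

// Ask onnxruntime-web for 4 WASM threads before any model is loaded
env.backends.onnx.wasm.numThreads = 4;

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const output = await extractor('hello world', { pooling: 'mean', normalize: true });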
@lizozom Hi there! 👋
So, the most likely reason for this is that SharedArrayBuffer is not available because COOP/COEP headers are not set for the hosted files. You can check your network tab when running the model: you should see ort-wasm-simd.wasm loaded instead of ort-wasm-simd-threaded.wasm. For more information, check out this related open issue: https://github.com/xenova/transformers.js/issues/161
How to fix this depends on where you are hosting the website, as these headers must be set by the server. At the moment, GitHub Pages does not offer this (https://github.com/orgs/community/discussions/13309), but there are some workarounds (cc @josephrocca). On the other hand, we are actively working to support this feature in Hugging Face Spaces (https://github.com/huggingface/huggingface_hub/issues/1525), which should hopefully be ready soon!
Seems like Netlify offers a little more flexibility. I'm a very happy user of Netlify (I've been hosting my blog there since 2019 without any trouble) and it's pretty easy to link a GitHub repo to it. @lizozom if needed, we might consider switching from GitHub Pages to Netlify.
Cool! I'll check and let you know.
The current workaround is to put this file beside your HTML file and then import it with a script tag in your document <head>. The GitHub Pages engineering lead said a few days ago that they are working on custom headers, but there's no ETA.
I personally wouldn't go with Netlify, since their pricing is a bit too aggressive for my use cases, but it depends on what you're doing. Netlify's free 100 GB could be used up very quickly if you have a few assets like ML models or videos (even with just a few thousand visitors - e.g. due to being shared on Twitter or HN). Cloudflare Pages is much better imo (unlimited bandwidth and requests for free), but again it depends on your use case - Netlify may suffice.
Thanks for the hint! Does Cloudflare Pages offer custom headers?
Unlimited bandwidth does indeed sound great! I will check it out.
Luckily we don't need to host the models, only the static page with the framework (currently everything bundled is ~2 MB), so it's not that bad, but still something to keep in mind.
I haven't actually had to do that with Cloudflare Pages yet, but here are their docs for custom headers: https://developers.cloudflare.com/pages/platform/headers/
I tested this out on a local webpack project, serving files with these headers:
devServer: {
    headers: {
        'Cross-Origin-Opener-Policy': 'same-origin',
        'Cross-Origin-Embedder-Policy': 'require-corp',
    },
},
And indeed, this causes the threaded version (ort-wasm-simd-threaded.wasm) to be loaded.
I'm not seeing much of a performance difference right away, but I'll tinker with it some more.
@xenova In your opinion, should I expect to see performance improvements when running a large batch of embedding pipelines single- vs. multi-threaded?
@lizozom yes, we should be seeing improvements, but I believe there is a bug in ORT which is not correctly allocating work among the threads. There is an ongoing discussion about this here: https://github.com/xenova/transformers.js/issues/161
Sweet, I'll keep track. Let me know if I can help there in any way!
@VarunNSrivastava built a really nice Chrome extension for SemanticFinder. You can already install it locally as explained here.
We submitted it for review, so it should be a matter of days (hopefully) or a few weeks in the worst case.
It's working very well for many different types of pages (even PDFs, if the URL ends with .pdf!). There is a settings page too, where it's highly recommended to raise the minimum segment length if there is a lot of text on a page (more than 10 pages' worth, for example). You can also choose a different model if you're working with non-English content.
I spotted the gap in the HF docs about developing a browser extension and was wondering whether we could give a hand in filling it? In the end, our application isn't too complex in terms of "moving" parts, so it might make for a good example. Also, we already learnt about some caveats that might be good to write down.
That would be amazing! 🤯 Yes please! You could even strip down the tutorial quite a bit if you want (the simpler, the better).
We're using Vue components in the extension which might already be slightly too complex for a beginner's tutorial (this would be more of an intermediate/slightly advanced version, I guess). However, I have plans to write yet another extension with similar functionality and really keep it super simple. Will keep you posted, but probably better in a new issue.
I just have one question, relevant to both the extension and SemanticFinder, that I couldn't quite figure out from the HF docs:
When using text2text-generation like Xenova/LaMini-Flan-T5-783M or summarization like Xenova/distilbart-cnn-6-6, for example:
var outputElement = document.getElementById("output");
async function allocatePipeline(instruction) {
    let classifier = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
    let output = await classifier(instruction, {
        max_new_tokens: 100
    });
    outputElement.innerHTML = output[0];
}
allocatePipeline("some test instruction");
var outputElement = document.getElementById("output");
async function allocatePipeline(inText) {
    let generator = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
    let out = await generator(inText, {
        max_new_tokens: 100,
    });
    outputElement.innerHTML = out[0].summary_text;
}
allocatePipeline("some test text to summarize");
How can I add a callback so that my HTML component is updated each time a new token is created? I tried different kinds of callbacks and searched through the API, but I have the impression that I'm missing something quite obvious.
The callback functionality is not very well-documented (perhaps for good reason), since it's non-standard and at the time of its creation, didn't have an equivalent mechanism in transformers.
For now, you can replicate what I did here using the callback_function generation parameter:
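Roughly along these lines (a sketch; reusing the pipeline's own tokenizer for decoding and the exact shape of the beam objects are assumptions here):
let generator = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
let out = await generator(inText, {
    max_new_tokens: 100,
    // Called after every generation step with the current beams
    callback_function: (beams) => {
        const partial = generator.tokenizer.decode(beams[0].output_token_ids, {
            skip_special_tokens: true,
        });
        console.log(partial); // the text generated so far
    },
});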
> We're using Vue components in the extension which might already be slightly too complex for a beginner's tutorial (this would be more of an intermediate/slightly advanced version, I guess). However, I have plans to write yet another extension with similar functionality and really keep it super simple. Will keep you posted, but probably better in a new issue.
PS: please check out this PR, it removes the redundant CustomCache class. Let me know if that helps!
> For now, you can replicate what I did here using the callback_function generation parameter
Thanks a lot, this pointed me in the right direction!
However, I needed to import AutoTokenizer and use it this way:
let tokenizer = await AutoTokenizer.from_pretrained(model);
I noticed that without a worker.js you cannot update the DOM for each generated token/beam, as the event loop is blocked - which might be something for the docs. Making the callback async and using await in the callback function doesn't help. It's probably in the nature of the package architecture that it cannot work differently.
However, for a minimal example, demonstrating e.g. the speed of token generation, you can still log it to the console and watch it live:
callback_function: function (beams) {
    const decodedText = tokenizer.decode(beams[0].output_token_ids, {
        skip_special_tokens: true,
    });
    console.log(decodedText);
}
Demo here.
Yes, that's correct. The best way I have found around this is to use the Web Worker API and post messages back to the main thread in the callback_function:
and you initialize the worker like: https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/main.js#L16-L19
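A rough sketch of the pattern (file names and message shapes are just examples, not the exact demo code):
// worker.js - runs the model off the main thread
import { pipeline } from '@xenova/transformers';

let generator;

self.addEventListener('message', async (event) => {
    generator ??= await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');

    const out = await generator(event.data.text, {
        max_new_tokens: 100,
        // Post each partial result back to the main thread
        callback_function: (beams) => {
            const partial = generator.tokenizer.decode(beams[0].output_token_ids, { skip_special_tokens: true });
            self.postMessage({ status: 'update', text: partial });
        },
    });

    self.postMessage({ status: 'complete', text: out[0].summary_text });
});

// main.js - updates the DOM as messages arrive
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.addEventListener('message', (event) => {
    document.getElementById('output').innerHTML = event.data.text;
});
worker.postMessage({ text: 'some test text to summarize' });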
@xenova thank you for your extraordinary work.
@do-me I would like to know how you connected to Transformers.js using Vue.
I am currently working on a project with Vue 3, in TS, and keep getting SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON when trying to load a config or pipeline.
The code is simple. In a Vue component:
<script setup lang="ts">
import { env, pipeline, AutoConfig } from '@xenova/transformers'
await AutoConfig.from_pretrained(repoid)
</script>
or in a ts file:
import { env, pipeline, AutoConfig } from '@xenova/transformers'
import { defineStore } from 'pinia'

export const TransformerJs = defineStore('transformers', () => {
    function setupOnnx() {
        // env.localModelPath = '@/assets/models/'
        env.allowRemoteModels = true
        env.allowLocalModels = false
    }
    async function downloadModel(repoid: string, taskid: any) {
        await AutoConfig.from_pretrained(repoid)
    }
    return { env, setupOnnx, downloadModel }
})
Did you change anything directly in transformers.js to support Vue, or is nothing special needed?
@Fhrozen As long as you set env.allowLocalModels = false (and keep env.allowRemoteModels enabled), it should work. This will be fixed in Transformers.js v3, where allowLocalModels will default to false when running in the browser.
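In other words, something along these lines before the first call (a sketch; the model id is just an example):
import { env, AutoConfig } from '@xenova/transformers';

// Don't look for model files on your own server (those requests come back as the
// app's index.html, which then fails to parse as JSON) - fetch from the Hub instead.
env.allowLocalModels = false;
env.allowRemoteModels = true;

const config = await AutoConfig.from_pretrained('Xenova/all-MiniLM-L6-v2');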
@Fhrozen, I'm pinging @VarunNSrivastava who created the entire vue-based browser plugin. Feel free to ask any questions!
@xenova, Thank you very much for the details. As you mentioned, the issue was caused by changing allowRemoteModels from true to false.
@do-me, Thank you very much; I will submit any questions I have. However, I think I will open a separate issue dedicated to Vue + Transformers.js.
Hi @xenova, first of all thanks for the amazing library - it's awesome to be able to play around with the models without a backend!
I just created SemanticFinder, a semantic search engine in the browser with the help of transformers.js and sentence-transformers/all-MiniLM-L6-v2.
You can find some technical details in the blog post.
I was wondering whether you'd be interested in showcasing semantic search as a new demo type. Technically, it's not a new model, but it is a new use case with an existing model, so I don't know whether it's out of scope.
Anyway, just wanted to let you know that your work is very much appreciated!