huggingface / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

Is it possible to run this in node? #4

Closed · Madd0g closed this issue 1 year ago

Madd0g commented 1 year ago

I got this error when trying:

TypeError [ERR_WORKER_PATH]: The worker script or module filename must be an absolute path or a relative path starting with './' or '../'. Received "blob:nodedata:....
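
For context, a minimal Node script along these lines is enough to trigger it (a hypothetical repro; the task is just an example):

// repro.js
const { pipeline } = require('@xenova/transformers');

(async () => {
  // loading a pipeline makes onnxruntime-web spawn its WASM worker
  // threads from a blob: URL, which Node's Worker constructor rejects
  const classifier = await pipeline('sentiment-analysis');
  console.log(await classifier('hello'));
})();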

xenova commented 1 year ago

Hi there. Yes, this is actually a known bug in ONNX Runtime Web (similar problem to this: https://github.com/microsoft/onnxruntime/issues/14445).

You can fix it as follows:

// 1. Fix "ReferenceError: self is not defined" bug when running directly with node
// https://github.com/microsoft/onnxruntime/issues/13072
global.self = global;

const { pipeline, env } = require('@xenova/transformers');

// 2. Disable spawning worker threads for testing.
// This is done by setting numThreads to 1
env.onnx.wasm.numThreads = 1;

// 3. Continue as per usual:
// ...
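
Putting it together, a complete script looks roughly like this (a sketch under the settings above; the task and input are just examples):

// node-example.js
global.self = global;

const { pipeline, env } = require('@xenova/transformers');
env.onnx.wasm.numThreads = 1;

(async () => {
  // load a default model for the task, then run it on some text
  const classifier = await pipeline('sentiment-analysis');
  const output = await classifier('Transformers.js works in Node!');
  console.log(output); // e.g. [{ label: 'POSITIVE', score: 0.99 }]
})();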

You can see a working example in the testing script: https://github.com/xenova/transformers.js/blob/main/tests/index.js

(In that case, I also set env.remoteModels = false for testing locally.)

There also seems to be a Node-specific package for onnxruntime (https://www.npmjs.com/package/onnxruntime-node), but I haven't tested it.

Madd0g commented 1 year ago

thanks, that made it work!

are you planning to integrate onnxruntime-node in the future? would be cool to be able to choose

Madd0g commented 1 year ago

(also seems like there's no caching when running this under node, so it redownloads the model every time)

xenova commented 1 year ago

> (also seems like there's no caching when running this under node, so it redownloads the model every time)

Yes, that is correct. At the moment, caching is only implemented with the Cache Web API, which is not available in Node.

I will hopefully add that functionality soon, so that it behaves similarly to Hugging Face's Python library, which caches downloaded models on your file system.

In the meantime, I suggest you just download the model and place it in the ./models/onnx/quantized folder (or another location, provided you set env.localURL).
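
For example (a sketch; adjust the path to wherever you saved the model files):

// point the library at locally stored models instead of downloading them
const { pipeline, env } = require('@xenova/transformers');

env.remoteModels = false;                    // don't fetch from the hub
env.localURL = './models/onnx/quantized/';   // assumption: matches the default folder mentioned above

// pipelines now resolve models from the local folder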

dkogut1996 commented 1 year ago

I'm getting the ReferenceError: self is not defined bug when running in TypeScript and trying to import AutoTokenizer.

I have global.self = global at the top of the file, and VS Code complains with this error: Type 'typeof globalThis' is not assignable to type 'Window & typeof globalThis'.

Importing using import { AutoTokenizer } from '@xenova/transformers' and simply referencing the class produces the reference error.

Any ideas how this would work in TS?

xenova commented 1 year ago

> I'm getting the ReferenceError: self is not defined bug when running in TypeScript and trying to import AutoTokenizer.
>
> I have global.self = global at the top of the file, and VS Code complains with this error: Type 'typeof globalThis' is not assignable to type 'Window & typeof globalThis'.
>
> Importing using import { AutoTokenizer } from '@xenova/transformers' and simply referencing the class produces the reference error.
>
> Any ideas how this would work in TS?

I unfortunately haven't worked with TypeScript before, so I wouldn't be able to give you very good advice 😅 ... However, I do intend to convert the library to TS one day!

In the meantime, I tried asking ChatGPT. Here are some of its answers. Do take them with a grain of salt though, because I don't quite know what they mean haha 😅 [screenshots of ChatGPT's suggestions]

In the future, it will be best for the library to support both the web and Node versions of onnxruntime, and to choose which one to use based on the environment.
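
Presumably something along these lines (a sketch of the idea, not the library's actual code):

// pick the onnxruntime backend based on the runtime environment
const isNode =
  typeof process !== 'undefined' &&
  process.versions != null &&
  process.versions.node != null;

const ort = isNode
  ? require('onnxruntime-node')  // native backend for Node
  : require('onnxruntime-web');  // WASM backend for browsers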

dkogut1996 commented 1 year ago

Ah ok, those didn't work but it was hilarious.

I'm actually using your tokenizer classes so that I may use onnxruntime-node without having to rewrite the tokenizer libraries for the various LLMs out there. This repo is awesome and you've been very responsive, so thanks so much for all the help!

xenova commented 1 year ago

> Ah ok, those didn't work but it was hilarious.
>
> I'm actually using your tokenizer classes so that I may use onnxruntime-node without having to rewrite the tokenizer libraries for the various LLMs out there. This repo is awesome and you've been very responsive, so thanks so much for all the help!

Great! 👍 No worries 😄

Getting this version working out-of-the-box for Node + TypeScript is on the TODO list :)

Madd0g commented 1 year ago

> Ah ok, those didn't work but it was hilarious.

I have this at the top of the file:

(global as any).self = global;

Actually, ChatGPT suggested better approaches, since this will not fly in a strict: true codebase.
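
For instance, one stricter alternative is to augment the global type instead of casting (a sketch, assuming the DOM lib is not enabled in your tsconfig):

// global-shim.ts — declare `self` on Node's global scope, then assign it
declare global {
  // eslint-disable-next-line no-var
  var self: typeof globalThis;
}

globalThis.self = globalThis;

export {}; // make this file a module so the global augmentation applies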