ggaabe opened this issue 6 months ago
Thank you for reporting this issue. I will try to figure out how to fix this problem.
So it turns out that dynamic import (i.e. `import()`) and top-level `await` are not supported in the current service worker environment. I was not expecting that `import()` is banned in SW.

Currently, the WebAssembly factory (wasm-factory.ts) uses dynamic import to load the JS glue. This does not work in a service worker. A few potential alternatives are also not available:

- `importScripts`: won't work, because the JS glue is ESM
- `eval`: won't work; same as `importScripts`

I am now trying to make a JS bundle that does not use dynamic import, specifically for usage in service workers. Still working on it.
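For illustration, this is roughly the shape of the limitation in a Chrome MV3 module service worker; the file paths below are made up, not part of the package:

```js
// background.js, registered with "type": "module" in manifest.json

// static ESM imports are allowed in a module service worker:
import * as ort from "./ort.webgpu.min.mjs";

// ...but dynamic import() is rejected inside ServiceWorkerGlobalScope,
// which is what breaks the default JS-glue loading path:
import("./some-chunk.mjs").catch((err) => console.error(err));
```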
Thanks, I appreciate your efforts around this. It does seem like some special-case bundle will need to be built after all; you might need `iife` or `umd` for the bundler output format.
I have considered this option. However, Emscripten does not offer an option to output both UMD (IIFE+CJS) and ESM for the JS glue (https://github.com/emscripten-core/emscripten/issues/21899), so I have to choose one. I chose the ES6 format output for the JS glue because of a couple of problems when importing UMD from ESM, and because `import()` is the standard way to import ESM from both ESM and UMD (until this issue showed me that it does not work in service workers).
I found a way to make ORT Web work; yes, this needs the build script to do some special handling. It will only work for ESM, because the JS glue is ESM and there seems to be no way to import ESM from UMD in a service worker.
@ggaabe Could you please try `import * as ort from "./ort.webgpu.bundle.min.js"` with version 1.19.0-dev.20240604-3dd6fcc089?
@fs-eire my project depends on transformers.js, which imports the onnxruntime webgpu backend like this:

https://github.com/xenova/transformers.js/blob/v3/src/backends/onnx.js#L24

Is this the right usage? In my project I've added this to my package.json to resolve onnxruntime-web to this new version, though the issue is still occurring:

```json
"overrides": {
  "onnxruntime-web": "1.19.0-dev.20240604-3dd6fcc089"
}
```

Maybe also important: the same error is still occurring in the same spot, in the inference session inside the onnx package and not in transformers.js. Do I need to add a resolver for onnxruntime-common as well?
Hi @fs-eire, is the newly-merged fix in a released build I can try?
Please try 1.19.0-dev.20240612-94aa21c3dd
@fs-eire EDIT: Never mind the comment I just deleted; that error was because I didn't set the webpack `target` to `webworker`.
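For anyone hitting the same thing, a minimal webpack config sketch for that fix (webpack 5; the entry path is illustrative, not from the repo):

```js
// webpack.config.js
module.exports = {
  target: "webworker", // build for a (service) worker environment instead of the browser default
  entry: "./src/background.js",
  // ...rest of the configuration unchanged
};
```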
However, I'm getting a new error now (progress!):

```
Error: no available backend found. ERR: [webgpu] RuntimeError: null function or function signature mismatch
```

Update: I found that the error is happening here: https://github.com/microsoft/onnxruntime/blob/fff68c3151b774d8a2e9290e96b9f707cd950216/js/common/lib/backend-impl.ts#L83-L86

For some reason the webgpu `backend.init` promise is rejecting with the `null function or function signature mismatch` error. This is much further along than we were before though.
Could you share the steps to reproduce? 
@fs-eire You'll need to run the WebGPU setup in a Chrome extension. You can use my code I just published here: https://github.com/ggaabe/extension

1. Run `npm install`
2. Run `npm run build`
3. Open Chrome's manage-extensions page
4. Click "Load unpacked"
5. Select the `build` folder from the repo
6. Open the "AI WebGPU Extension" extension
7. Type some text in the text input. It will load Phi-3 mini and, after it finishes loading, this error will occur.

If you view the extension in the extension manager and select the "Inspect views service worker" link before opening the extension, it will bring up an inspection window where you can view the errors as they occur. A little "Errors" bubble link also shows up there after they occur.

You will need to click the "Refresh" button on the extension in the extension manager to re-trigger the error, because it does not attempt to reload the model after the first attempt until another refresh.
@ggaabe I did some debugging on my box and made some fixes.

Changes to ONNX Runtime Web:

- A fix for the case when `env.wasm.wasmPaths` is not specified.

Changes to https://github.com/ggaabe/extension:

- The changes in https://github.com/ggaabe/extension/pull/1 need to be made to the extension example to make it load the model correctly. Please note: the example still fails at the call to `tokenizer.apply_chat_template()`. However, the WebAssembly is initialized and the model loads successfully.

Other issues:

- `env.wasm.wasmPaths` is set to a CDN URL internally. At least for this example, we don't want this behavior, so we need to reset it to `undefined` to keep the default behavior (see the sketch after this list).
- `Worker` is not accessible in a service worker. Issue tracking: https://github.com/whatwg/html/issues/8362
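A minimal sketch of the `wasmPaths` reset described in the first bullet above, assuming the extension goes through Transformers.js; the `@xenova/transformers` import and property path reflect my reading of the example, not an official recipe:

```js
import { env } from "@xenova/transformers";

// Transformers.js points ONNX Runtime Web's .wasm lookup at a CDN URL internally;
// resetting it to undefined restores ORT's default behavior, as described above
env.backends.onnx.wasm.wasmPaths = undefined;
```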
Awesome, thank you for your thoroughness in explaining this and tackling it head-on. Is there a dev channel version I can test out?
Not yet. Will update here once it is ready.
Sorry to bug you; is there any dev build number? I wasn't sure how often a release runs.
Please try 1.19.0-dev.20240621-69d522f4e9
@fs-eire I'm getting one new error:
```
ort.webgpu.bundle.min.mjs:6 Uncaught (in promise) Error: The data is not on CPU. Use `getData()` to download GPU data to CPU, or use `texture` or `gpuBuffer` property to access the GPU data directly.
    at get data (ort.webgpu.bundle.min.mjs:6:13062)
    at get data (tensor.js:62:1)
```
I pushed the code changes to my repo and fixed the call to the tokenizer. To reproduce, just type 1 letter in the chrome extension’s text input and wait
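For context, a minimal sketch of what the error message itself suggests, assuming an output tensor that currently lives on the GPU; the output name is illustrative:

```js
// feeds: the input tensors for the session created earlier
const results = await session.run(feeds);
const output = results.logits; // illustrative output name

// reading `.data` throws while the tensor still lives on the GPU;
// getData() downloads it to the CPU first, as the error message suggests
const cpuData = await output.getData();
console.log(cpuData);
```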
Hey, I also need this. I am struggling with importing this version. So far I have been importing ONNX using `import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/esm/ort.webgpu.min.js"`. However, when I change to `import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0-dev.20240621-69d522f4e9/dist/esm/ort.webgpu.min.js"`, there seems to be no `.../esm/` folder. Do you know why that is and how to import it then?
Just replace `.../esm/ort.webgpu.min.js` with `.../ort.webgpu.min.mjs` and it should work. If you are also using a service worker, use `ort.webgpu.bundle.min.mjs` instead of `ort.webgpu.min.mjs`.
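For example, a sketch based on the replacement described above; double-check the exact file names shipped in the package's dist folder:

```js
// before (this version has no /esm/ folder):
// import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0-dev.20240621-69d522f4e9/dist/esm/ort.webgpu.min.js";

// after; in a service worker, prefer the bundle variant:
import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0-dev.20240621-69d522f4e9/dist/ort.webgpu.bundle.min.mjs";
```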
> @fs-eire I'm getting one new error:
> `ort.webgpu.bundle.min.mjs:6 Uncaught (in promise) Error: The data is not on CPU. Use getData() to download GPU data to CPU, or use texture or gpuBuffer property to access the GPU data directly. at get data (ort.webgpu.bundle.min.mjs:6:13062) at get data (tensor.js:62:1)`
> I pushed the code changes to my repo and fixed the call to the tokenizer. To reproduce, just type 1 letter in the chrome extension's text input and wait
This may be a problem with transformers.js. Could you try whether this problem happens in a normal page? If so, please report the issue to transformers.js. If it's only happening in the service worker, I can take a closer look.
@fs-eire I can verify that with `1.19.0-dev.20240621-69d522f4e9`, loading a model using `webgpu` in a service worker works, even in a web extension. The necessary code is:

```js
import * as ONNX_WEBGPU from "onnxruntime-web/webgpu";

// any Blob that contains a valid ORT model would work
// I'm using Xenova/multilingual-e5-small/onnx/model_quantized.with_runtime_opt.ort
const buffer = await mlModel.blob.arrayBuffer();

const sessionwebGpu = await ONNX_WEBGPU.InferenceSession.create(buffer, {
  executionProviders: ["webgpu"],
});

console.log("Loading embedding model using sessionwebGpu", sessionwebGpu);
```
Results in a successful execution, yay! 💯 :)
I think we can ignore the warning, printed as an error, as the session loads.
WebAssembly would work in a Service Worker. Just because Service Workers are limited in their ability to load external resources such as WASM runtime files as a `Blob` or `ArrayBuffer` doesn't mean you can't get such data transferred into the Service Worker context. In fact, you can transfer gigabytes instantly using `MessageChannel` and the concept of Transferable objects.

Passing down a `Blob`/`ArrayBuffer` from a content script to a background worker/service worker even works, standards-compliant, with Web Extensions, as I demonstrate here: https://github.com/w3c/webextensions/issues/293#issuecomment-2211770512

It's even much simpler for non-Web-Extension use cases, as you simply use the `self.onmessage` API in a service worker to receive a `MessageChannel` object and, via one of its ports, receive the `Blob` or `ArrayBuffer`.
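A rough sketch of that pattern for the plain (non-extension) service worker case; file names and message shapes are illustrative, and the page side assumes a module script:

```js
// page side: hand one port to the service worker,
// then push the bytes through the other port as a transferable
const { port1, port2 } = new MessageChannel();
navigator.serviceWorker.controller.postMessage({ type: "init" }, [port2]);

const wasmBytes = await (await fetch("/ort-wasm-simd-threaded.wasm")).arrayBuffer();
port1.postMessage({ type: "wasm", wasmBytes }, [wasmBytes]); // transferred, not copied

// service-worker side: receive the port first, then the bytes arrive by reference
self.addEventListener("message", (event) => {
  const [port] = event.ports;
  if (!port) return;
  port.onmessage = ({ data }) => {
    if (data.type === "wasm") {
      // data.wasmBytes is the ArrayBuffer, usable for WebAssembly.instantiate()
    }
  };
});
```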
I'm aware that the current implementation hard-codes a few things. For example, `importWasmModule()` is trying to import the Emscripten runtime JS, and by default Emscripten is trying to import the WASM binary. But this isn't something that needs to be set in stone...

As the `node_modules` show, the runtime exports a default runtime factory function that user-land code can import:

`import ortWasmRuntime from "onnxruntime-web/dist/ort-wasm-simd-threaded"`

Emscripten's documented `instantiateWasm` hook also allows a pre-fetched binary (e.g. an `ArrayBuffer`) to be passed by reference:
```js
Module['instantiateWasm'] = async (imports, onSuccess) => {
  let result;
  if (WebAssembly.instantiateStreaming && Module["wasmModule"] instanceof Response) {
    // streaming compilation only applies when the binary arrives as a Response
    result = await WebAssembly.instantiateStreaming(Module["wasmModule"], imports);
  } else {
    // a plain ArrayBuffer passed by reference goes through instantiate()
    result = await WebAssembly.instantiate(Module["wasmModule"], imports);
  }
  return onSuccess(result.instance, result.module);
};
```
Of course, we don't want it that way, but I mention it as this is the "documented way".
Any `Module["$option"]` can also be set by passing these options as an object to the runtime factory function; in this case, to the passed-down runtime function imported by user-land code, exactly as you already do here:

```js
{
  numThreads,
  // just conditionally merge in:
  instantiateWasm: ONNX_WASM.env.wasm.instantiateWasm
}
```
```js
import * as ONNX_WASM from "onnxruntime-web/wasm";

// The difference is that this will be bundled in by the user-land bundler,
// while the conditional dynamic import that happens in the ONNX runtime would not,
// because the ternary operator here:
// https://github.com/microsoft/onnxruntime/blob/83e0c6b96e77634dd648e890cead598b6e065cde/js/web/lib/wasm/wasm-utils-import.ts#L157
// and all of its following code cannot be statically analyzed by bundlers; tree-shaking and inlining cannot happen,
// so the bundler is forced to generate dynamic import() code.
// This could also lead to downstream issues with the transformersjs package and other package/bundler combinations,
// while this is explicit and inlined:
import ortWasmRuntime from "onnxruntime-web/dist/ort-wasm-simd-threaded";

// could maybe be passed a Blob via https://emscripten.org/docs/api_reference/module.html#Module.mainScriptUrlOrBlob
ONNX_WASM.env.wasm.proxy = false;

// instead of always calling importWasmModule() in wasm-factory.ts,
// allow passing down the callback of the Emscripten JS runtime
ONNX_WASM.env.wasm.wasmRuntime = ortWasmRuntime;

// allow setting a custom Emscripten loader as well
ONNX_WASM.env.wasm.instantiateWasm = async (imports, onSuccess) => {
  // please note that wasmRuntimeBlob comes from user-land code; it may be passed via a MessageChannel
  let result;
  if (WebAssembly.instantiateStreaming) {
    // streaming compilation expects a Response, so wrap the Blob
    result = await WebAssembly.instantiateStreaming(
      new Response(wasmRuntimeBlob, { headers: { "Content-Type": "application/wasm" } }),
      imports,
    );
  } else {
    result = await WebAssembly.instantiate(await wasmRuntimeBlob.arrayBuffer(), imports);
  }
  return onSuccess(result.instance, result.module);
};

// then continuing as usual
// please note that mlModel comes from user-land code; it may have been passed via a MessageChannel
const modelBuffer = await mlModel.blob.arrayBuffer();
const sessionWasm = await ONNX_WASM.InferenceSession.create(modelBuffer, {
  executionProviders: ["wasm"],
});
console.log("Loading embedding model using sessionWasm", sessionWasm);
```
So with a 1 LoC change (using the passed-down runtime callback) [here](https://github.com/microsoft/onnxruntime/blob/83e0c6b96e77634dd648e890cead598b6e065cde/js/web/lib/wasm/wasm-factory.ts#L112), and a 1 LoC change [here](https://github.com/microsoft/onnxruntime/blob/83e0c6b96e77634dd648e890cead598b6e065cde/js/web/lib/wasm/wasm-factory.ts#L135) (add the `instantiateWasm` callback reference), the WebAssembly backend should work in Service Workers as well, if I'm not mistaken in this 4D-chess, pseudo-code, reverse-engineering game.
Currently, when I call the WASM implementation:
```ts
import * as ONNX_WASM from "onnxruntime-web/wasm";

// buffer: the model's ArrayBuffer, obtained the same way as in the WebGPU example above
const sessionWasm = await ONNX_WASM.InferenceSession.create(buffer, {
  executionProviders: ["wasm"],
});

console.log("Loading embedding model using sessionWasm", sessionWasm);
```

Result:
Thank you for your help!
I can confirm Web GPU is working for my little chrome extension app as well, but I'm having a problem disabling the warning.
@ChTiSh
You can numb it using a brittle monkey patch...
```js
// store the original function reference (not just `self.console`, otherwise
// the override below would also be visible through the stored reference)
const originalConsoleError = self.console.error;

// override it with an arrow function that does nothing
self.console.error = () => {};

// code will internally call the function that does nothing...
const sessionwebGpu = await ONNX_WEBGPU.InferenceSession.create(buffer, {
  executionProviders: ["webgpu"],
});

// still works, we only replaced the reference for the .error() function
console.log("Loading embedding model using sessionwebGpu", sessionwebGpu);

// restore the original function reference, so that console.error() works just as before
self.console.error = originalConsoleError;
```
But I agree... it should probably be a `console.warn()` call if it is intended to be a warning.
Thank you so much!!!!! The whole time I was trying to change the ort log severity, now it's fast and beautiful!!!!
@ChTiSh You're welcome 🫶 Always happy to help :)
> @fs-eire I'm getting one new error:
> `ort.webgpu.bundle.min.mjs:6 Uncaught (in promise) Error: The data is not on CPU. Use getData() to download GPU data to CPU, or use texture or gpuBuffer property to access the GPU data directly. at get data (ort.webgpu.bundle.min.mjs:6:13062) at get data (tensor.js:62:1)`
> I pushed the code changes to my repo and fixed the call to the tokenizer. To reproduce, just type 1 letter in the chrome extension's text input and wait

> This may be a problem with transformers.js. Could you try whether this problem happens in a normal page? If so, please report the issue to transformers.js. If it's only happening in the service worker, I can take a closer look.
Did the data structures change for the `Tensor` class? Specifically `dataLocation` vs `location`? And if so, did they change consistently? I'm facing issues with `data` being `undefined` but `cpuData` being set (tokenizer result). But when I pass the data down to a BERT model, `onnxruntime-web` seems to expect a different data structure and checks `location` and `data`. Am I missing something, or has something changed? Could this lead to downstream issues where code checking for the `location` and `data` properties mistakenly believes the data isn't there or isn't in the right place? I linked a downstream issue.
@fs-eire Is there any way I could help push this forward? Thank you :)
@kyr0 thank you a lot for your willingness to help. I am currently on vacation, but I will pick up this thread when I am back by the end of this month.
@fs-eire Oh, I didn't mean to disturb you on vacation. Please enjoy, relax and have a lot of fun!
@kyr0 do you by any chance have a fork that I can try with the custom `env.wasm.instantiateWasm` addition? I would like to try your workaround, since I am stuck in a similar spot (the .wasm needs to be imported in user-land code).
@asadm Well, I do have a solution for you to PoC whether it would work, but I don't have a fork/PR yet. It's a bit messy, but I'll explain. If your code, or a library that uses `onnxruntime-web`, imports the library, the module resolution algorithm will discover `node_modules/onnxruntime-web/package.json` and check the `exports` defined there to decide which actual file to import. Your code, or the library code your code is using, may have an import statement like this:

`import { ... } from "onnxruntime-web/webgpu"`

I decided, just to try, to simply change the code of `onnxruntime-web` on the fly. But the library only exports minified code by default, so I changed its package.json `exports` to point to the non-minified code files instead:
```json
{
  ".": {
    "node": {
      "import": "./dist/ort.mjs",
      "require": "./dist/ort.js"
    },
    "import": "./dist/ort.mjs",
    "require": "./dist/ort.js",
    "types": "./types.d.ts"
  },
  "./all": {
    "node": null,
    "import": "./dist/ort.all.bundle.min.mjs",
    "require": "./dist/ort.all.min.js",
    "types": "./types.d.ts"
  },
  "./wasm": {
    "node": null,
    "import": "./dist/ort.wasm.mjs",
    "require": "./dist/ort.wasm.js",
    "types": "./types.d.ts"
  },
  "./webgl": {
    "node": null,
    "import": "./dist/ort.webgl.min.mjs",
    "require": "./dist/ort.webgl.min.js",
    "types": "./types.d.ts"
  },
  "./webgpu": {
    "node": null,
    "import": "./dist/ort.webgpu.mjs",
    "require": "./dist/ort.webgpu.js",
    "types": "./types.d.ts"
  },
  "./training": {
    "node": null,
    "import": "./dist/ort.training.wasm.min.mjs",
    "require": "./dist/ort.training.wasm.min.js",
    "types": "./types.d.ts"
  }
}
```
If you do that change in your local filesystem in your project, your build process will now point to those files, no matter what your build system looks like.
The next thing was to make `onnxruntime-web/webgpu` use my function if it is defined on `env`. There is one LoC where the call happens, and it can access `env` in that scope, so I changed that code locally. Depending on what your code, or the library you are using, imports from `onnxruntime-web`, it might be one of these files; or you change all of them, just to make sure: `node_modules/onnxruntime-web/dist/ort.mjs` (line 24244), or `node_modules/onnxruntime-web/dist/ort.wasm.mjs` (line 1763).
Please note that the `webgpu` backend has no WASM import logic.
BEFORE: `importWasmModule(`

AFTER: `(typeof env.importWasmModule === "function" ? env.importWasmModule : importWasmModule)(`
I automated this process in a little NPM postinstall script, as I didn't want to spend much time figuring out all the build processes of onnxruntime-web yet:

`.replace(/importWasmModule\(/g, '(typeof env.importWasmModule === "function" ? env.importWasmModule : importWasmModule)(')`

I know, I know. Hacky... but pragmatic. You will lose your changes each time you re-install your dependencies.
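For reference, a sketch of what such a postinstall patch could look like, assuming the two dist files named above; it simply mirrors the `.replace()` call and is not the actual script from the repo:

```js
// scripts/patch-ort.mjs - run via the "postinstall" script in package.json
import { readFileSync, writeFileSync } from "node:fs";

// the two dist files mentioned above; adjust to whichever files your bundler resolves
const files = [
  "node_modules/onnxruntime-web/dist/ort.mjs",
  "node_modules/onnxruntime-web/dist/ort.wasm.mjs",
];

for (const file of files) {
  const source = readFileSync(file, "utf8");
  // delegate to env.importWasmModule when user-land code provides it
  const patched = source.replace(
    /importWasmModule\(/g,
    '(typeof env.importWasmModule === "function" ? env.importWasmModule : importWasmModule)(',
  );
  writeFileSync(file, patched);
}
```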
But, after all, you can now simply assign the function to the `env` and it will be called:

```ts
// @ts-ignore
import getModule from "./node_modules/onnxruntime-web/dist/ort-wasm-simd-threaded.jsep";

// you may need to copy this file and the WASM file
// (./node_modules/onnxruntime-web/dist/ort-wasm-simd-threaded.wasm)
// into a folder so that the loader can fetch() it well

// this is a working example in my project - it loads just fine now
env.backends.onnx.importWasmModule = async (
  mjsPathOverride: string,
  wasmPrefixOverride: string,
  threading: boolean,
) => {
  console.log("importWasmModule", mjsPathOverride, wasmPrefixOverride, threading);
  return [
    undefined,
    async (moduleArgs = {}) => {
      console.log("moduleArgs", moduleArgs);
      return await getModule(moduleArgs);
    },
  ];
};
```
My proposal would be to just change this one line of code in this project to allow for optional Inversion of Control, @fs-eire; it could be documented with my example code. This would probably solve all issues regarding "user-land based WASM loading".
Okay, I made a PR for that: https://github.com/microsoft/onnxruntime/pull/21430
@kyr0 that is amazing! I was also hacking around with the unminified bundle (I don't want to rebuild from source etc.).

Thank you so much for the detailed solution, can't wait to try this when I get home!
@asadm You're welcome. No worries, we'll get this to work just fine for you too :) Here's an impression from my background worker with the monkey patch applied. It's working, even through the `@xenova/transformers.js` abstraction layer in between:
It might take a while, though, until downstream projects adopt the new version. But once the maintainers here publish a new version including my PR, we will be able to just override the `onnxruntime-web` dependency using package `overrides`.
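For example, a minimal package.json sketch; the version shown is the later dev build mentioned in this thread, so substitute whichever release actually includes the PR:

```json
{
  "overrides": {
    "onnxruntime-web": "1.19.0-dev.20240801-4b8f6dcbb6"
  }
}
```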
I forgot to mention that you also need to alter the Emscripten-generated WASM loader code, or import a different build. Emscripten generates the pthread variant with this code in the WASM runtime loader:

```js
if (isNode) isPthread = (await import('worker_threads')).workerData === 'em-pthread';
```
For my PR, I should probably figure out a way to instruct Emscripten not to generate code for Node.js, i.e. build a second variant of the runtime for Workers that has Node support explicitly disabled. When the runtime environment is a worker, the default implementation could import that variant instead. In special cases where the custom loader is implemented by users, they could also fall back to this variant.
Why? As stated in this conversation earlier: Top-level await isn't supported in Worker environments.
@asadm https://github.com/kyr0/easy-embeddings demonstrates the whole process; I automated the monkey-patching for the moment... https://github.com/kyr0/easy-embeddings/blob/main/scripts/setup-transformers.ts
I created #21534, which is a replacement of #21430:

- Allowing users to set `instantiateWasm` directly may not be a good idea, because it requires users to understand the details of how WebAssembly works. I think this is unnecessary. To fulfill the requirement, allowing users to set an ArrayBuffer of the .wasm file should be good enough.
- The `import() is disallowed ...` error is already fixed as of version 1.19.0-dev.20240621-69d522f4e9. (This only works for ESM; UMD will not work.)

@fs-eire Sounds fair, and thank you for your work on this. Is there a new dev release that contains #21534 that I could use to test, maybe? I'd surely only use ESM, so there shouldn't be an issue with that.
1.19.0-dev.20240801-4b8f6dcbb6 includes the change.
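If I read the discussion correctly, usage of that ArrayBuffer option would look roughly like the sketch below; the `wasmBinary` property name is my assumption about what #21534 exposes, so check the package's type definitions for the exact field before relying on it:

```js
import * as ort from "onnxruntime-web/wasm";

// wasmBytes and modelBytes are ArrayBuffers obtained in user-land code,
// e.g. transferred into the service worker via a MessageChannel (see the earlier sketch)

// assumption: #21534 exposes the pre-fetched .wasm bytes via env.wasm.wasmBinary
ort.env.wasm.wasmBinary = wasmBytes;

const session = await ort.InferenceSession.create(modelBytes, {
  executionProviders: ["wasm"],
});
```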
To clarify: is the best way to get transformers.js running with WebGPU on onnxruntime to monkey-patch the package so the necessary WASM pieces load in each service worker, à la @kyr0's easy-embeddings? (Having some issues with that workflow, see https://github.com/kyr0/easy-embeddings/issues/1)

Has anyone had luck with / have tips for just running the v3 branch of Transformers.js? Or, maybe more precisely: do we know how something like Segment Anything WebGPU, which Xenova has in an HF Space, is working? It seems like there's been some official solution here, but I can't find it documented or implemented well.
I am working with Transformers.js to make the v3 branch compatible with the latest module system. This is one of the merged changes: https://github.com/xenova/transformers.js/pull/864. You probably need to use some workaround for now, but (hopefully) eventually you should be able to use it out of the box.
@lucasgelfond Now that the new updates from @fs-eire are in place, I'm probably able to streamline the workaround. I'll have a look soon, but as I'm on vacation right now, I cannot give an ETA, unfortunately.
Thank you @fs-eire and @kyr0 ! No huge rush on my end, ended up getting inference working on WebGPU just on vanilla onnxruntime, will share results in a bit!
Has anyone tried getting these imports working in Vite or other bundlers? When I try the classic

`import * as ONNX_WEBGPU from 'onnxruntime-web/webgpu'`

(which works in create-react-app), Vite says:

```
Error: Failed to scan for dependencies from entries:
/Users/focus/Projects/---/webgpu-sam2/frontend/src/routes/+page.svelte
✘ [ERROR] Missing "./webgpu/index.js" specifier in "onnxruntime-web" package [plugin vite:dep-scan]
```

Anyways, I tried importing from a URL, à la

`import { InferenceSession, Tensor as ONNX_TENSOR } from 'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.20.0-dev.20240810-6ae7e02d34/dist/ort.webgpu.min.js';`

which Vite also doesn't like:

```
3:34:51 PM [vite] Error when evaluating SSR module /src/encoder.svelte: failed to import "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.20.0-dev.20240810-6ae7e02d34/dist/ort.webgpu.min.js"
|- Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in: file, data, and node are supported by the default ESM loader.
```

I disabled SSR in Svelte but still seemingly no luck/change.

I tried manually downloading the files with curl, where I got an error about the lack of a source map, so I also downloaded .min.js.map. When I run it now, this works, but I get back to the original error in the thread about unavailable backends:

```
Error: no available backend found. ERR: [webgpu] TypeError: Failed to fetch dynamically imported module: http://localhost:5173/src/ort-wasm-simd-threaded.jsep.mjs
```

I figured it might work to just import directly, so I also tried

`import * as ONNX_WEBGPU from 'onnxruntime-web/dist/ort.webgpu.min.mjs';`

but then I got:

```
4:07:38 PM [vite] Internal server error: Missing "./dist/ort.webgpu.min.mjs" specifier in "onnxruntime-web" package
```

Anyone have ideas of how to handle this? Happy to add more verbose error messages for any of the stuff above.
Could you share a repo where I can reproduce the issue? I will take a look.
@fs-eire you are amazing! https://github.com/lucasgelfond/webgpu-sam2

I swapped over to Webpack (in the `svelte-webpack` directory), but the original Vite version is in there. No immediate rush, because I solved it temporarily with Webpack, but Webpack breaks some other imports, so it would be awesome to move back. Thanks so much again!
👋 Thank you @fs-eire! I tried using `1.19.0-dev.20240801-4b8f6dcbb6` inside a Chrome MV3 extension and it worked right away with the webgpu backend. However, I'm more interested in using the wasm backend for running a simple decision forest, as it doesn't include JSEP and makes the overall bundle 10 MB lighter. I wondered if there were any plans to support it too in the near future?
Describe the issue
I'm running into issues trying to use the WebGPU or WASM backends inside a ServiceWorker (in a Chrome extension). More specifically, I'm attempting to use Phi-3 with transformers.js v3.
Every time I attempt this, I get the following error:
This is originating in the `InferenceSession` class in `js/common/lib/inference-session-impl.ts`. More specifically, it's happening in this method:

```ts
const [backend, optionsWithValidatedEPs] = await resolveBackendAndExecutionProviders(options);
```

where the implementation is in `js/common/lib/backend-impl.ts` and `tryResolveAndInitializeBackend` fails to initialize any of the execution providers.

WebGPU is now supported in ServiceWorkers, though; it is a recent change and it should be feasible. Here were the Chrome release notes.
Additionally, here is an example browser extension from the mlc-ai/web-llm framework that implements WebGPU usage in service workers successfully: https://github.com/mlc-ai/web-llm/tree/main/examples/chrome-extension-webgpu-service-worker
Here is some further discussion on this new support from Google itself: https://groups.google.com/a/chromium.org/g/chromium-extensions/c/ZEcSLsjCw84/m/WkQa5LAHAQAJ
So technically I think it should be possible for this to be supported now? Unless I'm doing something else glaringly wrong. Is it possible to add support for this?
To reproduce
Download and set up the transformers.js extension example and put this into the background.js file:
Urgency
This would help enable a new ecosystem to build up around locally intelligent browser extensions and tooling.

It's urgent for me because it would be fun to build, and I want to build it, and it would be fun to be building it rather than not be building it.
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.19.0-dev.20240509-69cfcba38a
Execution Provider
'webgpu' (WebGPU)