microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Web] Memory Access Out of Bounds Error When Using ONNX Runtime Web Inference in NPM Package (wasm) #19443

Open alba-saco opened 9 months ago

alba-saco commented 9 months ago

Describe the issue

I'm encountering a "memory access out of bounds" error when attempting to run inference using onnxruntime-web within a custom npm package. The inference process works flawlessly when executed directly within the application, but after compiling the same functionality into an npm package and importing it into the application, the error occurs consistently.

Error during inference: RuntimeError: memory access out of bounds
    at ort-wasm-simd.wasm:0xa58a
    at ort-wasm-simd.wasm:0x8711d1
    at ort-wasm-simd.wasm:0x4f79b3
    at ed (audioProcessor.js:2:3820925)

To reproduce

  1. Here is the link to the package, which is still a work in progress. You can run npm install, npm run build, and then link the package with npm or yalc.
  2. Here is the link to a simple web app that uses the package. You can run npm install, yalc add onnx-audio-processor (or similar), npm run build, and then serve it (I use http-server). If you navigate to the UI, I provided a test file called mal.wav, as the preprocessing does not work for every file type yet. You can upload that file and hit the 'process file' button.

Urgency

relatively urgent project deadline

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0

Execution Provider

'wasm'/'cpu' (WebAssembly CPU)

fs-eire commented 9 months ago

Thank you for the feedback. I will check whether I can reproduce this issue.

fs-eire commented 9 months ago

I noticed that in your function runInferenceParallel() in audioProcessor.ts, you issue multiple session.run() calls at the same time. This is the cause of the "memory access out of bounds" error.

First, in today's JavaScript you cannot do true multi-threading the way languages like C++ or Java can, because JavaScript does not have shared heap memory (unless you use SharedArrayBuffer, which comes with several constraints). The pattern in runInferenceParallel(), which launches several Promises and waits for them all, does not actually run anything in parallel, because they all execute on the same thread. JavaScript's async/await makes it possible to write coroutines, but that is not the same thing as multi-threading or parallelism.

Changing runInferenceParallel() into a runInferenceSequence() that awaits session.run() inside the for-loop fixes the issue.
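A minimal sketch of what that sequential version could look like (the Session interface and function shape here are illustrative stand-ins for ort.InferenceSession and the real code in audioProcessor.ts, which this thread does not show):

```typescript
// Stand-in for ort.InferenceSession, so the sketch stays self-contained.
interface Session {
  run(feeds: Record<string, unknown>): Promise<Record<string, unknown>>;
}

// Run one inference at a time: awaiting inside the loop guarantees the
// previous session.run() has returned before the next one starts, which
// keeps the WASM stack allocations balanced.
async function runInferenceSequence(
  session: Session,
  feedsList: Array<Record<string, unknown>>
): Promise<Array<Record<string, unknown>>> {
  const results: Array<Record<string, unknown>> = [];
  for (const feeds of feedsList) {
    results.push(await session.run(feeds));
  }
  return results;
}
```

By contrast, `Promise.all(feedsList.map((f) => session.run(f)))` starts every run before any has finished, which is exactly the overlapping pattern that triggers the crash.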

Of course, I agree that onnxruntime-web should do better than throwing that error. Either the underlying issue should be fixed, or an explicit error message disallowing concurrent calls to session.run() should be output.

fs-eire commented 9 months ago

Some detailed explanation:

In the ort-web JS code, we use stackAlloc and stackRestore to do direct stack allocation/free on the WebAssembly instance's memory. This happens inside session.run(). If multiple calls to session.run() are performed (before the previous one has returned), the stack can become unbalanced and crash ORT.
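As a rough illustration of the failure mode (a toy model for this thread, not ORT's or Emscripten's actual allocator), consider a downward-growing stack with a single global stack pointer; save/restore pairs must nest strictly LIFO, and overlapping session.run() calls break that nesting:

```typescript
// Toy model: one global stack pointer shared by all callers.
class ToyStack {
  sp = 1024; // stack grows downward from address 1024

  alloc(bytes: number): number {
    this.sp -= bytes;
    return this.sp; // pointer to the newly allocated buffer
  }

  restore(saved: number): void {
    this.sp = saved; // marks everything allocated since `saved` as free
  }
}

const stack = new ToyStack();

// Call A starts: save the stack pointer, allocate a 100-byte buffer.
const savedA = stack.sp;       // 1024
const bufA = stack.alloc(100); // bufA = 924

// Call B starts before A returns: save and allocate 50 bytes.
const savedB = stack.sp;       // 924
const bufB = stack.alloc(50);  // bufB = 874, still live

// Call A finishes first and restores ITS saved pointer, which frees the
// whole region [874, 1024) -- including B's still-live buffer.
stack.restore(savedA);         // sp back to 1024

// A later allocation can now hand out memory overlapping bufB.
const bufC = stack.alloc(200); // bufC = 824; [824, 1024) covers bufB
```

With interleaved calls, B's buffer is silently reused while B is still running, which is the kind of corruption that surfaces as "memory access out of bounds".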

There are two potential fixes: (1) use malloc/free in place of stackAlloc; (2) force sequential execution at the API entry point and throw an error if concurrent calls are detected.

I personally prefer (2), because (1) does not really help: for WASM, everything still runs on the same single thread no matter how many Promises are created, and for WebGPU it does not work because JSEP explicitly sets AllowConcurrentRun to false.
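A minimal sketch of what option (2) could look like at the API entrance (the SerialRunGuard name and shape are hypothetical for this thread, not ORT's actual code):

```typescript
// Hypothetical guard that rejects a second run() while one is in flight,
// instead of letting overlapping calls corrupt the WASM stack.
class SerialRunGuard {
  private inFlight = false;

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight) {
      throw new Error("Concurrent calls to session.run() are not allowed.");
    }
    this.inFlight = true;
    try {
      return await task();
    } finally {
      this.inFlight = false; // allow the next call once this one returns
    }
  }
}
```

The caller gets an explicit, actionable error instead of an opaque "memory access out of bounds" deep inside the WASM binary.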

alba-saco commented 9 months ago

Thank you, this worked! It's interesting that the 'parallel' execution worked in the main app but not in the package; regardless, this fixed the issue.