microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Unpredictable onnxruntime-node crash when using Electron #20084

Open NexelOfficial opened 6 months ago

NexelOfficial commented 6 months ago

Describe the issue

I'm using onnxruntime-node in an Electron project. I'm trying to implement a multithreading solution with Node Worker Threads, but when creating a number of worker threads in Electron, they randomly crash when importing onnxruntime-node. The issue is unpredictable: sometimes it happens, sometimes it doesn't. The more Workers created, the more frequently the crash occurs. Here is the full crash output:

[7848:0326/114947.582:ERROR:crashpad_client_win.cc(868)] not connected

 ELIFECYCLE  Command failed with exit code 4294930435.

After this, the app exits. Again, this is all confusing to me because it's random whether the crash will occur.

To reproduce

  1. Clone my example repo that demonstrates this issue: https://github.com/NexelOfficial/electron-onnx-workers.
  2. Install dependencies using pnpm or npm.
  3. Run the app using electron . (or pnpm start / npm run start)
  4. If the error doesn't occur, please increase the number of Workers in index.js as in the example below. Again, this error is unpredictable, so it can take some tries before it occurs.

     ```js
     for (let i = 0; i < WORKER_AMOUNT_GOES_HERE; i++) {
       new Worker(path.join(__dirname, "onnxWorker.js"));
     }
     ```

Urgency

Project deadline in about 4 weeks

System information

Platform: Windows 10
OS Version: Pro Edition x64
ONNX Runtime Installation: onnxruntime-node
ONNX Runtime Version or Commit ID: 1.17.0
ONNX Runtime API: Not provided
Architecture: Not provided
Execution Provider: Not provided
Execution Provider Library Version: Not provided

NexelOfficial commented 6 months ago

Update: The error seems to be gone when spawning the threads one after another and waiting for each model to load before starting the next worker. Make the following changes to replicate the fix:

  1. In your worker, add the following code:

     ```js
     const onnxruntime = require("onnxruntime-node");
     const { parentPort } = require("worker_threads");

     // Load your model
     await onnxruntime.InferenceSession.create("model/yolov8n-pose.onnx", {
       enableMemPattern: false,
       intraOpNumThreads: 1,
     });

     // Add this line here. It sends a message to the main thread
     // that the next worker can be loaded.
     parentPort.postMessage({ message: "ONNX_READY" });
     ```

2. In your main thread, add the following code:
```js
// Keep a list of all workers
const workers = [];

const createWorker = () => {
  const worker = new Worker(path.join(__dirname, "onnxWorker.js"));
  workers.push(worker);

  // Create another worker when the previous one is loaded and more are needed
  worker.on("message", (data) => {
    // Continue if there are not 8 workers yet (change to however many you need)
    if (data.message === "ONNX_READY" && workers.length < 8) {
      return createWorker();
    }
  });
};

app.whenReady().then(() => {
  // ... other code

  // Start creating workers when your app is loaded
  createWorker();
});
```

You might need to add some extra code that waits for all workers to start, since creating them one by one takes some time. I have not seen the error since on any of my three machines, and I hope this helps others as well.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.