
Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Run ONNX on GPU in JavaScript? #9413

Closed · shimaamorsy closed this issue 5 months ago

shimaamorsy commented 6 months ago

Search before asking

Question

How can I run inference with an ONNX model on the GPU in a JavaScript environment?

Additional

No response

github-actions[bot] commented 6 months ago

👋 Hello @shimaamorsy, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 6 months ago

@shimaamorsy running ONNX models on a GPU directly from a JavaScript environment can be challenging because JavaScript has only limited access to GPU resources. However, one way to leverage the GPU for your ONNX model in a JS environment is via WebAssembly and WebGL. You could use ONNX Runtime Web (the onnxruntime-web package, successor to ONNX.js), which runs ONNX models in the browser and can use WebGL for GPU acceleration.

Here's a basic setup to get you started:

const ort = require('onnxruntime-web');

async function runModel() {
  // Load your ONNX model; requesting the 'webgl' execution provider enables
  // GPU acceleration in the browser, with 'wasm' (CPU) as a fallback
  const session = await ort.InferenceSession.create('path/to/your/model.onnx', {
    executionProviders: ['webgl', 'wasm'],
  });

  // Create a tensor for input (example shape and type; adjust for your model)
  const typedArray = new Float32Array(1 * 3 * 224 * 224); // fill with your preprocessed image data
  const inputTensor = new ort.Tensor('float32', typedArray, [1, 3, 224, 224]);

  // Run the model; feeds are keyed by the model's actual input names
  const feeds = { [session.inputNames[0]]: inputTensor };
  const results = await session.run(feeds);

  // Process the output; results is a plain object keyed by output name
  const outputTensor = results[session.outputNames[0]];
  console.log(outputTensor.data);
}

runModel().catch(console.error);

Make sure to adjust "path/to/your/model.onnx" and tensor shapes/types according to your model.

Note: ONNX Runtime Web uses WebGL for computation where available, but actual GPU acceleration can vary depending on the browser's WebGL support and its WebAssembly capabilities.

For specific GPU control and heavy computation, a server-side solution (like Python with CUDA support) that communicates with your JS frontend might be more effective. 😊
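If you do go the server-side route, the browser only needs to upload the image and read back the results. A minimal client-side sketch, assuming a hypothetical /infer endpoint exposed by your Python backend:

// Hypothetical endpoint and response format; adjust to your own backend
async function inferOnServer(imageBlob) {
  const form = new FormData();
  form.append('image', imageBlob);
  const res = await fetch('/infer', { method: 'POST', body: form });
  if (!res.ok) throw new Error(`Inference request failed: ${res.status}`);
  return res.json(); // e.g. detections computed on the server's GPU
}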

DistinctVision commented 6 months ago

TF.js uses WebGL and the GPU for inference. It looks like you can use TFLite + TF.js.

shimaamorsy commented 6 months ago

What do you mean?

DistinctVision commented 6 months ago

Just a suggestion. ONNX supports WebGL for GPU inference, but not for all operations. Another option is to export the model to "tfjs" or "tflite" format and use TensorFlow.js.

glenn-jocher commented 6 months ago

@DistinctVision you're correct! While ONNX Runtime Web provides some GPU acceleration via WebGL, it might not cover all operations depending on your model's specifics. Switching to TensorFlow.js, or converting your model to TensorFlow Lite and then running it with TensorFlow.js, is indeed a viable alternative. TensorFlow.js leverages WebGL under the hood for GPU-accelerated inference, offering broader operation support and potentially better performance for web applications. Here's a quick snippet showing how you might load and run a model exported to the TF.js graph format (a model.json plus weight files):

// Assuming TensorFlow.js is loaded, e.g. import * as tf from '@tensorflow/tfjs'
const model = await tf.loadGraphModel('path/to/your/model/model.json');

const input = tf.zeros([1, 224, 224, 3]); // example input; use your preprocessed image data
const output = model.predict(input);      // tensor (or array of tensors) with the predictions
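If you need to build that input from an image on the page, here is a minimal preprocessing sketch, assuming a model that expects a normalized 224×224 RGB input (adjust the size and normalization to your model; the element id is hypothetical):

const imgEl = document.getElementById('inputImage'); // hypothetical element id
const input = tf.tidy(() =>
  tf.browser.fromPixels(imgEl)    // [height, width, 3] int32 tensor from the pixels
    .resizeBilinear([224, 224])   // match the model's expected spatial size
    .toFloat()
    .div(255)                     // scale to [0, 1]
    .expandDims(0)                // add the batch dimension -> [1, 224, 224, 3]
);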

Make sure to check the TensorFlow.js documentation for more details on model conversion and execution. Happy coding! 😊

shimaamorsy commented 6 months ago

In JavaScript, is there any way to convert the object returned by the InferenceSession function into a serializable object, so that I can store it in a cache or send it to a worker?

glenn-jocher commented 6 months ago

@shimaamorsy absolutely! To convert the inference output into a serializable object, you can read the tensor's data property, which exposes the underlying typed array of the onnxruntime-web Tensor; converting that to a plain array makes it serializable. For example:

const output = await inferenceSession.run(feeds); // feeds: the { inputName: tensor } map for your model
const data = output.yourOutputName.data;          // typed array view; use your actual output name
const serializableOutput = Array.from(data);      // plain array, safe to JSON.stringify

You can now cache serializableOutput or post it to a worker, since it's a standard array. Just be sure to recreate the tensor from this array on the receiving end if it's needed for further processing.
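If the destination is specifically a Web Worker, note that typed arrays can also be handed over directly; a small sketch, assuming a worker instance and the data typed array from above:

// Listing data.buffer as a transferable moves it to the worker without copying;
// note that `data` becomes unusable on the sending side afterwards
worker.postMessage({ data }, [data.buffer]);

// In the worker: self.onmessage = (e) => { const received = e.data.data; /* ... */ };

Happy coding! 😊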

shimaamorsy commented 6 months ago

I want to serialize the object returned from InferenceSession.create, not the one from InferenceSession.run.

glenn-jocher commented 6 months ago

@shimaamorsy hey there! 😊 For serializing the object from inferenceSession.create, you could consider converting relevant session properties to a serializable format. Note that directly serializing the whole session object may not be straightforward due to its complex structure and potential references to non-serializable resources.

If you're looking to cache certain configurations or results, consider extracting and serializing only the necessary data. For instance:

const session = await ort.InferenceSession.create('path/to/model.onnx');
const sessionInfo = {
  inputNames: session.inputNames,
  outputNames: session.outputNames,
  // Any other properties you find necessary
};

const serializedSessionInfo = JSON.stringify(sessionInfo);

This way, you're dealing with simple objects and arrays, making serialization a breeze! 🌬️ Remember, recreating a session from this serialized info will still require you to load the model and any specific configuration manually.
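If the underlying goal is to avoid re-downloading the model, one practical pattern (a sketch using the standard browser Cache API, not an ort-specific feature) is to cache the model bytes rather than the session object, since InferenceSession.create also accepts raw model bytes:

// Cache the .onnx file itself, then rebuild the session from the cached bytes
const cache = await caches.open('onnx-models');
let response = await cache.match('path/to/model.onnx');
if (!response) {
  await cache.add('path/to/model.onnx'); // fetches the model and stores it
  response = await cache.match('path/to/model.onnx');
}
const modelBytes = new Uint8Array(await response.arrayBuffer());
const restoredSession = await ort.InferenceSession.create(modelBytes);

Happy coding!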

github-actions[bot] commented 5 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐