
The YOLOv8 segmentation model with the batching option doesn't run on the GPU? #12776


shimaamorsy commented 2 weeks ago


Question

When I tried to run yolov8-seg.onnx with the batching option activated, this error appeared:

ort-wasm-simd.jsep.js:54 2024-05-17 15:23:15.490199 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf

ort-wasm-simd.jsep.js:54 2024-05-17 15:23:15.492300 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

ERROR_MESSAGE: Non-zero status code returned while running Softmax node. Name:'/model.22/dfl/Softmax' Status Message: Failed to run JSEP kernel

To Reproduce

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-seg.pt')  # load an official segmentation model
model = YOLO('path/to/best.pt')  # or load a custom trained model

# Export the model with a dynamic batch axis
model.export(format='onnx', dynamic=True)

Here is the model:

Script

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
    <title>Yolov8 test</title>

    <!-- ONNX RUNTIME -->
    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.webgpu.min.js"></script>
  </head>
  <body>
    <script>
      const modelName = "yolov8n-seg-batching.onnx";
      const modelInputShape = [1, 3, 640, 640];

      async function testModel() {
        // Prefer WebGPU, falling back to CPU (wasm)
        const model = await ort.InferenceSession.create(modelName, { executionProviders: ["webgpu", "cpu"] });
        // Dummy all-zeros input matching the model's input shape
        const tensor = new ort.Tensor("float32", new Float32Array(modelInputShape.reduce((a, b) => a * b)), modelInputShape);
        await model.run({ images: tensor });
        console.log(model);
      }
      testModel();
    </script>
  </body>
</html>


glenn-jocher commented 1 week ago

Hello! It looks like you're encountering an issue running the YOLOv8 segmentation model with batching under ONNX Runtime Web. The two warnings about nodes being assigned to the CPU are informational and mostly concern performance; the actual failure is the Softmax node ('/model.22/dfl/Softmax') erroring inside the JSEP (WebGPU) kernel at run time.

Here are a couple of suggestions to potentially resolve this issue:

  1. Ensure GPU Support: Make sure the environment you're running in actually supports GPU execution. In the browser, onnxruntime-web's "webgpu" provider requires a browser with WebGPU enabled; without it, execution silently falls back to the CPU/wasm path.

  2. Execution Providers: In your script you're specifying "webgpu" and "cpu" as execution providers, which is correct for onnxruntime-web; the "cuda" provider is only available in native ONNX Runtime builds (e.g. onnxruntime-node or the Python package), not in the browser. If you move inference to Node.js on an NVIDIA GPU, you could create the session like this:

    let model = await ort.InferenceSession.create(modelName, { executionProviders: ["cuda", "cpu"] });
  3. Verbose Logging: As the warning suggests, enabling verbose logging may show why certain nodes are not being assigned to the GPU and where the JSEP kernel fails. This can help in diagnosing the issue further (see the sketch after this list).

  4. Dynamic Batching: You've mentioned using dynamic=True during export, which is great for flexibility in input sizes but can sometimes complicate execution provider optimizations. Ensure that your model and your onnxruntime-web version are compatible with dynamic batch sizes; the sketch below runs a batch of two to exercise exactly that path.
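
As a starting point for points 1, 3, and 4, here's a minimal, untested sketch. It assumes the same yolov8n-seg-batching.onnx file and "images" input name as your script: it checks WebGPU availability, turns on verbose onnxruntime-web logging, and feeds a batch of two images through the dynamically-exported model.

    ort.env.logLevel = "verbose"; // surface node-assignment details in the console

    async function testBatched() {
      // Point 1: confirm the browser actually exposes WebGPU before requesting it
      if (!navigator.gpu || !(await navigator.gpu.requestAdapter())) {
        console.warn("WebGPU unavailable; onnxruntime-web will fall back to wasm/cpu");
      }

      // Point 3: logSeverityLevel 0 = verbose, mirroring the warning's suggestion
      const session = await ort.InferenceSession.create("yolov8n-seg-batching.onnx", {
        executionProviders: ["webgpu", "cpu"],
        logSeverityLevel: 0,
      });

      // Point 4: with dynamic=True the batch axis is free, so a [2, 3, 640, 640]
      // input exercises the batching path that triggers the Softmax error
      const shape = [2, 3, 640, 640];
      const data = new Float32Array(shape.reduce((a, b) => a * b));
      const outputs = await session.run({ images: new ort.Tensor("float32", data, shape) });
      console.log(Object.keys(outputs)); // names of the output tensors
    }

    testBatched();

If a batch of one succeeds but the batch of two still fails on the Softmax node, that points at the JSEP kernel's handling of the dynamic batch axis rather than your export, and would be worth raising against onnxruntime-web.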

If these steps don't resolve the issue, consider providing more details or the verbose logs as the warning message suggests. This could help pinpoint the exact problem. Hope this helps! 😊