Tensorflow JS | Real-time Object Detection with YOLO and Webcam

9christian9 commented 5 months ago

Search before asking

[X] I have searched the YOLOv8 issues and discussions and found no similar questions.

Question

tensorflow==2.13.1

I'm trying to use YOLOv8 for real-time object detection in TensorFlow.js.

I used this repo, whose images are already labeled. After cloning it, I ran this command to train the model:

yolo task=detect mode=train model=yolov8n.pt imgsz=640 data=data.yaml epochs=50 batch=20 name=yolov8n_custom

After training the model, I tested it with the following command:

yolo task=detect mode=predict model=best.pt conf=0.25 source='1.jpg'

YOLO correctly predicted the object in the test image. Next, I exported best.pt to TF.js with this command:

yolo export model=best.pt format=tfjs

This created a folder with the following files:

group1-shard1of3.bin
group1-shard2of3.bin
group1-shard3of3.bin
metadata.yaml
model.json

I successfully loaded the model using: tf.loadGraphModel('model.json')

I start the prediction when the video starts:

navigator.mediaDevices
    .getUserMedia({
      audio: false,
      video: {
        facingMode: "environment",
        width: 640,
        height: 640
      },
    })
    .then(stream => {
      video.srcObject = stream;
      video.onloadedmetadata = () => {
        video.addEventListener('loadeddata', predictWebcamTF);
      }
    });

I use this to run inference on each video frame:

function predictWebcamTF() {
  detectTF(video).then(function () {
    window.requestAnimationFrame(predictWebcamTF);
  });
}

async function detectTF(imgToPredict) {
  await tf.nextFrame();
  const tensor = tf.browser
    .fromPixels(imgToPredict)
    .resizeNearestNeighbor([640, 640])
    .expandDims()
    .toFloat();
  const prediction = await model.execute(tensor);
}
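One detail worth checking in detectTF is input scaling. As far as I understand Ultralytics' preprocessing, pixel values are divided by 255 before inference and the exported TF.js graph does not do this itself, so in TF.js the equivalent would be adding .div(255) after .toFloat(). The sketch below shows the same step in plain JavaScript on an RGBA buffer (such as ImageData.data from a canvas), dropping alpha and producing the NHWC [640, 640, 3] float layout; rgbaToModelInput is a hypothetical helper name.

```javascript
// Hypothetical helper: convert an RGBA pixel buffer (e.g. ImageData.data from a
// canvas) into a normalized RGB Float32Array in NHWC order, values in [0, 1].
function rgbaToModelInput(rgba, width, height) {
  const out = new Float32Array(width * height * 3);
  for (let p = 0; p < width * height; p++) {
    // Copy R, G, B scaled to [0, 1]; the alpha byte at rgba[p * 4 + 3] is skipped.
    out[p * 3] = rgba[p * 4] / 255;
    out[p * 3 + 1] = rgba[p * 4 + 1] / 255;
    out[p * 3 + 2] = rgba[p * 4 + 2] / 255;
  }
  return out;
}

// Usage with TF.js (assumes a 640x640 canvas 2D context named ctx):
// const { data } = ctx.getImageData(0, 0, 640, 640);
// const input = tf.tensor4d(rgbaToModelInput(data, 640, 640), [1, 640, 640, 3]);
```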

The prediction does not return values I can use to identify the object in the video, only this tensor:

{
    "kept": false,
    "isDisposedInternal": false,
    "shape": [
        1,
        5,
        8400
    ],
    "dtype": "float32",
    "size": 42000,
    "strides": [
        42000,
        8400
    ],
    "dataId": {},
    "id": 808,
    "rankType": "3",
    "scopeId": 9
}
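For reference, the 8400 in that shape is the number of candidate boxes: at a 640x640 input, YOLOv8's detection head predicts on three grids with strides 8, 16 and 32, giving 80x80 + 40x40 + 20x20 = 8400. The sketch below checks that count and shows how a single candidate can be read from the flat array returned by await prediction.data(), assuming a tensor of shape [1, rows, numAnchors]; anchorCount and getCandidate are hypothetical helper names, not TF.js or Ultralytics APIs.

```javascript
// Number of candidate boxes for a square input (assumed to be a multiple of 32),
// using YOLOv8's default detection strides of 8, 16 and 32.
function anchorCount(imgSize, strides = [8, 16, 32]) {
  return strides.reduce((sum, s) => sum + (imgSize / s) ** 2, 0);
}

// Read one candidate from the flat array returned by prediction.data().
// For a tensor of shape [1, rows, numAnchors], the value at (row r, anchor i)
// sits at flat index r * numAnchors + i.
function getCandidate(data, rows, numAnchors, i) {
  const values = [];
  for (let r = 0; r < rows; r++) values.push(data[r * numAnchors + i]);
  return values; // [cx, cy, w, h, score for class 0, score for class 1, ...]
}

console.log(anchorCount(640)); // 8400
```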

Can someone help me understand where I am going wrong? Thanks in advance, Christian.

Additional

No response

github-actions[bot] commented 5 months ago

👋 Hello @9christian9, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 5 months ago

@9christian9 hello Christian,

Great to see your progress with YOLOv8 and TensorFlow JS for real-time object detection! It looks like you're almost there. The issue seems to be with how the predictions are processed after running the model.

From the output you posted, the model itself is running: the [1, 5, 8400] tensor is its raw output, and it still has to be decoded into bounding boxes, confidence scores, and class labels. For a YOLOv8 detection model the layout is [batch, 4 + number of classes, candidate boxes]: the first four rows hold each candidate's box (center x, center y, width, height) and the remaining rows hold one score per class, so 5 rows corresponds to a single-class model.

Here is a suggestion to process the output correctly:

Ensure that you're applying the appropriate post-processing to decode the tensor into meaningful prediction data. This may involve reshaping the output tensor and applying non-max suppression to filter out overlapping boxes based on their confidence scores.
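Non-max suppression can be written as a short greedy loop: sort detections by score, keep the best one, and skip any later detection of the same class whose IoU with an already kept box exceeds a threshold. This sketch assumes detections shaped like { box: [x1, y1, x2, y2], score, classId } (a hypothetical format), and the 0.7 IoU threshold is an assumed value to tune for your data.

```javascript
// Intersection-over-union of two boxes given as [x1, y1, x2, y2].
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS over { box, score, classId } detections.
// Boxes of different classes never suppress each other.
function applyNonMaxSuppression(detections, iouThreshold = 0.7) {
  const sorted = [...detections].sort((a, b) => b.score - a.score);
  const kept = [];
  for (const det of sorted) {
    const overlaps = kept.some(
      (k) => k.classId === det.classId && iou(k.box, det.box) > iouThreshold
    );
    if (!overlaps) kept.push(det);
  }
  return kept;
}
```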

You might need to adapt your detectTF function to include these steps. The exact decoding depends on your model's output layout, but the general approach is to read the raw output and extract the bounding box coordinates, confidence scores, and class labels from it.
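A decoding step can be sketched in plain JavaScript, working on the flat array returned by await prediction.data(). It assumes the YOLOv8 detection layout [1, 4 + numClasses, numAnchors], with boxes as center x, center y, width, height in pixels of the 640x640 input (if your export produces normalized coordinates, multiply them by 640 first) and class scores already in [0, 1]. The 0.25 default threshold mirrors the conf=0.25 used with yolo predict in the question; decodeBoxes is a hypothetical helper, not an official API.

```javascript
// Hypothetical decoder for a YOLOv8-style output of shape
// [1, 4 + numClasses, numAnchors], passed in as a flat array.
// Assumes rows 0-3 are (cx, cy, w, h) in input pixels and the rest are class scores.
function decodeBoxes(data, numClasses, numAnchors, confThreshold = 0.25) {
  const detections = [];
  for (let i = 0; i < numAnchors; i++) {
    // Pick the best-scoring class for this candidate.
    let bestClass = 0;
    let bestScore = -Infinity;
    for (let c = 0; c < numClasses; c++) {
      const s = data[(4 + c) * numAnchors + i];
      if (s > bestScore) {
        bestScore = s;
        bestClass = c;
      }
    }
    if (bestScore < confThreshold) continue;
    const cx = data[i];
    const cy = data[numAnchors + i];
    const w = data[2 * numAnchors + i];
    const h = data[3 * numAnchors + i];
    // Convert center/size to corner coordinates [x1, y1, x2, y2].
    detections.push({
      box: [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2],
      score: bestScore,
      classId: bestClass,
    });
  }
  return detections;
}
```

For a single-class model this would be called as decodeBoxes(await prediction.data(), 1, 8400); disposing the per-frame tensors afterwards (for example tf.dispose([tensor, prediction])) keeps memory from growing on every frame.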

Here's a basic template of what this might involve (you will need to adapt it to your specific model output structure):

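// Placeholder helpers: implement each one for your model's output layout
const { boxes, scores, classes } = decodeBoxes(prediction); // decode the raw output tensor
const finalPredictions = applyNonMaxSuppression(boxes, scores, classes); // drop overlapping boxes
renderPredictions(finalPredictions, canvasContext); // if displaying on a canvas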
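Since detectTF stretches each frame to 640x640 with resizeNearestNeighbor, decoded boxes live in that 640x640 space and have to be scaled back to the canvas before drawing. Below is a minimal sketch of the renderPredictions placeholder, assuming plain stretching (no letterbox padding), a canvas the same size as the video, and detections shaped like { box: [x1, y1, x2, y2], score, classId }; scaleBox and this renderPredictions are hypothetical helpers.

```javascript
// Map a box from model-input space (inputSize x inputSize) back to the
// displayed frame, assuming the frame was stretched, not letterboxed.
function scaleBox([x1, y1, x2, y2], frameWidth, frameHeight, inputSize = 640) {
  const sx = frameWidth / inputSize;
  const sy = frameHeight / inputSize;
  return [x1 * sx, y1 * sy, x2 * sx, y2 * sy];
}

// Draw detections on a 2D canvas context; boxes are in model-input pixels.
function renderPredictions(detections, ctx, labels = []) {
  const { width, height } = ctx.canvas;
  ctx.clearRect(0, 0, width, height); // clear the previous frame's boxes
  ctx.lineWidth = 2;
  ctx.strokeStyle = "lime";
  ctx.fillStyle = "lime";
  ctx.font = "14px sans-serif";
  for (const { box, score, classId } of detections) {
    const [x1, y1, x2, y2] = scaleBox(box, width, height);
    ctx.strokeRect(x1, y1, x2 - x1, y2 - y1);
    const name = labels[classId] ?? `class ${classId}`;
    ctx.fillText(`${name} ${score.toFixed(2)}`, x1, Math.max(y1 - 4, 12));
  }
}
```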

For non-max suppression, TensorFlow.js provides tf.image.nonMaxSuppression and tf.image.nonMaxSuppressionAsync; you could also look for existing utilities that handle YOLO model outputs specifically.
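When using the TF.js built-ins, note that tf.image.nonMaxSuppression and tf.image.nonMaxSuppressionAsync expect boxes in [y1, x1, y2, x2] order rather than [x1, y1, x2, y2]. The helper below (a hypothetical name) reorders corner boxes; the commented call shows the argument order (boxes, scores, maxOutputSize, iouThreshold, scoreThreshold), with 100, 0.7 and 0.25 as assumed values.

```javascript
// Reorder [x1, y1, x2, y2] boxes into the [y1, x1, y2, x2] order that
// tf.image.nonMaxSuppression and tf.image.nonMaxSuppressionAsync expect.
function toTfNmsOrder(boxes) {
  return boxes.map(([x1, y1, x2, y2]) => [y1, x1, y2, x2]);
}

// Usage sketch with TF.js (threshold values are assumptions; tune for your model):
// const keep = await tf.image.nonMaxSuppressionAsync(
//   tf.tensor2d(toTfNmsOrder(boxes)), tf.tensor1d(scores), 100, 0.7, 0.25);
// const keptIndices = await keep.array();
```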

Keep up the great work! If you need further assistance, feel free to reach out.

plus1998 commented 5 months ago

I have the same problem in Node.js.

But I don't know anything about tensors, and I'm hoping for an example that works with tfjs.

Code


const tf = require('@tensorflow/tfjs-node');
const fs = require('fs');
const path = require('path');

async function loadModel() {
  const model_path = path.join(__dirname, '../best_web_model/model.json');
  const model = await tf.loadGraphModel('file://' + model_path);
  return model;
}

async function main() {
  const model = await loadModel();
  const img = fs.readFileSync(path.join(__dirname, '../test/captcha.png'));
  const tfimg = tf.node.decodePng(img); // [height, width, channels]; an RGBA PNG has 4 channels
  const resized = tf.image.resizeNearestNeighbor(tfimg, [640, 640]);
  const reshaped = resized.reshape([1, 640, 640, 4]); // add the batch dimension
  const channelsRemoved = reshaped.slice([0, 0, 0, 0], [1, 640, 640, 3]); // drop alpha
  const tensor = channelsRemoved.toFloat();
  const ret = await model.predict(tensor);
  console.log(ret);
}

main();


Output

Tensor { kept: false, isDisposedInternal: false, shape: [ 1, 5, 8400 ], dtype: 'float32', size: 42000, strides: [ 42000, 8400 ], dataId: {}, id: 668, rankType: '3', scopeId: 8 }
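In Node.js, console.log on a Tensor prints only its metadata. The values can be read with ret.arraySync() (or await ret.array()), which here gives a nested array of shape [1][5][8400]. This sketch, using a hypothetical helper name, turns that into one [cx, cy, w, h, score...] entry per candidate box, which is easier to filter and sort.

```javascript
// Turn the nested array from ret.arraySync() (shape [1][rows][numAnchors])
// into one entry per candidate: [cx, cy, w, h, classScore0, ...].
function toCandidates(nested) {
  const rows = nested[0]; // drop the batch dimension
  const numAnchors = rows[0].length;
  const candidates = [];
  for (let i = 0; i < numAnchors; i++) {
    candidates.push(rows.map((row) => row[i]));
  }
  return candidates;
}

// Usage in the Node.js script above:
// const candidates = toCandidates(ret.arraySync());
// console.log(candidates.length); // expected 8400 for a 640x640 input
```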

9christian9 commented 5 months ago

Hi @glenn-jocher,

I am closing this question because there is a related discussion, which can be found here.

Christian.

glenn-jocher commented 5 months ago

Hello @9christian9,

Thanks for the update, and for linking to the related discussion! 🙌

If you have any more questions in the future or if there's anything else we can help you with, feel free to reach out.
