ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
GNU Affero General Public License v3.0
23.76k stars 4.74k forks

video object detection #11665

Open wyfshr opened 1 week ago

wyfshr commented 1 week ago

Search before asking


When using YOLOv8 for video object detection, how can I use the temporal context of the video? Is it necessary to change the input tensor from (batch-size, channels, height, width) to (batch-size, channels, time, height, width)? If so, where in the code would I need to make that change?

Use case

Temporal context from neighboring video frames would be used to enhance the features extracted from the current frame.



Are you willing to submit a PR?

github-actions[bot] commented 1 week ago

👋 Hello @wyfshr, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.


Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics


YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):


Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 1 week ago

@wyfshr hello,

Thanks for reaching out with your query about using time context information in video object detection with YOLOv8.

Currently, YOLOv8 does not natively support the incorporation of time or sequence data (like (batch-size, channels, time, height, width)) directly into its models. The standard input format is (batch-size, channels, height, width) for individual image frames or video frames processed independently.
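To illustrate the shape difference, here is a minimal sketch (not Ultralytics code) of how a 5D clip tensor can be folded into the 4D layout so that each frame is treated as an independent image. The clip dimensions and variable names are made up for illustration, and NumPy stands in for PyTorch tensors.

```python
import numpy as np

# Hypothetical clip: 2 videos, 3 channels, 4 frames, 8x8 pixels.
B, C, T, H, W = 2, 3, 4, 8, 8
clip = np.arange(B * C * T * H * W, dtype=np.float32).reshape(B, C, T, H, W)

# Move the time axis next to the batch axis, then merge the two:
# (B, C, T, H, W) -> (B, T, C, H, W) -> (B*T, C, H, W).
frames = clip.transpose(0, 2, 1, 3, 4).reshape(B * T, C, H, W)

print(frames.shape)  # (8, 3, 8, 8)
```

After this reshape, frame `t` of video `b` sits at index `b * T + t`, so per-frame outputs can be regrouped by video afterwards. Note that this discards temporal context inside the model, which is exactly the limitation described above.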

To include time context, you might consider approaches external to the YOLOv8 framework such as feature engineering or using sequence models (like LSTM or GRU) on top of the YOLO outputs to incorporate temporal dynamics. These adjustments would be in the post-processing stage after running detections with YOLOv8.
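As a rough illustration of this kind of post-processing, the sketch below smooths detection confidences across consecutive frames. It deliberately swaps the LSTM/GRU idea for something much simpler: IoU matching against the previous frame plus an exponential moving average. The `Detection` class, `smooth_confidences` function, and the threshold values are all hypothetical and are not part of the Ultralytics API; the per-frame detections are assumed to have already been extracted from the YOLOv8 results.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple   # (x1, y1, x2, y2) in pixels
    conf: float  # confidence score from the detector
    cls: int     # class index


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def smooth_confidences(frames, iou_thr=0.5, alpha=0.6):
    """Blend each detection's confidence with its best same-class match
    in the previous (already smoothed) frame. Unmatched detections are kept as-is."""
    smoothed, prev = [], []
    for dets in frames:
        cur = []
        for d in dets:
            candidates = [p for p in prev if p.cls == d.cls]
            best = max(candidates, key=lambda p: iou(p.box, d.box), default=None)
            if best is not None and iou(best.box, d.box) >= iou_thr:
                conf = alpha * d.conf + (1 - alpha) * best.conf
            else:
                conf = d.conf
            cur.append(Detection(d.box, conf, d.cls))
        smoothed.append(cur)
        prev = cur
    return smoothed


# Two frames with one slowly moving object whose score dips in frame 2.
video = [
    [Detection((0, 0, 10, 10), 0.9, 0)],
    [Detection((1, 1, 11, 11), 0.4, 0)],
]
result = smooth_confidences(video)
```

Here the second-frame confidence is pulled up toward the first-frame value because the boxes overlap strongly. A learned sequence model would replace the fixed `alpha` blend with something trained on data, but the data flow (per-frame detections in, temporally adjusted detections out) would be similar.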

If you're willing to contribute, enhancing YOLOv8 to natively handle video sequences might involve significant modifications to how data is loaded and processed. If you're seriously considering this, I'd recommend discussing it further in the GitHub discussions to outline a robust plan.

We appreciate your willingness to contribute and look forward to any further questions or proposals you might have!

Bhavay-2001 commented 2 days ago

Hi @glenn-jocher, can I work on this issue? I would be happy to open a draft PR and discuss possible suggestions there. Thanks.