YOLOv5 Object Detection Inference Terminates Early During Live Video Streaming from HoloLens 2

DanielDoe commented 1 year ago

Search before asking

[X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Hello @ultralytics team,

I am currently working on a project that runs YOLOv5 object detection on a live video stream from a HoloLens 2 device. The inference ends just a few seconds into the live stream. Here is the command I am running:

python detect.py --weights yolov5x6.pt --source "https://{username}:{password}@192.168.110.176/api/holographic/stream/live_high.mp4?holo=true&pv=true&mic=true&loopback=true&RenderFromCamera=true"

I removed the username and password for safety reasons. The object detection runs and just terminates abruptly.

Output from the live stream:

Fusing layers...
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
1/1: https://{username}:{password}@192.168.110.176/api/holographic/stream/live_med.mp4?holo=true&pv=true&mic=true&loopback=true&RenderFromCamera=true... Success (42 frames 854x480 at 30.00 FPS)

0: 384x640 1 bottle, 2 cups, 2 tvs, 1 laptop, 1 cell phone, 1 refrigerator, 268.7ms
0: 384x640 1 bottle, 2 cups, 2 tvs, 1 laptop, 1 keyboard, 1 cell phone, 1 refrigerator, 212.7ms
0: 384x640 1 bottle, 2 cups, 2 tvs, 1 laptop, 1 keyboard, 2 cell phones, 211.2ms
0: 384x640 1 bottle, 2 tvs, 1 laptop, 1 keyboard, 1 cell phone, 212.3ms
Speed: 0.7ms pre-process, 226.2ms inference, 1.8ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp15

Issue Description

After successfully setting up and initiating the live video stream from HoloLens 2, YOLOv5 is able to detect objects for a very brief period, but it terminates prematurely within a few seconds. The termination of the inference occurs without any explicit error message.

Attempted Resolutions

To resolve the issue, I have taken the following steps:

Ensured consistent and stable network connectivity.
Verified that the hardware has sufficient resources to run YOLOv5.
Checked for software errors or configuration issues. The same code works perfectly fine with local video files and webcam feeds.
Verified the input video stream for any potential issues such as incorrect format or corrupted frames. The stream is working fine with other applications.
Ensured that the YOLOv5 model is compatible and correctly loaded.
Despite these steps, the problem persists.

Request for Help

Could you please guide me on what might be causing this early termination of object detection inference? I am particularly interested in knowing whether specific settings or configurations are needed for smooth inference on live video streams, or whether I might be overlooking an issue with how video strides are handled in the live stream scenario.

Any assistance in this regard would be greatly appreciated.

Thank you for your time and support.

Best Regards, Daniel

Additional

No response

glenn-jocher commented 1 year ago

Hello @DanielDoe,

Thank you for reaching out to us regarding your project with YOLOv5.

Based on the output you provided, the inference may be terminating early because the live stream is not supplying enough frames for YOLOv5 to process. Since the output contains no error message, it is hard to isolate the issue without further context.

To help us assist you, could you please run the same command again with the --save-txt flag, which saves a text file of the detections for each frame? Then check whether the text files contain correct and complete detections, or whether they also end prematurely.

Also, if possible, please share the hardware specifications and the software environment in which you are running the code, as this may help us understand the issue better.

Thank you for your patience, and we look forward to your response.

Best regards, Ultralytics Team

DanielDoe commented 1 year ago

Running with --save-txt, I get the following txt files:

live_med_0.txt
63 0.0638173 0.38125 0.125293 0.325
62 0.282787 0.10625 0.56089 0.208333
0 0.327869 0.826042 0.185012 0.335417
62 0.800351 0.0833333 0.394614 0.1625
63 0.630562 0.625 0.556206 0.716667

live_med_1.txt
63 0.0222482 0.908333 0.0444965 0.175
63 0.0667447 0.3625 0.131148 0.320833
66 0.0216628 0.908333 0.0433255 0.175
0 0.328454 0.816667 0.190867 0.35
62 0.286885 0.0958333 0.571429 0.1875
62 0.80445 0.084375 0.388759 0.164583
63 0.635246 0.615625 0.556206 0.73125

live_med_2.txt
73 0.0685012 0.358333 0.13466 0.320833
63 0.0685012 0.355208 0.13466 0.322917
0 0.32904 0.815625 0.194379 0.35625
62 0.288642 0.09375 0.574941 0.183333
62 0.806206 0.08125 0.385246 0.158333
63 0.638173 0.6125 0.557377 0.7375

live_med_3.txt
73 0.0673302 0.360417 0.132318 0.320833
63 0.0667447 0.360417 0.131148 0.320833
0 0.328454 0.816667 0.193208 0.354167
62 0.286885 0.0947917 0.569087 0.185417
62 0.805035 0.0791667 0.387588 0.154167
63 0.637002 0.615625 0.559719 0.735417
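For readers unfamiliar with these files: each line appears to follow the usual YOLO label layout of class index, then box centre x, centre y, width and height, all normalised to the image size. The following is a minimal sketch of converting one line to pixel corners, assuming the 854x480 frame size reported in the log. `to_pixel_box` is a hypothetical helper written for this illustration, not part of YOLOv5.

```python
def to_pixel_box(line, img_w, img_h):
    """Convert 'cls cx cy w h' (normalised) to (cls, x1, y1, x2, y2) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    # Centre/size form -> corner form, then scale to pixel units
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return int(cls), round(x1), round(y1), round(x2), round(y2)


# First line of live_med_0.txt; frame size (854x480) taken from the log
print(to_pixel_box("63 0.0638173 0.38125 0.125293 0.325", 854, 480))
```

The class indices here (0, 62, 63, 66, 73) look consistent with the person, tv, laptop, keyboard and book labels in the terminal output, assuming the standard COCO class order.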

Terminal output:

detect: weights=['yolov5x.pt'], source=https://{username}:{password}@192.168.110.176/api/holographic/stream/live_med.mp4?holo=true&pv=true&mic=true&loopback=true&RenderFromCamera=true, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=True, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 2023-5-23 Python-3.10.9 torch-2.0.1 CPU

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
1/1: https://{username}:{password}@192.168.110.176/api/holographic/stream/live_med.mp4?holo=true&pv=true&mic=true&loopback=true&RenderFromCamera=true...  Success (40 frames 854x480 at 30.00 FPS)

0: 384x640 1 person, 2 tvs, 2 laptops, 269.6ms
0: 384x640 1 person, 2 tvs, 3 laptops, 1 keyboard, 241.1ms
0: 384x640 1 person, 2 tvs, 2 laptops, 1 book, 239.2ms
0: 384x640 1 person, 2 tvs, 2 laptops, 1 book, 204.2ms
Speed: 0.6ms pre-process, 238.5ms inference, 1.8ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp16
4 labels saved to runs/detect/exp16/labels

Device Specifications:
Video source: Microsoft HoloLens 2
YOLO framework is run on: Apple M1 Pro, 16GB, 1TB, macOS Ventura 13.4
Python 3.10.9 with pip 22.3.1

Objective: To have object detection run for as long as we want. I tried it on a YouTube video and it produced the desired results. However, on the video stream from the HoloLens, it just ends abruptly. Could it be a problem with the HoloLens video link? Is there a way to increase the timeout window, in case there is no video or a delay from the HoloLens stream, to avoid termination? I am just thinking out loud. I tried Azure IoT Edge with YOLOv3 and it works without fail, but it depends on Azure IoT Hub. Also, streaming my video URL in VLC works seamlessly. So what could be the problem?

Update 1: I just tried this stream on YOLOv8 and it produced the same result: it runs for 4 frames and then exits.

Update 2: After some research and experimentation, I believe that, due to internet connectivity, there is usually a 2-3 s lag in the live stream from the HoloLens, which causes it to terminate (usually after the fourth frame). Do you have any suggestions for this problem?
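One further hedged reading of the logs: the loader lines report "Success (42 frames ...)" and "Success (40 frames ...)", so OpenCV appears to see a finite frame count for this endpoint. If the stream loader stops reading once it has consumed that many frames (an assumption, not something confirmed in this thread), the number of detections would be bounded by roughly the reported video duration divided by the per-frame inference time. A back-of-the-envelope sketch, where `frame_budget` is a hypothetical helper:

```python
def frame_budget(frame_count, fps, inference_ms):
    """Return (seconds of video reported, inferences that fit in that time)."""
    seconds = frame_count / fps
    return seconds, seconds / (inference_ms / 1000.0)


# Frame counts and mean inference times from the two logs in this thread
for frames, ms in [(42, 226.2), (40, 238.5)]:
    seconds, n = frame_budget(frames, 30.0, ms)
    print(f"{frames} frames -> {seconds:.2f} s of video, about {n:.1f} inferences")
```

Both estimates land in the same range as the four detections observed. What OpenCV actually reports for the stream can be checked with `cv2.VideoCapture(url).get(cv2.CAP_PROP_FRAME_COUNT)`.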

glenn-jocher commented 1 year ago

Hello @DanielDoe,

Based on the results you have provided, it appears that the text files generated by running the command with the --save-txt flag are complete and contain multiple object detections. This indicates that the issue may be with the live stream itself, possibly due to a lag or connectivity issues.

In order to address the issue, you could try increasing the timeout window for the live stream to account for any delays or connectivity issues. You could also check the video link or try streaming the video on a different platform to see if the issue persists.
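The detect.py arguments printed earlier in this thread do not list a timeout option, so one way to get a timeout window is to apply it in your own read loop. The sketch below is an assumption about how that could look, not YOLOv5 behaviour: `read_with_patience` is a hypothetical helper, and `read_frame` is any callable returning `(ok, frame)`, which is the shape `cv2.VideoCapture.read` returns.

```python
import time


def read_with_patience(read_frame, max_wait_s=5.0, poll_s=0.05,
                       clock=time.monotonic, sleep=time.sleep):
    """Keep calling read_frame() until it succeeds or max_wait_s elapses.

    Returns the frame, or None if the deadline passed without a frame.
    """
    deadline = clock() + max_wait_s
    while True:
        ok, frame = read_frame()
        if ok:
            return frame
        if clock() >= deadline:
            return None  # give up only after the whole window has elapsed
        sleep(poll_s)    # short pause before polling the stream again
```

With OpenCV this would be called as `read_with_patience(cap.read, max_wait_s=5.0)`. Depending on the backend, a capture that has stopped delivering frames may also need to be reopened before reads succeed again.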

Another suggestion would be to monitor and measure the actual delay between the frames in the live stream by timestamping and comparing the frames, and adjusting the timeout window accordingly.
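A minimal sketch of that measurement, assuming frames are read through a callable returning `(ok, frame)` such as `cv2.VideoCapture.read`. The helper names `measure_gaps` and `summarize` are hypothetical.

```python
import time


def measure_gaps(read_frame, n_frames=100, clock=time.monotonic):
    """Return the seconds elapsed between consecutive successful reads."""
    gaps, last = [], None
    for _ in range(n_frames):
        ok, _frame = read_frame()
        now = clock()
        if not ok:
            break  # stream ended or stalled; report what was collected
        if last is not None:
            gaps.append(now - last)
        last = now
    return gaps


def summarize(gaps):
    """Summarise the gaps; the maximum is the value a timeout must exceed."""
    if not gaps:
        return None
    return {"frames": len(gaps) + 1,
            "mean_s": sum(gaps) / len(gaps),
            "max_s": max(gaps)}
```

With OpenCV, `summarize(measure_gaps(cap.read))` would give the mean and worst-case gap. If the loop stops well before `n_frames`, that points at the stream ending rather than at lag.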

As for using Azure IoT Edge for YOLOv3, this may be an alternative solution if it works for your use case. However, YOLOv5 is a more advanced and faster object detection model that may better fit your needs.

Thank you for providing us with more information on your project and hardware/software environment. Please let us know if there is anything else we can do to assist you.

Best regards, Ultralytics Team

DanielDoe commented 1 year ago

Many thanks for your warm, quick, and resourceful response. I am going to try this out. In case anyone has similar problems, here is my thought process. Reducing latency involves optimizing several stages of the data transmission process, including:

Capture: Acquiring video data from the source.
Encoding: Compressing the video data to make it suitable for transmission.
Transmission: Sending the video data over the internet.
Decoding: Decompressing the video data so it can be displayed.
Display: Rendering the video data on a screen.

Here are a few ways to reduce latency at these stages using Python:
Use a low-latency video codec: Some video codecs are designed to minimize latency. Examples include H.264 and VP8. These codecs are often used in real-time applications like video conferencing.
Tweak encoding settings: Many video codecs allow you to tweak their settings to reduce latency. For example, you might be able to reduce the GOP size or disable B-frames.
Use a low-latency transmission protocol: Some transmission protocols are designed to minimize latency. Examples include RTMP and WebRTC. These protocols are often used in real-time applications like video conferencing.
Tweak buffering settings: By default, most video players buffer a certain amount of data before they start playing. This helps to ensure smooth playback, but it increases latency. You might be able to reduce latency by tweaking the buffering settings.
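On the buffering point, a common pattern is to read frames in a background thread and keep only the newest one, so that a slow detector always works on a recent frame instead of a growing backlog. The sketch below is one possible shape for this, not code from this thread. `LatestFrameReader` is a hypothetical class, and `read_frame` is any callable returning `(ok, frame)`, such as `cv2.VideoCapture.read`.

```python
import threading


class LatestFrameReader:
    """Background reader that keeps only the most recent frame."""

    def __init__(self, read_frame):
        self._read = read_frame
        self._lock = threading.Lock()
        self._frame = None
        self._fresh = threading.Event()    # set when an unseen frame is stored
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _loop(self):
        while not self._stopped.is_set():
            ok, frame = self._read()
            if not ok:
                self._stopped.wait(0.01)   # brief pause, then try again
                continue
            with self._lock:
                self._frame = frame        # overwrite: older frames are dropped
                self._fresh.set()

    def latest(self, timeout=None):
        """Block until a new frame arrives (or timeout); return it or None."""
        if not self._fresh.wait(timeout):
            return None
        with self._lock:
            self._fresh.clear()
            return self._frame

    def stop(self):
        self._stopped.set()
        self._thread.join(timeout=1.0)
```

OpenCV also exposes `cv2.CAP_PROP_BUFFERSIZE`, but whether a given backend honours it varies, which is why the sketch drops stale frames in application code instead.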

You can write OpenCV code to implement this. I will publish my code for this later.

glenn-jocher commented 1 year ago

@DanielDoe hello,

Thank you for sharing your thoughts on reducing latency in the video streaming process. These are all valid options that can potentially help address the issue.

As for your suggestion to use OpenCV to implement these optimizations, that sounds like a great idea. We look forward to seeing your code and sharing it with the community.

Please let us know if you have any further questions or if there is anything else we can do to assist you.

Best regards, Ultralytics Team

bgpantojar commented 3 months ago

@DanielDoe

Hello Daniel,

I am trying out YOLOv5 with HoloLens 2 and ran into exactly the same problem. Did you manage to solve it?

Best, Bryan

DanielDoe commented 3 months ago

Hey, this was really a pain in the ass to deal with for months, but I figured out that writing an async function (e.g., when frames are received, perform object detection on them; otherwise just wait and listen for transmission) ameliorated the problem a little. Try it and let me know. Wish you all the best. But HoloLens needs to do better with its video streams. Sucks!
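The original code was not posted in this thread, so the following is only a sketch of what such an async function might look like, under stated assumptions. `read_frame` is a blocking callable returning `(ok, frame)` (for example `cv2.VideoCapture.read`), `detect` is whatever runs the model on one frame, and the names `produce`, `consume` and `run` are hypothetical.

```python
import asyncio


async def produce(read_frame, queue, poll_s=0.05):
    """Push frames into the queue; when none is available, wait and retry."""
    loop = asyncio.get_running_loop()
    while True:
        # Run the blocking read in a worker thread so the event loop stays free
        ok, frame = await loop.run_in_executor(None, read_frame)
        if not ok:
            await asyncio.sleep(poll_s)  # no frame yet: keep listening
            continue
        if queue.full():
            queue.get_nowait()           # drop the stale frame
        queue.put_nowait(frame)


async def consume(detect, queue, results, max_frames):
    """Run detection on frames as they arrive, waiting when there are none."""
    loop = asyncio.get_running_loop()
    for _ in range(max_frames):
        frame = await queue.get()
        results.append(await loop.run_in_executor(None, detect, frame))


async def run(read_frame, detect, max_frames):
    queue = asyncio.Queue(maxsize=1)     # hold only the newest frame
    results = []
    producer = asyncio.create_task(produce(read_frame, queue))
    try:
        await consume(detect, queue, results, max_frames)
    finally:
        producer.cancel()
        await asyncio.gather(producer, return_exceptions=True)
    return results
```

It would be started with `asyncio.run(run(cap.read, detect, max_frames=1000))`. The key difference from a plain loop is that a failed read leads to a short wait and another attempt rather than ending the run.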

glenn-jocher commented 3 months ago

Hi Bryan,

Thank you for your input. Implementing an asynchronous function to handle frame transmission and detection can indeed help mitigate the issue. If you have any further questions or need additional assistance, feel free to ask.

bgpantojar commented 2 months ago

Thanks for the answers @DanielDoe and @glenn-jocher

I am trying to figure out how to implement that function.

Frame fetching seems to be done in the LoadStreams class, while prediction happens in the run function. Could you please be more specific about where you implemented the asynchronous function? By any chance, could you share your implementation?

Thanks in advance, Bryan

glenn-jocher commented 2 months ago

Hi Bryan,

To implement an asynchronous function, you can modify the LoadStreams class to handle frame fetching asynchronously. Consider using Python's asyncio library to manage asynchronous tasks. Unfortunately, I can't share specific implementations, but this approach should help you get started. If you have further questions, feel free to ask.
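No implementation was shared in this thread. As a hedged illustration of the general idea only (this is not the actual LoadStreams code, and it uses a plain generator rather than asyncio), the sketch below yields frames indefinitely and reopens the capture after repeated failed reads instead of stopping. `frames_forever` and `open_capture` are hypothetical names. `open_capture` is a factory such as `lambda: cv2.VideoCapture(url)`, and the object it returns only needs `read()` and `release()` methods.

```python
import time


def frames_forever(open_capture, max_failures=30, retry_delay_s=1.0,
                   sleep=time.sleep):
    """Yield frames indefinitely, reopening the capture when reads keep failing."""
    cap = open_capture()
    failures = 0
    try:
        while True:
            ok, frame = cap.read()
            if ok:
                failures = 0
                yield frame
                continue
            failures += 1
            if failures >= max_failures:
                cap.release()
                sleep(retry_delay_s)
                cap = open_capture()   # reconnect rather than stop
                failures = 0
    finally:
        cap.release()                  # runs when the consumer closes the generator
```

A detection loop would then be `for frame in frames_forever(lambda: cv2.VideoCapture(url)): ...`, with the model call in the loop body. Whether reopening the HoloLens endpoint resumes the live stream cleanly is something to verify on the device.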
