roflcoopter / viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
MIT License

Faster than 1 second inference? #281

Closed jasonbarbee closed 1 year ago

jasonbarbee commented 3 years ago

First, thanks for publishing this project.

I'm trying to infer objects faster than 1 FPS. My camera is 20 FPS.

I have the motion detector interval down to 0.05 (1/20) and the object detection interval down to 0.05. Looking at the debug logs, I'm guessing about 1-2, maybe 3 FPS is sent into the detector. It's difficult to tell, because the Viseron logs only say that the same message (objects detected []) was repeated X times and don't timestamp each detection.
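Since the repeated log message carries no per-detection timestamp, one way to get a real number is to count events over a sliding window. The following is a minimal sketch, not part of Viseron: `RateMeter` and `tick` are hypothetical names, and in practice `tick` would be called with `time.monotonic()` each time a detection result arrives.

```python
from collections import deque


class RateMeter:
    """Estimate events per second over a sliding time window."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self._stamps = deque()

    def tick(self, now):
        """Record one event at time `now` (in seconds) and return the current rate."""
        self._stamps.append(now)
        # Drop events that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


meter = RateMeter(window_seconds=5.0)
# Simulated detection times, 0.5 s apart (2 detections per second).
for i in range(10):
    rate = meter.tick(i * 0.5)
print(f"{rate:.1f} detections/s")  # prints "2.0 detections/s"
```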

My Jetson Nano is idling around 20% cpu across the cores, and barely hitting the GPU (watching via jtop) - and I want to process more FPS for a realtime vehicle application.

I've read some docs and the previous issues. It seems like I need to decrease the motion detector interval, but when I decrease it below 0.05, I get divide by 0 errors in the code.

How can I increase the FPS pushed into the Object Detection engine and not hit divide by 0 errors?

Here's my config

cameras:
  - name: Camera
    host: 192.168.123.132
    port: 554
    username: <if auth is enabled>
    password: <if auth is enabled>
    path: /live.sdp
    width: 1920
    height: 1920
    fps: 20
motion_detection:
  interval: 0.05
  trigger_detector: false
  trigger_recorder: false
  timeout: true
  max_timeout: 30
  width: 416
  height: 416
  area: 0.1
  threshold: 1
  frames: 1

object_detection:
  type: darknet
  interval: 0.05
  log_all_objects: true

logging:
  level: debug

roflcoopter commented 3 years ago

Thanks for showing interest in Viseron!

You should not have to decrease the interval any lower. I suspect the bottleneck is elsewhere, though it's hard to guess where.

Do you get faster detections if you swap the model for the yolov3-tiny version?

object_detection:
  type: darknet
  interval: 0.05
  model_path: /detectors/models/darknet/yolov3-tiny.weights 
  log_all_objects: true

Edit: Also, the motion detector interval can be set to a higher number without affecting the object detector.

roflcoopter commented 3 years ago

After googling a bit about running YOLOv4 with OpenCV on the Nano, it seems that the FPS is generally quite low.

This post points to around 2 FPS: https://forums.developer.nvidia.com/t/yolov4-with-opencv/158725

To utilize the Nano better it seems other tools and models need to be used. Is this something you have experience with?

jasonbarbee commented 3 years ago

Yeah, I do have some experience there, so I know the targets I want to hit from that experience. I made my own multithreaded Python engine that uses the NVIDIA-optimized GStreamer to feed from an RTSP H.264 camera stream, run Darknet Yolov3-Tiny inference on realtime frames, have a listener for MQTT control and notifications, and that pushes captured images and objects down via MQTT. Many similarities with your project! I can get about 12 FPS detection out of Yolov3-tiny on the Nano with a custom trained 416x416 model.
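The code described above is not posted in this thread, so the following is only a generic sketch of the threaded pattern it hints at: a reader thread that always hands the newest frame to a slower detector thread and discards stale ones. Everything here is a stand-in (the frame numbers, the sleep-based "inference", and the names `put_latest`, `capture`, and `detect`), and it assumes a single producer. It uses only the standard library.

```python
import queue
import threading
import time


def put_latest(frame_queue, frame):
    """Put `frame` on a size-1 queue, discarding a stale frame if one is waiting."""
    try:
        frame_queue.put_nowait(frame)
    except queue.Full:
        try:
            frame_queue.get_nowait()  # drop the stale frame
        except queue.Empty:
            pass  # the consumer took it in the meantime
        frame_queue.put_nowait(frame)  # safe with a single producer


def capture(frame_queue, n_frames, frame_period):
    """Stand-in for a camera reader: emits frame numbers at a fixed rate."""
    for frame_number in range(n_frames):
        put_latest(frame_queue, frame_number)
        time.sleep(frame_period)
    put_latest(frame_queue, None)  # sentinel: the stream has ended


def detect(frame_queue, inference_time, processed):
    """Stand-in for a detector that is slower than the camera."""
    while True:
        frame = frame_queue.get()
        if frame is None:
            break
        time.sleep(inference_time)  # pretend to run inference
        processed.append(frame)


frames = queue.Queue(maxsize=1)
processed = []
reader = threading.Thread(target=capture, args=(frames, 20, 0.01))
worker = threading.Thread(target=detect, args=(frames, 0.03, processed))
reader.start()
worker.start()
reader.join()
worker.join()
print(f"processed {len(processed)} of 20 frames: {processed}")
```

The design choice is that the detector never works through a backlog: when inference is slower than the camera, intermediate frames are skipped, so results stay close to real time. How many frames get processed depends on timing and will vary between runs.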

If you look at the second response on the thread you posted, he also confirms getting 12 FPS with Yolov4-Tiny on the Nano.

You can see the realtime CPU and GPU utilization using jetson-stats: https://github.com/rbonghi/jetson_stats. All CPU cores hover around 20%, and the GPU is barely ever touched.

I changed the model_path and model_config to tiny, but see the same results. It's about 1 FPS, and the resources of the box are barely tapped. It just posts that Objects [] were found, about 1 repeated message per second.

Here's the traceback after I changed the motion and detector intervals to 0.025, hoping to get 2 out of every 20 frames inspected instead of just 1 (as with 0.05):

viseron | Exception in thread viseron.camera.cisco:
viseron | Traceback (most recent call last):
viseron |   File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
viseron |     self.run()
viseron |   File "/usr/local/lib/python3.8/threading.py", line 870, in run
viseron |     self._target(*self._args, **self._kwargs)
viseron |   File "/src/viseron/camera/__init__.py", line 114, in capture_pipe
viseron |     decoder.scan_frame(current_frame)
viseron |   File "/src/viseron/camera/frame_decoder.py", line 93, in scan_frame
viseron |     if self._frame_number % self._interval_fps == 0:
viseron | ZeroDivisionError: integer division or modulo by zero

roflcoopter commented 3 years ago

> Yeah, I do have some experience there, so I know the targets I want to hit from that experience. I made my own multithreaded Python engine that uses the NVIDIA-optimized GStreamer to feed from an RTSP H.264 camera stream, run Darknet Yolov3-Tiny inference on realtime frames, have a listener for MQTT control and notifications, and that pushes captured images and objects down via MQTT. Many similarities with your project! I can get about 12 FPS detection out of Yolov3-tiny on the Nano with a custom trained 416x416 model.

That sounds awesome! Do you have your code posted anywhere? Would love to have a look.

It would be great to make a tailored solution for the Nano, but sadly I don't own one myself, so creating something like that on my own is very hard (it took me ages to get Viseron running on the Nano in the first place!). I have some work going on right now where I'm trying to make Viseron more modular, including the interfacing with the cameras. Right now FFmpeg is the only option, but I would like to be able to use, in this instance, GStreamer as you mentioned.

> Here's the traceback after I changed the motion and detector intervals to 0.025, hoping to get 2 out of every 20 frames inspected instead of just 1 (as with 0.05):
>
> viseron | Exception in thread viseron.camera.cisco:
> viseron | Traceback (most recent call last):
> viseron |   File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
> viseron |     self.run()
> viseron |   File "/usr/local/lib/python3.8/threading.py", line 870, in run
> viseron |     self._target(*self._args, **self._kwargs)
> viseron |   File "/src/viseron/camera/__init__.py", line 114, in capture_pipe
> viseron |     decoder.scan_frame(current_frame)
> viseron |   File "/src/viseron/camera/frame_decoder.py", line 93, in scan_frame
> viseron |     if self._frame_number % self._interval_fps == 0:
> viseron | ZeroDivisionError: integer division or modulo by zero

interval: 0.05 should already be working at 20 FPS for you. interval is specified in seconds, so 1/20 = 0.05 means every frame is inspected. However, it doesn't seem like the current implementation can keep up with that.
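The arithmetic behind that, and one plausible cause of the ZeroDivisionError, can be sketched as follows. This is a guess: the traceback only shows the failing line (`self._frame_number % self._interval_fps`), not how `_interval_fps` is computed. The sketch assumes the interval is converted to a whole number of frames by truncation, and both function names are hypothetical.

```python
def frames_per_inspection(interval_seconds, stream_fps):
    """Assumed conversion: inspect every Nth frame, with N = interval * fps, truncated."""
    return int(interval_seconds * stream_fps)


def safe_frames_per_inspection(interval_seconds, stream_fps):
    """Same conversion, clamped so the result is never below 1 (every frame)."""
    return max(1, int(interval_seconds * stream_fps))


print(frames_per_inspection(0.05, 20))        # 1: every frame is inspected
print(frames_per_inspection(0.025, 20))       # 0: 0.5 frames truncates to zero
print(safe_frames_per_inspection(0.025, 20))  # 1: clamped to every frame

frame_number = 7
try:
    frame_number % frames_per_inspection(0.025, 20)
except ZeroDivisionError as err:
    # Same failure mode as the modulo in the traceback.
    print("ZeroDivisionError:", err)
```

Under this assumption, any interval shorter than one frame period (1 / fps) collapses to zero frames, and since a 20 FPS stream cannot deliver more than 20 frames per second anyway, 0.05 is already the lowest setting that makes sense.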

ozett commented 2 years ago

> That sounds awesome! Do you have your code posted anywhere? Would love to have a look.

@jasonbarbee Yes, please. Could we have a look?

jasonbarbee commented 2 years ago

Update: I got permission to share the code. I need a little time to test it and write a README, and will update here when it's ready.

roflcoopter commented 1 year ago

Closing due to inactivity