Faster than 1 second inference?

jasonbarbee commented 3 years ago

First, thanks for publishing this project.

I'm trying to infer objects faster than 1 FPS. My camera is 20 FPS.

I have the motion detection interval down to 0.05 (1/20) and the object detection interval down to 0.05. Looking at the debug logs, I am guessing I get about 1-2, maybe 3 FPS sent into the detector. It's difficult to tell, because the Viseron logs say the same message (objects detected []) was repeated X times but don't timestamp each detection.
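
Because the log collapses repeated "objects detected []" messages without per-detection timestamps, one way to get a real number is to count completed detections over a sliding time window. This is a minimal sketch of a hypothetical helper, not part of Viseron; calling tick() right after each detector call is an assumption about where you would instrument the loop.

```python
import time
from collections import deque


class RateMeter:
    """Sliding-window events-per-second counter.

    Hypothetical helper for instrumenting a detector loop; not a Viseron API.
    """

    def __init__(self, window_s=5.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock          # injectable so the meter can be tested
        self._stamps = deque()      # timestamps of events inside the window

    def tick(self):
        """Record one event, e.g. one finished detection."""
        now = self.clock()
        self._stamps.append(now)
        self._evict(now)

    def rate(self):
        """Events per second averaged over the last window_s seconds."""
        now = self.clock()
        self._evict(now)
        return len(self._stamps) / self.window_s

    def _evict(self, now):
        # Discard timestamps that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window_s:
            self._stamps.popleft()
```

Logging meter.rate() once a second then gives a timestamped FPS figure. Note that the value reads low until the first full window has elapsed.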

My Jetson Nano is idling around 20% CPU across the cores and barely touching the GPU (watching via jtop), and I want to process more FPS for a real-time vehicle application.

I've read some docs and the previous issues. It seems like I need to decrease the motion detector interval, but when I set it below 0.05, I get divide-by-zero errors in the code.

How can I increase the FPS pushed into the object detection engine without hitting divide-by-zero errors?

Here's my config

cameras:
  - name: Camera
    host: 192.168.123.132
    port: 554
    username: <if auth is enabled>
    password: <if auth is enabled>
    path: /live.sdp
    width: 1920
    height: 1920
    fps: 20
motion_detection:
  interval: 0.05
  trigger_detector: false
  trigger_recorder: false
  timeout: true
  max_timeout: 30
  width: 416
  height: 416
  area: 0.1
  threshold: 1
  frames: 1

object_detection:
  type: darknet
  interval: 0.05
  log_all_objects: true

logging:
  level: debug

roflcoopter commented 3 years ago

Thanks for showing interest in Viseron!

You should not have to decrease the interval any lower; I suspect the bottleneck may be elsewhere. Hard to guess where, though.

Do you get faster detections if you swap the model for the yolov3-tiny version?

object_detection:
  type: darknet
  interval: 0.05
  model_path: /detectors/models/darknet/yolov3-tiny.weights 
  log_all_objects: true

Edit: Also, the motion detector interval can be set to a higher number without affecting the object detector.

roflcoopter commented 3 years ago

Googling a bit on using the Nano with YOLOv4 on OpenCV it seems that the FPS is generally quite low.

This post points towards around 2 FPS. https://forums.developer.nvidia.com/t/yolov4-with-opencv/158725

To utilize the Nano better it seems other tools and models need to be used. Is this something you have experience with?

jasonbarbee commented 3 years ago

Yeah, I do have some experience there, so I know the targets I want to hit. I made my own multithreaded Python engine that uses the NVIDIA-optimized GStreamer to feed from an RTSP H.264 camera stream, runs Darknet YOLOv3-Tiny inference on real-time frames, listens on MQTT for control and notifications, and pushes captured images and objects out via MQTT. Many similarities with your project! I can get about 12 FPS detection out of YOLOv3-Tiny on the Nano with a custom-trained 416x416 model.
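
A common pattern in pipelines like this, where capture runs in one thread and inference in another, is to hand the detector only the newest frame and drop whatever it could not keep up with, so latency stays bounded instead of a queue growing. The sketch below shows such a single-slot buffer; it is an illustrative assumption, not the engine described above, and put() stands in for whatever decodes the GStreamer stream.

```python
import threading


class LatestFrame:
    """Single-slot, thread-safe frame buffer (hypothetical sketch).

    The capture thread overwrites the slot; the detector thread always receives
    the newest unread frame. Frames overwritten before being read are counted
    as dropped rather than queued.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._unread = False
        self.dropped = 0

    def put(self, frame):
        """Called by the capture thread for every decoded frame."""
        with self._cond:
            if self._unread:
                self.dropped += 1   # previous frame was never consumed
            self._frame = frame
            self._unread = True
            self._cond.notify()

    def get(self, timeout=None):
        """Called by the detector thread; blocks until a new frame arrives.

        Returns None if no new frame arrived within timeout seconds.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._unread, timeout=timeout):
                return None
            self._unread = False
            return self._frame
```

Because the detector never works through a backlog, its throughput is set by inference time alone; with roughly 12 FPS of detection on a 20 FPS camera, about 8 frames per second would simply be dropped.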

If you look at the second response in the thread you posted, he also confirms getting 12 FPS with YOLOv4-Tiny on the Nano.

You can watch real-time CPU and GPU utilization using jetson_stats: https://github.com/rbonghi/jetson_stats. All CPU cores hover around 20%, and the GPU is barely ever touched.

I changed model_path and model_config to the tiny versions but see the same results: about 1 FPS, and the box's resources are barely tapped. It just logs that objects [] were found, about one message repeated per second.

Here's the traceback after I change the motion and detector intervals to 0.025, trying to get 2 out of every 20 frames inspected instead of just 1 (at 0.05):

viseron | Exception in thread viseron.camera.cisco:
viseron | Traceback (most recent call last):
viseron |   File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
viseron |     self.run()
viseron |   File "/usr/local/lib/python3.8/threading.py", line 870, in run
viseron |     self._target(*self._args, **self._kwargs)
viseron |   File "/src/viseron/camera/__init__.py", line 114, in capture_pipe
viseron |     decoder.scan_frame(current_frame)
viseron |   File "/src/viseron/camera/frame_decoder.py", line 93, in scan_frame
viseron |     if self._frame_number % self._interval_fps == 0:
viseron | ZeroDivisionError: integer division or modulo by zero

roflcoopter commented 3 years ago

That sounds awesome! Do you have your code posted anywhere? Would love to have a look.

Would be great to make a tailored solution for the Nano, but sadly I don't own a Nano myself, so creating something like that on my own is very hard (it took me ages to get it running on the Nano in the first place!). I have some work going on right now where I'm trying to make Viseron more modular, including the interfacing with the cameras. Right now FFmpeg is the only option, but I would like to be able to use, in this instance, GStreamer as you mentioned.
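
One way to make the camera interface modular is to hide the decoder behind a small interface whose only job is to emit raw frames of a known size on stdout. The sketch below is hypothetical: the class names are illustrative, not Viseron's, and the GStreamer pipeline is an untested assumption for a Jetson reading an H.264 RTSP stream through its hardware decoder (nvv4l2decoder).

```python
import subprocess
from abc import ABC, abstractmethod


class FrameSource(ABC):
    """Hypothetical camera backend interface; not Viseron's actual API."""

    def __init__(self, url, width, height):
        self.url, self.width, self.height = url, width, height

    def frame_bytes(self):
        # Size of one raw BGR frame (3 bytes per pixel).
        return self.width * self.height * 3

    @abstractmethod
    def command(self):
        """Return the argv that decodes the stream to raw BGR frames on stdout."""


class FFmpegSource(FrameSource):
    def command(self):
        return [
            "ffmpeg", "-rtsp_transport", "tcp", "-i", self.url,
            "-vf", f"scale={self.width}:{self.height}",
            "-f", "rawvideo", "-pix_fmt", "bgr24", "pipe:1",
        ]


class GStreamerSource(FrameSource):
    def command(self):
        # Assumed Jetson pipeline: hardware H.264 decode, scale in nvvidconv,
        # convert BGRx -> BGR, write raw frames to stdout (fd 1).
        pipeline = (
            f"rtspsrc location={self.url} latency=0 ! rtph264depay ! h264parse ! "
            "nvv4l2decoder ! nvvidconv ! "
            f"video/x-raw,width={self.width},height={self.height},format=BGRx ! "
            "videoconvert ! video/x-raw,format=BGR ! fdsink fd=1"
        )
        return ["gst-launch-1.0", "-q"] + pipeline.split()


def frames(source):
    """Yield raw frames (bytes) from whichever backend is configured."""
    proc = subprocess.Popen(source.command(), stdout=subprocess.PIPE)
    size = source.frame_bytes()
    try:
        while True:
            buf = proc.stdout.read(size)
            if len(buf) < size:
                break  # stream ended or the decoder exited
            yield buf
    finally:
        proc.kill()
        proc.wait()
```

The rest of the pipeline only consumes frames(source), so swapping FFmpeg for GStreamer becomes a configuration choice rather than a code change.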

interval: 0.05 should already be working at 20 FPS for you. The interval is specified in seconds, so 1/20 = 0.05 means every frame is inspected. However, it doesn't seem like the current implementation can keep up with that.
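
The traceback points at the check self._frame_number % self._interval_fps == 0. Assuming the interval is converted to a whole number of frames as int(fps * interval) (a reconstruction consistent with the explanation above, not Viseron's actual source), any interval shorter than one frame period (1/20 = 0.05 s at 20 FPS) truncates to 0 and the modulo fails. The sketch reproduces that and shows a guarded variant; all function names are hypothetical.

```python
def frames_per_scan(fps, interval_s):
    """Naive conversion (hypothetical): 20 FPS * 0.025 s = 0.5, and int(0.5) == 0."""
    return int(fps * interval_s)


def safe_frames_per_scan(fps, interval_s):
    """Guarded conversion (hypothetical): never returns less than 1, so an
    interval shorter than one frame simply means 'scan every frame'."""
    return max(1, round(fps * interval_s))


def should_scan(frame_number, step):
    # Same shape as the failing check in frame_decoder.py: step == 0 raises.
    return frame_number % step == 0
```

With the guard, intervals below one frame period behave exactly like 0.05 at 20 FPS, which already inspects every frame, so lowering the interval further cannot raise detector throughput.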

ozett commented 2 years ago

@jasonbarbee Yes, please, let's have a look.

jasonbarbee commented 2 years ago

Update: I got permission to share the code. I need a little time to test it and write a README, and will update here when it's ready.

roflcoopter commented 1 year ago

Closing due to inactivity

roflcoopter / viseron

Faster than 1 second inference? #281