wb666greene / AI-Person-Detector

Python AI "person detector" using Coral TPU or Movidius NCS/NCS2

Just a question about Queue per Camera? #10

Open ozett opened 3 years ago

ozett commented 3 years ago

Hi, I am coming back to your queue architecture. After running YOLOv3 person detection for a year on my system, I updated my Ubuntu install yesterday with a fresh OpenCV and YOLOv4.

In a simple test loop, YOLOv4 runs at only 2 FPS on a static image, and for now only on my CPU. So with that static image, 2 FPS seems to be the actual speed limit of YOLOv4 on my CPU on the virtual host.
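For reference, the test loop was roughly this minimal sketch (model and image file names are illustrative; assumes an OpenCV build with the DNN module):

```python
import time
import cv2

# minimal timing loop: run YOLOv4 repeatedly on one static image and report FPS
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("test.jpg")
n, start = 20, time.time()
for _ in range(n):
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
print("FPS: %.2f" % (n / (time.time() - start)))
```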

I remembered your queue architecture and looked through the code again. As I am a hobbyist, I was wondering if I have this right: do you have separate single threads for RTSP grabbing, all pushing images into ONE global queue from which the inference models get their frames? Or does each camera fill its own image queue, and if so, how is the inference fed from multiple queues?

If all RTSP threads feed one global queue, how do you prevent it from filling up in a short time? Assume the RTSP threads grab at 20 FPS and the inference only runs at 2 FPS; the queue would fill up quickly, because a single inference thread could never empty it.

Or am I missing something?

I would be happy to hear something about it. Otherwise, great code. Regards

ozett commented 3 years ago

YOLOv4 on Jetson Nano: 1 FPS... -> https://jkjung-avt.github.io/yolov4/ 🤔

wb666greene commented 3 years ago

Each camera writes to its own queue, with one thread per camera. I actually keep each camera queue short (depth about equal to the frame rate; I typically set the cameras to 3-5 fps) to reduce latency. The "video" perspective that dropped frames are bad doesn't really apply here.

The AI thread is passed an array (list) of the camera queues and reads them in sequence to sample each camera "round-robin".

The common queue is for the inference output. The "trick" here is that images without a detection aren't very interesting, so if there is no detection and no space in the output queue, the image is quickly dropped.

The main thing you are missing is that in the RTSP thread, if the queue is full, the oldest image is pulled out and discarded to make room for the current frame. Combined with the short queue depth, this keeps the images waiting to be processed as close to the real-time current image as possible.
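A minimal sketch of that design (names and URLs are illustrative, not the actual repo code):

```python
import queue
import threading

import cv2

def rtsp_grab_thread(rtsp_url, frame_queue):
    # one grab thread per camera; if its short queue is full, the oldest
    # frame is discarded so the queue always holds near-real-time images
    cap = cv2.VideoCapture(rtsp_url)
    while True:
        ok, frame = cap.read()
        if not ok:
            continue
        if frame_queue.full():
            try:
                frame_queue.get_nowait()  # pull out the oldest image to make room
            except queue.Empty:
                pass
        frame_queue.put(frame)

def ai_thread(cam_queues, results_queue, detect_fn):
    # sample the per-camera queues round-robin; push results to the common
    # output queue, dropping "no detection" frames when it is full
    while True:
        for q in cam_queues:
            try:
                frame = q.get_nowait()
            except queue.Empty:
                continue  # nothing new from this camera, try the next one
            detections = detect_fn(frame)
            if detections:
                results_queue.put((frame, detections))  # detections are always kept
            elif not results_queue.full():
                results_queue.put((frame, detections))  # boring frame, keep only if room

urls = ["rtsp://cam1/stream", "rtsp://cam2/stream"]  # illustrative camera URLs
cam_queues = [queue.Queue(maxsize=5) for _ in urls]  # depth ~ camera fps (3-5)
results_queue = queue.Queue(maxsize=10)
for url, q in zip(urls, cam_queues):
    threading.Thread(target=rtsp_grab_thread, args=(url, q), daemon=True).start()
threading.Thread(target=ai_thread,
                 args=(cam_queues, results_queue, lambda f: []),  # stand-in detector
                 daemon=True).start()
```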

HTH.

As to the second comment about YOLOv4 on the Nano, there was something wrong with his camera/mp4 reading; he is now getting ~4.5 fps with YOLOv4-416 on the Jetson Nano. I recently set up a system on my Nano following his instructions and have verified the results: https://github.com/jkjung-avt/tensorrt_demos#demo-5-yolov4

My next step is to use it on my collection of bogus detection images to see if it can reject them without rejecting real detections. Straightforward to do, but I just haven't been able to find a block of time to put it all together.

ozett commented 3 years ago

Wow, thank you very much, all very helpful information. Now I will rethink my design plans.

Maybe I will test YOLOv4 FPS on an NVIDIA GPU and see how many FPS of inference on a static image is possible. Let's say it comes to 8 FPS of inference; then I could run 8 cameras with a 1 FPS inference loop per camera, or 4 cams at an inference rate of 2 FPS each...
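Just the arithmetic of that budget, as a tiny sketch:

```python
# round-robin sampling budget: an engine doing N inferences/s shared over C
# cameras gives each camera roughly N/C analyzed frames per second
inference_fps = 8
num_cameras = 8
per_camera_fps = inference_fps / num_cameras
print(per_camera_fps)  # 1.0 analyzed frame per camera per second
```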

After that I can rethink how to use your interesting round-robin RTSP queue design, with oldest-frames-dropped, as a source for the inference, or maybe for distributed inference. 🤔

Thanks again, really appreciated. ozett

ozett commented 3 years ago

Surely you know it (but if not)... yesterday I found https://google.github.io/mediapipe/ on the web.

It looks like ⚡ ultrafast ⚡ TensorFlow Lite, also for face detection...

I will play with it, as it could be a fast solution for human detection (as face or pose)... will see how many FPS I get in comparison to YOLOv4...
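For comparison, a minimal face-detection sketch with the MediaPipe Solutions Python API (the image file name is illustrative):

```python
import cv2
import mediapipe as mp

mp_face = mp.solutions.face_detection
image = cv2.imread("test.jpg")  # illustrative test image

with mp_face.FaceDetection(model_selection=1, min_detection_confidence=0.5) as detector:
    # MediaPipe expects RGB; OpenCV loads BGR
    results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.detections:
    for det in results.detections:
        box = det.location_data.relative_bounding_box
        print(det.score[0], box.xmin, box.ymin, box.width, box.height)
```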

wb666greene commented 3 years ago

My experience is that pose estimation doesn't work very well for the "down-looking" camera angles generally used with security cameras.

Face detection might be a nice second step for choosing the "best" image of a burst to send, but at the expense of latency. Also, experienced criminals are pretty good at shielding their faces from camera views, so I don't really see it adding anything for my use case.

With a low latency high definition image you can make the friend (mailman, dog walker too close to the house, political solicitor etc.) or foe decision and quickly take appropriate action.

But I'd definitely be interested in your experience with it.

I'm starting by converting my TPU code to use the "new" PyCoral libraries that have obsoleted the edgetpu libraries I used initially. I hate it when they do that.
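For anyone following along, a minimal sketch of a PyCoral detection call, modeled on the PyCoral examples (the model and image file names are illustrative, not my production code):

```python
from PIL import Image
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import make_interpreter

# illustrative Coral example model; in its COCO labels, class 0 is "person"
interpreter = make_interpreter("ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite")
interpreter.allocate_tensors()

image = Image.open("frame.jpg")  # illustrative input frame
_, scale = common.set_resized_input(
    interpreter, image.size, lambda size: image.resize(size, Image.LANCZOS))
interpreter.invoke()

persons = [obj for obj in detect.get_objects(interpreter, score_threshold=0.5,
                                             image_scale=scale)
           if obj.id == 0]
for obj in persons:
    print(obj.score, obj.bbox)
```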

ozett commented 3 years ago


I experimented with some nodes and some Python code with MediaPipe (TensorFlow Lite) and had good recognition of faces with Face Detection and Face Mesh. (I used some of your ideas to push base64-encoded images as JSON through stdout directly from the Python script onto the Node-RED GUI; thanks again for your GitHub.)
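A minimal sketch of that stdout trick (the function name is illustrative):

```python
import base64
import json
import sys

import cv2

def emit_frame(frame):
    """Encode an OpenCV frame as a base64 JPEG and print it as one JSON line,
    so a Node-RED exec/daemon node can parse stdout and feed an image widget."""
    ok, jpg = cv2.imencode(".jpg", frame)
    if ok:
        payload = {"image": base64.b64encode(jpg.tobytes()).decode("ascii")}
        sys.stdout.write(json.dumps(payload) + "\n")
        sys.stdout.flush()
```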

Combining both lets me find faces in almost all situations (side, front, ...).

The pose estimation only works well at certain distances from the camera (seemingly 1-4 meters) and from the front. Still looking at how to make it usable for my purpose.

Object detection is missing Python examples.

🍰 👍 Object tracking seems very interesting in the comparison sample at the end: https://developers.googleblog.com/2019/12/object-detection-and-tracking-using-mediapipe.html but it is also missing Python examples... maybe somebody has a link to start??

That's it for now, but the fiddling goes on...

---edit: found inference times for comparison (uses Google Coral): https://blakeblackshear.github.io/frigate/hardware - also interesting as a way to find the fastest detection architecture...

False-positive rates table: https://community.home-assistant.io/t/local-realtime-person-detection-for-rtsp-cameras/103107/2740

ozett commented 3 years ago

Fiddling now with pose detection in MediaPipe. Not sure if it could help with general person/object detection; detection depends on a specific distance to the camera, and analysis of the landmarks needs some tuning... but the AI model is impressive with its detection of "virtual" body features, meaning hidden arms or legs, or the face seen from the back.
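A minimal sketch of reading those landmarks with the MediaPipe Solutions Python API, including the visibility score that flags occluded body parts (the image file name is illustrative):

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
image = cv2.imread("person.jpg")  # illustrative test image

with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        # low visibility marks landmarks the model inferred but could not see,
        # e.g. hidden arms/legs or a face viewed from behind
        print(idx, lm.x, lm.y, round(lm.visibility, 2))
```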


wb666greene commented 3 years ago

I had some early encouraging results with PoseNet and a 4K security camera: https://youtu.be/XiUfChOAPAE, but it didn't pan out for the reasons I've already mentioned.

ozett commented 3 years ago

Looks good, even the dog's skeleton was analyzed. 😄 I will now think about ways to interpret the pose data and make use of that body-part detection for some kind of alerting...

ozett commented 2 years ago

Very interesting project for implementing PoseNet: https://blog.ambianic.ai/2021/09/02/movenet-vs-posenet-person-fall-detection.html

In the meantime I optimized Frigate and got good results in inference times and detection: https://github.com/blakeblackshear/frigate

And added face recognition with Double Take: https://github.com/jakowenko/double-take

I guess you already investigated their code, didn't you? 😄
