Project discussion - GitHub issues

robmarkcole commented 5 years ago

Subject: Discussion of project I wish to write up. Project: Get alerts when someone is at my front door, similar to this project but with a couple of significant improvements.

(1) Works with SIDE PROFILE. (2) Works with ONVIF/RTSP camera

From discussion with @OlafenwaMoses: (1) is straightforward so long as side profile images are used in training; (2) is TBD.

Nice to have (not a requirement): works with a Movidius/Coral USB stick.

OlafenwaMoses commented 5 years ago

Great. Looking forward to this.

robmarkcole commented 5 years ago

@OlafenwaMoses can you comment on whether it is possible for Deepstack to ingest images from an ONVIF/RTSP stream?

OlafenwaMoses commented 5 years ago

Yes, it is. If you use a camera/computer vision library (e.g. OpenCV) to obtain the image frames from the stream, all you need to do is send those frames to DeepStack's APIs.

Below is a link to a Stack Overflow discussion showing how to read an RTSP video stream using OpenCV:

https://stackoverflow.com/questions/20891936/rtsp-stream-and-opencv-python
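The approach described above can be sketched roughly as follows. This is a minimal, untested-against-hardware sketch, not a definitive implementation: it assumes DeepStack is reachable at localhost on port 80 and serves object detection at /v1/vision/detection, and that the response is JSON with "success" and "predictions" fields. The RTSP URL and the function names are placeholders. cv2 (opencv-python) and requests are imported inside the functions that need them.

```python
# Assumed host, port and endpoint path; adjust to match your DeepStack install.
DEEPSTACK_URL = "http://localhost:80/v1/vision/detection"


def grab_frame_jpeg(rtsp_url):
    """Read one frame from the RTSP stream and return it as JPEG bytes, or None."""
    import cv2  # requires the opencv-python package

    capture = cv2.VideoCapture(rtsp_url)
    try:
        ok, frame = capture.read()
        if not ok:
            return None
        # Encode the raw frame as JPEG so it can be uploaded as an image file.
        ok, encoded = cv2.imencode(".jpg", frame)
        return encoded.tobytes() if ok else None
    finally:
        capture.release()


def parse_predictions(payload):
    """Return the list of predictions, or an empty list if the call failed."""
    if not payload.get("success"):
        return []
    return payload.get("predictions", [])


def detect_objects(jpeg_bytes, url=DEEPSTACK_URL):
    """POST one JPEG frame to DeepStack and return its predictions."""
    import requests  # requires the requests package

    response = requests.post(url, files={"image": jpeg_bytes}, timeout=30)
    response.raise_for_status()
    return parse_predictions(response.json())


# Hypothetical usage (the camera URL is a placeholder):
#   jpeg = grab_frame_jpeg("rtsp://user:password@192.168.1.10:554/stream")
#   if jpeg is not None:
#       print(detect_objects(jpeg))
```

Opening and releasing the capture for every frame keeps the sketch simple; a long-running monitor would more likely keep one capture open and read from it in a loop.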

robmarkcole commented 5 years ago

@OlafenwaMoses yes, the OpenCV approach works, thanks!

TBD: I am considering expanding the scope of this project, so as to learn something new and really show off the strengths of Deepstack for production-quality processing. There are 2 options I am weighing up:

1) queue processing of images. There are a couple of options, e.g. following this or this other example. -> When a target object is detected the data will be sent to some endpoint.

2) Live stream processing -> goal is a live stream with processing outputs overlaid on the stream. Probably not possible without a GPU?

any thoughts?

OlafenwaMoses commented 5 years ago

It would be nice to have the project show DeepStack in a production environment. To ensure a better experience for readers, I suggest making it a mini-series or a 2-part collection of articles. The first article can be an entry-level project, while the follow-up can be a full-fledged production setup.

For 1, I think it is a good idea. It can be an alert system that sends data/notifications or performs an action if object(s) are detected.

For 2, real-time object detection on a live stream will require the GPU version, and probably also a starting mode of "Low", i.e. -e MODE=Low, depending on the GPU specs.

OlafenwaMoses commented 5 years ago

Also, an intruder detection system that triggers email or sound alerts when persons, vehicles or animals are detected would be a good project to write about, since it can be very useful in warehouses, farms and restricted/monitored areas. Running detection on a single frame per second will be enough to perform the task effectively.
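A rough sketch of such an alert loop is shown here, under stated assumptions: predictions are taken to be a list of dictionaries with "label" and "confidence" keys, the target labels and the 0.6 threshold are illustrative values only, and get_predictions and send_alert are hypothetical callables supplied by the caller (for example, a function that grabs a frame and queries DeepStack, and a function that sends an email).

```python
import time

# Illustrative values only; pick labels and a threshold that suit the site.
TARGET_LABELS = {"person", "car", "dog"}
MIN_CONFIDENCE = 0.6


def find_intruders(predictions, targets=TARGET_LABELS, min_confidence=MIN_CONFIDENCE):
    """Keep only predictions for target labels at or above the threshold."""
    return [
        p for p in predictions
        if p.get("label") in targets and p.get("confidence", 0.0) >= min_confidence
    ]


def monitor(get_predictions, send_alert, interval_s=1.0, max_checks=None, sleep=time.sleep):
    """Check one frame per interval and call send_alert when a target is seen.

    Runs forever when max_checks is None. Returns the number of alerts sent.
    """
    checks = 0
    alerts = 0
    while max_checks is None or checks < max_checks:
        hits = find_intruders(get_predictions())
        if hits:
            send_alert(hits)
            alerts += 1
        checks += 1
        sleep(interval_s)  # one detection per interval, e.g. one per second
    return alerts
```

Passing the frame source and the alert action in as callables keeps the loop independent of the camera and of the notification channel, so either can be swapped without touching the detection logic.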

robmarkcole commented 5 years ago

Thanks for the suggestions. I think a 2-part article (or pair) is the way to go then.

The first article will be a basic alarm monitor, with motion-triggered capture of a small number of images which can easily be batch processed.

The second article will cover the live stream. I don't have a GPU, so this will require Movidius/Coral hardware. This second article will show that processing the live stream is feasible on low-cost hardware, and that it results in a more robust security system, with no need to fine-tune motion capture settings and no missed detections, e.g. due to latency effects.

OlafenwaMoses commented 5 years ago

Alright. This is great. Thumbs up for the brilliant thinking on this. I will let you know once we have updates on the version for Movidius/Coral hardware.

robmarkcole commented 5 years ago

@OlafenwaMoses re article one: thinking about it some more, it is in danger of being too similar to my earlier article https://www.hackster.io/97766/announce-who-is-home-using-facial-recognition-dcc389. I want to clearly demonstrate how we can build a system which has advantages over my previous system, e.g. easier to set up and maintain, better performance, lower cost etc. My previous article used Machinebox, so in the first instance can you comment on which features of Deepstack would make us want to choose it over the Machinebox offering for this use case?

OlafenwaMoses commented 5 years ago

Thanks for the comment. Let me highlight the benefits of our system compared to MachineBox as demonstrated in your previous article. 1) DeepStack's Face Recognition API only requires registering the faces once. 2) DeepStack only needs to be activated once. 3) A single DeepStack install provides many APIs that can be used together with the Face API. 4) DeepStack runs on hardware with GPU acceleration, allowing faster, real-time face recognition without skipping video frames. 5) DeepStack provides endpoint security via API and Admin Keys for authentication. 6) DeepStack provides data backup and restore functionality.

You can see all the details about these features in the Python Documentation linked below.

https://python.deepstack.cc/

robmarkcole commented 5 years ago

I think the other big differentiators are:

Deepstack has a proper object detection model (Machinebox does 'logos' using OpenCV, I think)
Deepstack allows custom models

To update you on my progress, I have set up a motion-triggered camera to gather training data. This is harder to fine-tune than you would think! Quite a few images are of random motion, or miss the person due to latency.

OlafenwaMoses commented 5 years ago

Thanks for adding those 2 facts on DeepStack. DeepStack supports detection of the 80 most common objects in offices, homes and public spaces. The list of detectable objects is on the webpage linked below.

https://python.deepstack.cc/object-detection

I understand that getting training data can be very challenging, especially in special use cases. It can be hard to find a sufficient number of images to train a fairly accurate model. This problem is something we are looking at critically, and we are hoping to build platforms to address it in the near future.

robmarkcole commented 5 years ago

@OlafenwaMoses did you see my Discord message re an mp4 endpoint? The motivation for video clips is that when I take just a single image on motion, I often don't catch a view of the person. Example below where we catch the back of them. Ideally I capture a 10 second clip, then use Deepstack to both identify the person and return the best frame (i.e. a clear face shot).
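Until such an endpoint exists, the "best frame" idea could be approximated on the client side by running face recognition on sampled frames and keeping the most confident match. The sketch below covers only the selection step and makes several assumptions: each frame's result is a list of dictionaries with "userid" and "confidence" keys, unrecognised faces are reported with a userid of "unknown", and the best_frame helper is hypothetical rather than part of Deepstack.

```python
def best_frame(frame_predictions):
    """Pick the frame with the most confident recognised face.

    frame_predictions is a list of (frame_index, predictions) pairs, one per
    sampled frame of the clip. Returns (frame_index, prediction), or None if
    no frame contains a recognised face.
    """
    best = None
    for index, predictions in frame_predictions:
        for prediction in predictions:
            # Skip faces that were detected but not matched to a known person.
            if prediction.get("userid", "unknown") == "unknown":
                continue
            confidence = prediction.get("confidence", 0.0)
            if best is None or confidence > best[1].get("confidence", 0.0):
                best = (index, prediction)
    return best
```

Using recognition confidence as a proxy for a "clear face shot" is a simplification; a frame of someone's back would normally yield no face prediction at all and so would be skipped.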

OlafenwaMoses commented 5 years ago

Yes I saw the message. I do understand the convenience and efficiency in processing the video across a time-frame on a single request. This is something we have reviewed before. We will definitely work on this and let you know once it is implemented.

robmarkcole commented 5 years ago

OK, in the meantime I can crack on with the first article. I will try to get this out before I go on holiday in a couple of weeks, but no promises :-)

robmarkcole commented 5 years ago

Moving conversation to the dedicated repo

robmarkcole / HASS-Deepstack-object

Project discussion #15