wb666greene / AI-Person-Detector

Python Ai "person detector" using Coral TPU or Movidius NCS/NCS2

Have you ever thought about re-training the model... #9

Open neilyoung opened 3 years ago

neilyoung commented 3 years ago

... so that it concentrates on persons only and will not try to detect toothbrushes and umbrellas when there are none?

It could boost the inference rate... Especially if you deal with 3 USB cameras on a Pi 4 plus Coral, the inference rate is 10 fps per camera with this model. I know the Nvidia Jetson Nano achieves 30 fps per camera with an adapted model, only capable of detecting four classes or so...
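(For context: with an SSD-style model, most of the inference cost is in the backbone, so trimming the class list usually changes the frame rate only marginally; restricting the output to persons is typically done by filtering after inference. Below is a minimal sketch of that filtering using the PyCoral API -- the model and label paths are placeholders, and this repo itself may use the older edgetpu library, so treat it as an illustration rather than the project's actual code.)

```python
# Sketch: run a stock COCO MobileNet-SSD model on the Coral TPU and keep only
# "person" detections.  Paths are placeholders; API calls are from pycoral.
from PIL import Image
from pycoral.adapters import common, detect
from pycoral.utils.dataset import read_label_file
from pycoral.utils.edgetpu import make_interpreter

MODEL = 'ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite'  # placeholder
LABELS = 'coco_labels.txt'                                        # placeholder

interpreter = make_interpreter(MODEL)
interpreter.allocate_tensors()
labels = read_label_file(LABELS)

def detect_persons(image_path, threshold=0.5):
    """Return only detections whose COCO label is 'person'."""
    img = Image.open(image_path).convert('RGB')
    _, scale = common.set_resized_input(
        interpreter, img.size, lambda size: img.resize(size, Image.LANCZOS))
    interpreter.invoke()
    objs = detect.get_objects(interpreter, threshold, scale)
    return [o for o in objs if labels.get(o.id, '') == 'person']

for obj in detect_persons('frame.jpg'):
    print(obj.score, obj.bbox)
```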

wb666greene commented 3 years ago

No, I have not really thought about it. I'm impressed at how well it detects people in images that have zero chance of having been in the training set.

Re-training it on images from my cameras would likely make it less generally useful. I am looking into adding a Jetson Nano running YOLO4 as a final verification step, and I have a large collection of bogus detection images to test it with, but I haven't had time to put it all together yet to see if it'd really work or not.

I did try using a Posenet model as a second verification step. It rejected 100% of my false positives; the problem was it also rejected over 50% of true positives, approaching 100% for cameras with steep down-looking angles (common with security cameras in confined areas).
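(For illustration, a rough sketch of this "pose model as a second opinion" idea: only accept an SSD person box if PoseNet also finds a pose with enough confident keypoints inside it. It assumes the PoseEngine wrapper from Google's project-posenet example; that wrapper's exact API -- image type, input size, keypoint container -- has changed between versions, so this is a sketch of the flow, not a drop-in verifier.)

```python
# Second-pass verification of an SSD "person" box using PoseNet on the Coral TPU.
# Assumes the PoseEngine wrapper from google-coral/project-posenet.
import numpy as np
from PIL import Image
from pose_engine import PoseEngine

# Placeholder model path from the project-posenet repo.
POSE_MODEL = 'models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite'
engine = PoseEngine(POSE_MODEL)

def verify_person(frame, box, min_keypoints=4, kp_threshold=0.5):
    """Crop the SSD box and require a pose with at least min_keypoints
    keypoints above kp_threshold.  Steep down-looking camera angles tend to
    fail this test, which is the weakness described above."""
    x0, y0, x1, y1 = box
    crop = Image.fromarray(frame[y0:y1, x0:x1])
    # Input size implied by the model name (641x481) -- an assumption.
    crop = np.uint8(crop.resize((641, 481), Image.NEAREST))
    poses, _ = engine.DetectPosesInImage(crop)
    for pose in poses:
        good = sum(1 for kp in pose.keypoints.values() if kp.score > kp_threshold)
        if good >= min_keypoints:
            return True
    return False
```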

scottlamb commented 3 years ago

Passing comment:

I'd love to have some public, easy-to-contribute-to database of security camera training data. It would hold images or video snippets captured on security cameras in a variety of conditions (ideally labelled as such), featuring a variety of people and detection conditions, as well as objects that have been falsely detected as people.

It could also include labeled animals, delivery trucks, etc. if folks want them to be detected.

The Coral sample models were trained on COCO, which is a huge data set with high-quality labels, but it's not from security cameras, so it doesn't have many of the situations listed above. Results from these models are surprisingly good given this and the Coral folks' warning that "These are not production-quality models; they are for demonstration purposes only." But I think someone could get even better results by applying transfer learning to them with a security camera-focused dataset. My understanding is that transfer learning needs far fewer images and far less computing power than starting from scratch.
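(As an illustration of how little code the transfer-learning step itself needs once labeled data exists: the sketch below uses TFLite Model Maker to retrain an EfficientDet-Lite detector on a small person-only dataset and export a model that can then be compiled for the Edge TPU. This is not the Coral retrain-detection tutorial mentioned in the next comment -- it's an alternative path, and the directory layout, label map, and hyperparameters are placeholders.)

```python
# Sketch: transfer learning a single-class "person" detector with TFLite Model
# Maker.  Assumes Pascal VOC-style annotations in the placeholder directories.
from tflite_model_maker import model_spec, object_detector

spec = model_spec.get('efficientdet_lite0')

train_data = object_detector.DataLoader.from_pascal_voc(
    'images/train', 'annotations/train', label_map={1: 'person'})
val_data = object_detector.DataLoader.from_pascal_voc(
    'images/val', 'annotations/val', label_map={1: 'person'})

# train_whole_model=False keeps the pretrained backbone frozen (classic
# transfer learning); set it True for better accuracy at higher training cost.
model = object_detector.create(
    train_data,
    model_spec=spec,
    validation_data=val_data,
    epochs=50,
    batch_size=8,
    train_whole_model=False)

print(model.evaluate(val_data))
model.export(export_dir='.', tflite_filename='person_detector.tflite')
# Afterwards, the exported model should be compilable with:
#   edgetpu_compiler person_detector.tflite
```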

Some day, if no one else does it, I'll find the time to start such a database. I'd be really happy if someone beat me to it, though.

neilyoung commented 3 years ago

Transfer learning would be the key. Unfortunately, not even Google's sample code produces something useful here: https://coral.ai/docs/edgetpu/retrain-detection/#download-and-configure-the-training-data

scottlamb commented 3 years ago

If I had quality data to work from, I'm sure I could figure out the transfer learning thing.

wb666greene commented 3 years ago

I think the issue you will have is the lack of background variation in security camera images, other than lighting. Specifically, all the null images with no person will be too similar for any particular camera view, so you will need nearly as many cameras as images.

It's the "annotation" of the target object in the image that is tedious and time-consuming to do.

Getting images is easy -- my 15 cameras have ~65,000 "person detected" images in ~15 days (my retention time) using ~86 GB of storage (7 of the cameras are 4K, the others are 1080p). Frames without a detection aren't saved, but it wouldn't be hard to add that to the code. There is zero chance I could annotate any significant fraction of them in any reasonable time frame, not to mention it'd be mind-numbingly boring to do.
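(A minimal sketch of what "add that to the code" might look like: save a small random fraction of no-detection frames as negative/background examples. The function name, output directory, and sampling rate are made up for illustration; the actual frame loop in this repo differs.)

```python
# Hypothetical hook for keeping occasional "null" frames (no person detected)
# as negative examples for a future training set.
import os
import random
import time
import cv2

NULL_DIR = 'null_frames'   # placeholder output directory
SAVE_FRACTION = 0.001      # ~1 frame in 1000, to keep storage modest

def maybe_save_null_frame(frame, camera_name):
    """Call this from the frame loop when no person was detected."""
    if random.random() >= SAVE_FRACTION:
        return
    out_dir = os.path.join(NULL_DIR, camera_name)
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime('%Y-%m-%d_%H-%M-%S')
    cv2.imwrite(os.path.join(out_dir, f'{stamp}.jpg'), frame)
```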

"spiderwebs or flying insects in front of the camera" Interesting, this is my number one source of false detection, I can filter out many of them by rejecting any "person" that fills more than 10-20% of the image depending on the camera view, but can't get them all without creating false negatives. The second most common false positive is fixed objects (tree trunks, pool filter tank, bushes, etc.) that gives bursts of false positives typically for a few minutes as the sun rises/sets or the camera switches in/out of IR mode. These are "easy" to filter because they are in fixed locations in the frame. I have a blacklist of camera specific box points such that if the detection matches to with a tolerance it is ignored.

Here is a bug detection that made me reduce my "blob" threshold: [image attachment: 23_05_46 2_HummingbirdRight_Audio_AI alert]