jakowenko / double-take

Unified UI and API for processing and training images for facial recognition.
https://hub.docker.com/r/jakowenko/double-take
MIT License

[BUG] Detector Issues #236

Closed LordNex closed 1 year ago

LordNex commented 2 years ago

Describe the bug I'm not sure if this is a bug or just the capability of one detector versus another, but anyway: I have Double Take in Home Assistant along with CompreFace and Facebox, plus a third detector, DeepStack, on a separate Jetson Nano. All three are listed as detectors in Double Take.

Most of the time, DeepStack and Facebox never show proper detections, even after training over 1,000 images of each person through the cameras that will be used for detection. CompreFace running inside HA, though, is almost always right and is usually dead on when everything else is off.

Is this what most people see as far as detector quality, or am I missing something? Also, CompreFace is running on a Dell PowerEdge server without any good GPU. If CompreFace is just a better choice than the rest, what would be the best way to use it? Leave it in HA using processor power (the server has 40 cores)? See if I can get it to install and work on the Jetson Nano, or on something else that will leverage that hardware but work better than DeepStack? Or get a GPU of some form and see if I can do PCI passthrough in VMware so CompreFace on the PowerEdge can utilize it?
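For reference, the detectors section of my Double Take config looks roughly like this (the URLs and API key below are placeholders, not my real values):

```yaml
# Rough sketch of a multi-detector Double Take config.
# Hostnames, ports, and the CompreFace API key are placeholders.
detectors:
  compreface:
    url: http://192.168.1.10:8000
    key: <compreface-api-key>
  deepstack:
    url: http://jetson-nano:5000
  facebox:
    url: http://192.168.1.10:8080
```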

Version of Double Take

v1.12.1-35874f3


LordNex commented 2 years ago

Which also raises a few other questions.

Is it better to have thousands of pictures of the same person from various sources and to keep training and correcting, adding the images it tags incorrectly to the right person? Or would it be better to just use a handful of carefully taken headshots of each individual and never train any more images?

Also, is it better to run just one detector or multiple ones? How does it determine which one is right and wins?

What is the planned implementation with the new Frigate+? Do we train images in both places or what's the best way of using the current integration chain for the best results?

Thanks again as always! I'll add more if any come to mind.

NickM-27 commented 2 years ago

I think 1,000 is quite a lot of images. I personally added a few headshots, then saved the first handful of Frigate images (about 50 total), and have had good results.

In my experience CompreFace is far and away the best option, much more accurate than DeepStack. I think you may be misunderstanding how multiple detectors work: they are all just applied with the settings specified, and they are not compared to each other. This is why, IMO, multiple detectors on the same camera don't make sense, since they seem to give quite different scores and score ranges. A bad CompreFace result is usually 30-40%, while DeepStack hasn't gone below 70% for me, even for a totally different-looking person.

It doesn't matter which device CompreFace runs on, just pick whichever one has the most power, although adding an NVIDIA GPU will help considerably.

What is the planned implementation with the new Frigate+? Do we train images in both places or what's the best way of using the current integration chain for the best results?

Frigate+ won't have training for specific people, just a generic face, so there really is no overlap there. For users running both Frigate+ and Double Take, Double Take can be configured to use the face label instead of person, so Double Take isn't running detections on images where the person's face is not visible from the camera's perspective.
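Off the top of my head, that looks something like this in the Double Take config (the URL is a placeholder; double-check the README for the exact keys):

```yaml
# Sketch: only process Frigate events carrying the face label.
frigate:
  url: http://frigate:5000
  labels:
    - face
```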

LordNex commented 2 years ago

I think you may be misunderstanding how multiple detectors work: they are all just applied with the settings specified, and they are not compared to each other.

So how does it determine a positive if you have multiple detectors? Does the first one that answers win? Or whichever has the highest confidence? Or is it just triggered if any of them see a positive?

Another thing I've found is that different detectors are better at different things. DeepStack was good at grabbing partial faces and still getting a good result, whereas the others need a full-on face.

I've since shut down my other detectors and am just running CompreFace inside of Home Assistant, but the old trained images are still linked to the old detectors. I've tried to untrain all of the images for each person and then add back the ones I want, and it just keeps pulling the full list back in.

In addition, it still looks like it's holding the images for the other detectors. My guess is the only way to remove all of that would be to delete the people and possibly reinstall CompreFace on HA. Or maybe just rebuild Double Take in HA.

I've looked into running CompreFace on the Jetson, but currently it hasn't been ported to ARM-based chipsets, although another guy and I are going to see if we can build a Docker Compose package that will run on ARM. Until then it's just going to run on shared processing power. Luckily I have plenty, but this screams out for an edge device or something purpose-built for this function.

I've slowly started with Frigate+. My plan is to train it on people and faces and have it trigger different automations depending on what it sees. Person is already in there and is what I have been using, but that means a lot of images of backs and sides get sent to the detector for facial recognition when recognition is obviously impossible. So I'm going to train a separate model for "face", so that when Frigate detects one, it'll send just that via MQTT to Double Take for processing. "Person" I will likely just set to notify and record until a face is seen.
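On the Frigate side I'm assuming that's just a matter of tracking the extra label once the model supports it, something like:

```yaml
# Rough sketch, assuming a Frigate+ model that exposes a face label.
objects:
  track:
    - person
    - face
```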

The only issue I've run into is that I use my iPhone 13 Pro Max for just about everything nowadays, and the widget for cropping images and making models doesn't want to work on iOS. So I've got to clean off my clutter-covered laptop and work from there. But it would be interesting if we could combine several different SoC designs into one device that you could easily plug into your network and start training from your existing cameras. I could see either one big Jetson, two Nanos, or two RPi 4s with TPUs and a Nano in one case, obviously reliant on either network-attached or locally connected storage.

NickM-27 commented 2 years ago

So how does it determine a positive if you have multiple detectors? Does the first one that answers win? Or whichever has the highest confidence? Or is it just triggered if any of them see a positive?

Each detector runs its own detection; they don't talk to each other. If any of them gets a true positive, then Frigate will accept that as a sub label.

I've had good results with just CompreFace and different angles, so maybe expand the training set to include different angles.

LordNex commented 2 years ago

So how does it determine a positive if you have multiple detectors? Does the first one that answers win? Or whichever has the highest confidence? Or is it just triggered if any of them see a positive?

Each detector runs its own detection; they don't talk to each other. If any of them gets a true positive, then Frigate will accept that as a sub label.

I've had good results with just CompreFace and different angles, so maybe expand the training set to include different angles.

Well, I guess it's all based on the template you make for notifications. I'm using the Double Take example; the thumbnail shows up properly, but if you click the button, the URI fails.

Yeah, I like CompreFace way better than the other detectors. The addition of plugins alone sets it apart. Except everyone in my family has long hair, so it marks us all as female instead of long-haired hippies, LOL.

Now the trick is to try and get it to run on this Jetson Nano so it can utilize all of those CUDA cores. With DeepStack on there and working right, I was getting results around the 150 ms mark, whereas CompreFace on CPU alone takes 2-3 seconds. And that's after all the Frigate processing and such. So if I can get that back down to subsecond rates, we'll be in business.

LordNex commented 2 years ago

So how does it determine a positive if you have multiple detectors? Does the first one that answers win? Or whichever has the highest confidence? Or is it just triggered if any of them see a positive?

Each detector runs its own detection; they don't talk to each other. If any of them gets a true positive, then Frigate will accept that as a sub label.

I've had good results with just CompreFace and different angles, so maybe expand the training set to include different angles.

Also, the problem with this approach is that in some cases you might not be able to obtain proper angle shots. Sure, that will work fine for family members, but I'd like to see it expand into being able to import either pre-tagged images, such as those made by PhotoPrism, or a database of faces and names, like the sex offender registry or all of Facebook. That way I know who's at my door even if I don't personally know them.

I wouldn't want to ask everyone I know to submit to a facial scan, but using existing pictures, along with pictures taken from the security cams themselves, would work.

That's the one option I'd like to see added to Double Take: being able to connect to an outside source instead of just uploading picture after picture of each person I want recognized. I'm willing to put the power behind it if the code can keep up.

jakowenko commented 1 year ago

I've been a little busy with work lately, but catching back up. Sorry for the delay.

Thank you @NickM-27, you said exactly what I would have said. I'll also look into the face label from Frigate+ so Double Take can be a little smarter about when to process an image through a detector.

@LordNex For myself and the people I want to be able to recognize, I have around 30-40 pictures, mainly headshots. I've found that taking selfies on a cell phone and using those works best. I found myself getting more false positives when I trained using images from my 1080p cameras, so I really try not to use those images unless I have no other choice.

Also, is it better to run just one detector or multiple ones? How does it determine which one is right and wins?

I started this project thinking it would be cool to be able to run multiple detectors at once, but quickly found that CompreFace worked the best for me and I didn't need to increase the complexity of my setup by running two more services that provided lower-quality results. The highest confidence value is taken to determine whether the image is considered a match for a known face.
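The thresholds for what counts as a match are also configurable. Roughly (defaults from memory; see the README for exact values):

```yaml
# Sketch: Double Take confidence thresholds.
detect:
  match:
    confidence: 60   # minimum confidence (%) to treat a result as a match
  unknown:
    confidence: 40   # minimum confidence (%) to save a face as unknown
```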

I'd like to see it expand into being able to import either pre-tagged images, such as those made by PhotoPrism, or a database of faces and names, like the sex offender registry or all of Facebook. That way I know who's at my door even if I don't personally know them.

I'm definitely willing to consider better options for uploading and creating the training data sets.

I'm trying to clean up some of these open issues. If you feel there is still a bug related to Double Take, please reopen this. If you'd like to request a feature, feel free to open a feature request.