jakowenko / double-take

Unified UI and API for processing and training images for facial recognition.
https://hub.docker.com/r/jakowenko/double-take
MIT License

[FEAT] detect also non-face objects? #142

Closed Ocramius closed 3 years ago

Ocramius commented 3 years ago

Context

I just spent some time configuring double-take, and I must say that the DX of it acting as an intermediary between MQTT, Frigate, and complex recognition engines is quite refreshing.

My setup is quite simple: an IP Cam tracks objects passing by in a hallway.

While I managed to make this work with the faces of some subjects, the IP camera is deliberately placed, out of respect for personal privacy, so that faces are not visible (the video feed does not include anything from the torso up).

I attempted to train the model against photos of legs (people passing by - clothing kinda gives identity away, when well trained), pets running around the place (useful to know who "did it again"), etc.: no luck.

Then I realized that this tool is specifically designed to handle face recognition.

Question

  1. Is this the right layer of abstraction where such support would be introduced?
  2. Does the recognition of non-face objects depend on the backend engine, or is it configured by this tool?
  3. Should I instead do the image recognition myself (incurring the complexity of having to talk to/train the backend directly)?
Ocramius commented 3 years ago

This looks related to this API endpoint being used:

https://github.com/jakowenko/double-take/blob/90b6fa2c7894bd784847fc36555612ed2c05aa9d/api/src/util/detectors/deepstack.js#L21

It looks like deepstack only supports face registration/recognition, while object recognition is pre-trained and very limited: https://docs.deepstack.cc/api-reference/index.html
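To make the split concrete: DeepStack routes each task under a different HTTP path (per the API docs linked above). A minimal routing sketch, assuming a local DeepStack instance; the base URL is an assumption, the paths are from the docs:

```javascript
// DeepStack routes each task under a different HTTP path: face
// registration/recognition lives under /v1/vision/face/*, while the
// pre-trained, fixed-label object detection is a separate endpoint.
const deepstackUrl = (base, task) => {
  const paths = {
    register: '/v1/vision/face/register',   // add training images for a subject
    recognize: '/v1/vision/face/recognize', // match a face against registered subjects
    detect: '/v1/vision/detection',         // generic object detection (no training)
  };
  return `${base}${paths[task]}`;
};

console.log(deepstackUrl('http://localhost:5000', 'recognize'));
// → http://localhost:5000/v1/vision/face/recognize
```

So "train it on legs" isn't possible through the face endpoints, and the detection endpoint has no training at all.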

The CompreFace API is a bit broader, but still centered around faces: https://github.com/exadel-inc/CompreFace/blob/6fd32a1182664ee369681c32aba1d2cdb4a2b5db/docs/Rest-API-description.md#managing-subject-examples

Perhaps it is possible to train subject data with non-faces on CompreFace, but it may be more of an uphill battle 🤔

jakowenko commented 3 years ago

Hey @Ocramius, thanks for checking out my project. Happy to try to answer these questions. What problem are you trying to solve that you can't currently?


1. Is this the right layer of abstraction where such support would be introduced?

You're right, this project is currently centered around facial detection. That was the first use case I was trying to solve. I do want to add support for object detection, but that will always rely on the underlying detectors used.

Frigate also does a wonderful job with object detection, have you tried that with your hallway camera to see if it detects the legs as people? Double Take uses the current Frigate images to perform the facial detection, so it wouldn't be a huge lift to pull in the other labeled images from Frigate to display their classification. This would be duplicating some of what can be done in Frigate currently though.
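As a sketch of what pulling in those labeled images could build on: Frigate publishes tracked-object events over MQTT on the `frigate/events` topic, and filtering them by label is only a few lines. The payload below is a hand-written example following Frigate's event format (a type plus before/after snapshots); the camera name and score are made up:

```javascript
// Hand-written example payload in the shape of Frigate's `frigate/events`
// MQTT messages; values are invented for illustration.
const payload = '{"type":"update",'
  + '"before":{"camera":"hallway","label":"person"},'
  + '"after":{"camera":"hallway","label":"person","top_score":0.87}}';
const event = JSON.parse(payload);

// Only forward events whose label we care about.
const wanted = ['person', 'dog', 'cat'];
const msg = wanted.includes(event.after.label)
  ? `${event.after.camera}: ${event.after.label} (${event.after.top_score})`
  : null;

console.log(msg); // → hallway: person (0.87)
```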


2. Does the recognition of non-face objects depend on the backend engine, or is it configured by this tool?

Yes, the other open source detectors that this tool interfaces with need to support object detection. DeepStack is probably the best one to start with if Frigate doesn't work out for you (I believe Frigate lets you train your own models too).

I could then add in support to use the other API endpoints from DeepStack to do the object detection if that is something you'd be interested in.


3. Should I instead do the image recognition myself (incurring in the complexity of having to talk to/train the backend directly)?

Depends on your knowledge of this and programming skills. If you can train a model that performs how you want with DeepStack, I can update Double Take to pass images to it.

Ocramius commented 3 years ago

Frigate also does a wonderful job with object detection, have you tried that with your hallway camera to see if it detects the legs as people?

It certainly does - the issue is that we're now at the edge of "detection", and into "identification".

Double Take uses the current Frigate images to perform the facial detection, so it wouldn't be a huge lift to pull in the other labeled images from Frigate to display their classification. This would be duplicating some of what can be done in Frigate currently though.

Indeed, the edge of what I was attempting is identification after having detected the object.

(I believe Frigate lets you train your own models too).

That's something worth investigating.

Depends on your knowledge of this and programming skills. If you can train a model that performs how you want with DeepStack,

Programming - generally OK: I can suffer some JavaScript :D

I can update Double Take to pass images to it.

I don't see a way for DeepStack to train object recognition - only face recognition? 🤔 That's kinda the deciding factor on whether this could live here, or if it's complete scope-creep by me :D

Alternatively, my approach could be:

jakowenko commented 3 years ago

I don't see a way for DeepStack to train object recognition - only face recognition? 🤔 That's kinda the deciding factor on whether this could live here, or if it's complete scope-creep by me :D

What about this https://docs.deepstack.cc/custom-models/? You'd have to make your own model, but this may be a solution.
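For what it's worth, a custom model deployed that way responds in the same `success`/`predictions` shape as DeepStack's built-in detection endpoint, so integrating one would mostly mean filtering that response. A sketch, where the labels, scores, and bounding boxes below are made up:

```javascript
// Hypothetical response from POST /v1/vision/custom/<model>; the shape
// (a success flag plus a predictions array with label/confidence/bounding
// box) matches DeepStack's detection responses. Values are invented.
const sample = {
  success: true,
  predictions: [
    { label: 'marco', confidence: 0.91, x_min: 10, y_min: 40, x_max: 120, y_max: 380 },
    { label: 'dog', confidence: 0.42, x_min: 200, y_min: 300, x_max: 260, y_max: 390 },
  ],
};

// Keep only confident matches, as a detector integration would.
const matches = (res, threshold) =>
  res.success
    ? res.predictions.filter((p) => p.confidence >= threshold).map((p) => p.label)
    : [];

console.log(matches(sample, 0.8)); // → [ 'marco' ]
```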

Would it be worthwhile for me to add in TensorFlowJS support into Double Take? I haven't looked at it in a while, so I'm not sure what would be involved. But if you could create your own detector in double take and give it a custom model, I could then process Frigate images through that.

Ocramius commented 3 years ago

I don't see a way for DeepStack to train object recognition - only face recognition? 🤔 That's kinda the deciding factor on whether this could live here, or if it's complete scope-creep by me :D

What about this https://docs.deepstack.cc/custom-models/? You'd have to make your own model, but this may be a solution.

Yeah, custom models are probably the way then: the process is highly user-unfriendly though, and training will likely take hours instead of seconds :D (I think the current approach uses transfer learning?)

Given that this is a viable solution, before throwing this problem at Double Take, it is probably best for me to go down the rabbit hole alone, and see if anything useful can be created.

Would it be worthwhile for me to add in TensorFlowJS support into Double Take? I haven't looked at it in a while, so I'm not sure what would be involved. But if you could create your own detector in double take and give it a custom model, I could then process Frigate images through that.

As a backend? As far as I've seen, TensorFlowJS is really just a set of adapters itself, and even has typings for TypeScript.

I think the strength of Double Take (from what I could experience, and by the way, it is great) is:

If TensorFlowJS could be embedded (no external system to connect, no URI to configure, etc.), it would be a nice "out of the box" experience for users, but maintaining a whole training system (and its storage) here would be a big burden.
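A hypothetical shape for such an embedded detector (none of these names come from the actual codebase, and a real TensorFlowJS backend would be asynchronous and wrap `model.predict`; this synchronous stub only illustrates the interface):

```javascript
// Hypothetical detector interface: each backend (DeepStack, CompreFace, an
// embedded TensorFlowJS model, ...) exposes the same recognize() call, so
// the pipeline stays agnostic about where recognition actually happens.
const makeDetector = (classify) => ({
  // Map an image to the names of confident matches; a real TFJS backend
  // would decode the image and call model.predict() here.
  recognize: (image) =>
    classify(image)
      .filter((m) => m.confidence >= 0.5)
      .map((m) => m.name),
});

// Stand-in for a trained model, for demonstration only.
const fakeClassify = () => [
  { name: 'marco', confidence: 0.9 },
  { name: 'unknown', confidence: 0.2 },
];

const detector = makeDetector(fakeClassify);
console.log(detector.recognize(null)); // → [ 'marco' ]
```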

Ocramius commented 3 years ago

I'm going to close this issue for now: your time and patience are greatly appreciated, and I think I need to figure out whether any of this is viable at all before proposing that someone else "should" (lots of quotes here - it's open source!) take on huge amounts of potentially unfeasible work :D

I'll gladly contribute back if/when I get to it.

So far, all the guidance provided points me to ideas on how this could be achieved; the idea of including TensorFlowJS should be explored separately, if it's of interest to the tooling.

sstratoti commented 1 year ago

I actually created my own DeepStack model for USPS logos, and was using it in combination with a second DeepStack model for other package-delivery logos.

https://github.com/sstratoti/DeepStack_USPS

https://github.com/OlafenwaMoses/DeepStack_OpenLogo

So if I were to spin up double-take and point it at these models' endpoints, DeepStack should return any logos found that match the model. Would double-take care that they're not faces? Or does double-take specifically point at the v1/vision/face endpoint?