nextcloud / recognize

👁 👂 Smart media tagging for Nextcloud: recognizes faces, objects, landscapes, music genres
https://apps.nextcloud.com/apps/recognize
GNU Affero General Public License v3.0

Idea: use dates and metadata for better clustering #566

Open MayeulC opened 1 year ago

MayeulC commented 1 year ago

Describe the feature you'd like to request

Disclaimer: I haven't read the code; this is mostly based on the behind-the-scenes wiki page and my experience with clustering.

Given that a person cannot be in two places at once, it would be interesting to assume that people already seen nearby in space and time are more likely to appear in a picture than others.

This could possibly lead to runaway situations, but the following information could be used:

Describe the solution you'd like

At the very least, pictures that are very close in time (and possibly metadata) can be reasonably assumed to be of the same scene for the initial guess (it's very common to take multiple pictures successively for group pictures).
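As a rough illustration of that initial guess, pictures could be split into "bursts" whenever the gap between consecutive timestamps exceeds a threshold. This is only a sketch: `group_by_time` and the 30-second `max_gap` are assumptions made for the example, not anything taken from the recognize code.

```python
from datetime import datetime, timedelta

def group_by_time(photos, max_gap=timedelta(seconds=30)):
    """Split (name, timestamp) pairs into bursts of consecutive shots.

    max_gap is an assumed threshold; a real value would need tuning.
    """
    bursts = []
    for name, taken_at in sorted(photos, key=lambda p: p[1]):
        # Continue the current burst if this shot closely follows the previous one.
        if bursts and taken_at - bursts[-1][-1][1] <= max_gap:
            bursts[-1].append((name, taken_at))
        else:
            bursts.append([(name, taken_at)])
    return bursts

photos = [
    ("a.jpg", datetime(2023, 5, 1, 12, 0, 0)),
    ("b.jpg", datetime(2023, 5, 1, 12, 0, 10)),
    ("c.jpg", datetime(2023, 5, 1, 15, 30, 0)),
]
bursts = group_by_time(photos)
burst_names = [[name for name, _ in burst] for burst in bursts]
```

Faces found within one burst could then share an initial cluster guess before the usual embedding comparison is applied.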

Bayesian probabilities can help here: to identify the probability that X appears in a picture where Y appears: P(X|Y) = P(Y|X)*P(X)/P(Y). Of course, those probabilities on the right are estimates, and this may become a runaway feedback loop if not checked against the estimated P(X) as inferred by the network.

The same Bayesian formula can be used by replacing Y with other metadata: person taking the picture, date/time bins, location.
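To make the formula concrete, here is a minimal sketch that estimates P(X|Y) from pictures whose faces are already labelled. The function name and the toy data are hypothetical, and the probabilities are plain frequency counts; how such a prior would be combined with the network's own score is left open.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_posterior(pictures, x, y):
    """Estimate P(x | y) = P(y | x) * P(x) / P(y) from sets of labelled people."""
    n = len(pictures)
    count = Counter()   # number of pictures each person appears in
    joint = Counter()   # number of pictures each ordered pair appears in
    for people in pictures:
        count.update(people)
        joint.update(permutations(sorted(people), 2))
    if count[x] == 0 or count[y] == 0:
        return 0.0
    p_x = count[x] / n
    p_y = count[y] / n
    p_y_given_x = joint[(x, y)] / count[x]
    return p_y_given_x * p_x / p_y

pictures = [{"alice", "bob"}, {"alice", "bob"}, {"alice"}, {"carol"}]
p_bob_given_alice = cooccurrence_posterior(pictures, "bob", "alice")
p_carol_given_alice = cooccurrence_posterior(pictures, "carol", "alice")
```

Replacing `y` with a date/time bin, a location, or the photographer only changes what the sets contain; the arithmetic stays the same.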

In the worst case, more clusters would be created, but those are generally easier to deal with than false positives, in my experience. And with any luck, one picture will get a good match with the reference, allowing the cluster to be combined automatically.

Describe alternatives you've considered

I made a second observation (and can open a distinct issue if you prefer): Groups of friends tend to cluster together, and an account owner is likely to attend events with distinct groups.

It may make sense to try to extract "super-groups" by grouping people who appear together in pictures, and/or close in time/space as per the metadata. If a few good matches are found for new pictures, it would make sense to first look among the previous supergroups, or have a heavier bias towards them.
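One possible way to extract such super-groups, sketched under the assumption that faces are already labelled per picture: count how often each pair of people shares a picture, then merge pairs that co-occur at least `min_shared` times using a union-find structure. The function name and threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations

def super_groups(pictures, min_shared=2):
    """Group people who share at least min_shared pictures, transitively."""
    pair_counts = Counter()
    people = set()
    for faces in pictures:
        people.update(faces)
        pair_counts.update(combinations(sorted(faces), 2))

    parent = {p: p for p in people}

    def find(p):
        # Follow parent links to the root, compressing the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for p in people:
        groups.setdefault(find(p), set()).add(p)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

pictures = [
    {"alice", "bob"}, {"alice", "bob", "carol"}, {"bob", "carol"},
    {"dave", "erin"}, {"dave", "erin"}, {"alice", "dave"},
]
groups = super_groups(pictures)
```

Here alice and dave share only one picture, so the two circles of friends stay separate. Candidate matches for a new picture could then be searched within the super-group of any confidently recognized face first.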

markuman commented 1 year ago

Disclaimer: I haven't read the code; this is mostly based on the behind-the-scenes wiki page and my experience with clustering.

Me too. And I don't use the people/clustering feature yet.
But here are a few words about my files/photos:

Date and time

The date/time of my GoPro videos/images is always wrong :)

Camera identifier (hash together multiple identifiers): File naming conventions can help with identifying the camera. I know Apple devices name files differently, and so do a few cameras. There aren't that many formats.

IMO, the file name convention is the same across all smartphones nowadays...
But the EXIF information reveals the smartphone/camera type.

Location data

What about orientation/gyroscope data? But that would be a niche ...

MayeulC commented 1 year ago

The date/time of my GoPro videos/images is always wrong :)

But then you more or less know that they are from your GoPro, and the timestamps should be more or less right relative to each other (within the same session, they can tell you that two pictures were taken in succession). That's one more metadata source.

file name convention is the same across all smartphones nowadays

It depends on the camera app. Google Pixels use PXL, some phones use ANDRO, others use DCIM, mine uses IMG, and others use the date directly. And Apple uses UUIDs when exporting pictures. So I disagree, and it's a mess. Apple devices seem to strip EXIF metadata when exporting JPEGs, which is what iPhone users seem to do when uploading their images to my server.
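For what it's worth, a few of these conventions can be told apart with simple patterns. The sketch below is only a guess at what such a classifier could look like: the labels and regular expressions are assumptions for illustration, not a complete or verified list of vendor formats.

```python
import re

# Assumed patterns; real-world file names vary more than this.
NAME_PATTERNS = [
    ("pxl", re.compile(r"^PXL_", re.IGNORECASE)),
    ("img", re.compile(r"^IMG[_-]?\d", re.IGNORECASE)),
    ("uuid", re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.",
        re.IGNORECASE)),
    ("date", re.compile(r"^\d{8}[_-]\d{6}")),
]

def naming_convention(filename):
    """Return a coarse label for the naming scheme, or 'unknown'."""
    for label, pattern in NAME_PATTERNS:
        if pattern.match(filename):
            return label
    return "unknown"

labels = [naming_convention(name) for name in (
    "PXL_20230501_120000123.jpg",
    "IMG_1234.JPG",
    "3F2504E0-4F89-11D3-9A0C-0305E82C3301.jpeg",
    "20230501_120000.jpg",
    "holiday.png",
)]
```

Such a label could be hashed together with whatever EXIF fields survive to form the camera identifier, and it still gives a weak signal when the EXIF data has been stripped.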

markuman commented 1 year ago

Google Pixels use PXL, some phones use ANDRO, others use DCIM, mine uses IMG ... So I disagree, and it's a mess.

Ah, okay. Maybe I have been trapped in my open-source/Fairphone bubble for too long.