nextcloud / recognize

๐Ÿ‘ ๐Ÿ‘‚ Smart media tagging for Nextcloud: recognizes faces, objects, landscapes, music genres
https://apps.nextcloud.com/apps/recognize
GNU Affero General Public License v3.0

Faces recognised but not shown in People #662

Closed IndrekHaav closed 1 year ago

IndrekHaav commented 1 year ago

Which version of recognize are you using?

3.3.6

Enabled Modes

Face recognition

TensorFlow mode

Normal mode

Which Nextcloud version do you have installed?

25.0.3

Which Operating system do you have installed?

Debian 11

Which Docker container are you using to run Nextcloud? (if applicable)

N/A

How much RAM does your server have?

4G

What processor Architecture does your CPU have?

x86_64

Describe the Bug

I have about 200 photos uploaded to my Nextcloud instance, most of those containing faces. Recognize has processed them all, but only a single person shows up in the People section (of both Photos and Memories apps), with 12 photos.

I don't think this is the same as #588. I have checked the database, and the oc_recognize_face_detections table has 237 records, while oc_recognize_face_clusters only has one. Furthermore, if I manually insert a record into the clusters table and then link a record in the detections table to it, it shows up as a new person. I don't understand why the majority of the detected faces are not added to a cluster.
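For reference, the manual test looked roughly like this (column names are from memory and may not match the actual schema exactly; the IDs are placeholders):

```sql
-- Rough sketch of what I did; column names may not match the actual schema exactly.
INSERT INTO oc_recognize_face_clusters (user_id, title) VALUES ('<my user id>', 'Test person');
-- Then point one detection at the new cluster:
UPDATE oc_recognize_face_detections SET cluster_id = <new cluster id> WHERE id = <detection id>;
```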

The same photos, when imported into PhotoPrism (which also uses TensorFlow), resulted in every single detected face showing up.

Expected Behavior

All detected faces should appear in the People section, as individual clusters if needed, so they can be merged manually.

To Reproduce

Automatic run with standard settings. Results might depend on content, but most of the photos I have uploaded are high-quality.

Debug log

No response

marcelklehr commented 1 year ago

That would indicate problems with the clustering algorithm. It sounds like it doesn't run at all, though, because it should create at least some clusters, IMO.

IndrekHaav commented 1 year ago

Like I said, it did create one cluster, with 12 faces. Those were all the same person, too, so what work it did, it did correctly.

But just to clarify, the clustering algorithm should be able to create clusters with even a single face? There's no minimum number of faces that need to match before a cluster is created?

Edit: is there any way to run just the clustering job from the command line? With verbosity turned up?

marcelklehr commented 1 year ago

There's no minimum number of faces that need to match before a cluster is created?

There is a minimum. It's currently at 6 detections.

is there any way to run just the clustering job from the command line?

Not at the moment. You can manually create an entry in oc_jobs in your database, though, for \OCA\Recognize\BackgroundJobs\ClusterFacesJob with the argument {"userId": "<your user name here>"}
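Something along these lines should do it (sketch only; the exact oc_jobs columns, table prefix and backslash escaping depend on your Nextcloud version and database):

```sql
-- Sketch: queue the face clustering job for a single user.
-- Column set and backslash escaping may need adjusting for your database.
INSERT INTO oc_jobs (class, argument, last_run, last_checked)
VALUES ('OCA\\Recognize\\BackgroundJobs\\ClusterFacesJob',
        '{"userId": "<your user name here>"}',
        0, 0);
```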

IndrekHaav commented 1 year ago

There is a minimum. It's currently at 6 detections.

I see. That might explain why faces from my photos aren't showing up. Can I ask why this minimum? And is there any way to change it?

marcelklehr commented 1 year ago

@IndrekHaav The clustering algorithm needs a few hyperparameters to work well. From testing it transpired that setting a minimum cluster size improves clustering because it prevents accidental face matches from agglomerating into larger clusters that don't represent a single person (something I like to call shit clusters). I recommend trying out v3.5.0 first to see if that improves the situation for you, since we're shipping a new clustering algorithm with that release. If that doesn't help and you're adventurous, you can change the min cluster size constant here:

https://github.com/nextcloud/recognize/blob/015f3c888af7f002b199c2414489769294a5fde6/lib/Service/FaceClusterAnalyzer.php#L20

(We've reduced the value from 6 to 5 in v3.5.0 now)
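For orientation, the relevant bit looks roughly like this (simplified sketch, not the full class):

```php
<?php
// lib/Service/FaceClusterAnalyzer.php -- simplified sketch, not the full class.
// Lowering MIN_CLUSTER_SIZE lets smaller groups of detections become a person.
class FaceClusterAnalyzer {
    public const MIN_CLUSTER_SIZE = 5; // was 6 before v3.5.0
    // ... clustering logic ...
}
```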

In v3.5.0 there are now convenience occ commands for resetting clustering and running clustering manually: occ recognize:reset-face-clusters and occ recognize:cluster-faces
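i.e. something along these lines (the exact invocation depends on your setup; with a typical installation occ is run as the web server user):

```
sudo -u www-data php occ recognize:reset-face-clusters
sudo -u www-data php occ recognize:cluster-faces
```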

IndrekHaav commented 1 year ago

Thanks for the response!

I tried the new algorithm in 3.5.0. Just to be safe, I wiped all detected faces and clusters from the DB and triggered a full re-crawl.

This time, it created a few more people, but the vast majority of faces were put into a single cluster, seemingly almost randomly. I retried this a few times and also reran the recognize:cluster-faces command a couple of times; the exact numbers varied a bit, but I always ended up with 5-6 clusters, one of them with 80-90 photos of at least a dozen different people (male and female, from infants to 80-year-olds).

In other words, a shit cluster.

Having incorrect faces in the cluster wouldn't be so bad as they can be removed in the UI, but the problem is that there's no way (at least that I could see, in the Photos or Memories apps) to move photos from one person to a new person. One can only move them to an existing person, or remove them from the cluster completely. I tried the latter, but subsequent recognize:cluster-faces commands only put them back (I was hoping they would at least be put into new clusters).

I tried changing MIN_CLUSTER_SIZE, and found that lower values seemed to work better. I set it to 2, and that produced 13 clusters, with overall much better distribution of faces between clusters. That makes sense to me - some people appear in only very few photos, and a higher minimum cluster size might force the algorithm to incorrectly combine them with other faces. I tried setting it to 1, but that produced an error when running the clustering command. I would have liked to see the result, though - even if it errs too far in the other direction, I would rather have more clusters that I have to merge, than fewer clusters that I cannot effectively split.

While I'm messing with the code, is there another constant or parameter that determines how similar the faces have to be to get clustered together?

marcelklehr commented 1 year ago

This time, it created a few more people, but the vast majority of faces were put into a single cluster, seemingly almost randomly.

How many images do you have?

I tried the latter, but subsequent recognize:cluster-faces commands only put them back

that would be a bug

I would rather have more clusters that I have to merge

yeah, that makes sense

While I'm messing with the code, is there another constant or parameter that determines how similar the faces have to be to get clustered together?

There is no constant value that governs this. HDBSCAN is an adaptive algorithm that learns from the density patterns of the data. The more files you have, the better the outcome. You could also try playing with MIN_SAMPLE_SIZE.

EVOTk commented 1 year ago

Hello, I don't know if this has anything to do with this ticket, but since the update to 3.5.0

I don't have any "people" in Photos, and the occ recognize:cluster-faces command is broken (see the attached screenshot of the error).

I had more than 2500 detections

OMV6, Docker (linuxserver/docker-nextcloud), Nextcloud 25.0.3, PHP 8.0.25

Oh, it looks more like this problem https://github.com/nextcloud/recognize/issues/676

IndrekHaav commented 1 year ago

How many images do you have?

229. This is just a test set, I'm evaluating Memories + Recognize as a replacement for Photoprism.

I did try setting MIN_SAMPLE_SIZE to 2 as well. That also seemed to improve things a bit - created a few more clusters, and incorrectly grouped faces were at least more reasonable (visually similar, same gender and age group, etc).

MB-Finski commented 1 year ago

How many images do you have?

229. This is just a test set, I'm evaluating Memories + Recognize as a replacement for Photoprism.

I did try setting MIN_SAMPLE_SIZE to 2 as well. That also seemed to improve things a bit - created a few more clusters, and incorrectly grouped faces were at least more reasonable (visually similar, same gender and age group, etc).

Like @marcelklehr already commented, HDBSCAN, as it is implemented, will try to find the most stable clusters from the data regardless of their size. If the data contains a large number of identities (especially multiple identities with fewer face detections than MIN_CLUSTER_SIZE) it'll still try to find the most stable clusters in the data. This can lead to combining multiple identities of similar looking persons. The easiest way to alleviate this issue is to scan in a larger dataset.

Mind you, "similar looking" to the face recognition model may not always be similar looking to you or me. This is especially true in the case of children/infants. The dlib face recognition model hasn't been trained with images of children so they will cause trouble with clustering regardless of the clustering algorithm. (IIRC, Photoprism, for example, had a community effort to retrain their recognition model with datasets containing images of children.)

MIN_SAMPLE_SIZE is basically a probability density (i.e. "face detection density") smoothing factor used by HDBSCAN. Reducing this value too much can lead to statistical noise causing issues with larger datasets. Still, it might be that we'll have to reduce this value going forward; the optimal value in my test dataset may not be optimal for all users. Also, the optimal value will depend, to some degree, on how incremental clustering is implemented as that affects the amount of noise in the data that is being clustered.
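For reference, these are just the standard HDBSCAN definitions (nothing Recognize-specific): with k = MIN_SAMPLE_SIZE, the core distance core_k(x) is the distance from a point x to its k-th nearest neighbour, and the clustering operates on the mutual reachability distance

$$d_{\mathrm{mreach}}(a, b) = \max\bigl(\mathrm{core}_k(a),\ \mathrm{core}_k(b),\ d(a, b)\bigr)$$

so a larger k smooths the density estimate but also inflates distances around weakly supported points, which is why lowering it makes it easier for very small clusters to form.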

@marcelklehr : Besides fine tuning MIN_SAMPLE_SIZE, another way to improve the clustering in this case might be to implement a limit on the maximum size of a cluster. The obvious, but likely(?) not optimal, solution would be to limit the radius of a face cluster. However, limiting the maximum edge length within a cluster (this was implemented in a previous version of the MstClusterer-class but I stripped it since it was not used) may be a better solution since this will specifically limit forming clusters in sparse areas of the face embedding space (since the mutual reachability distance will also be large in these areas). If the latter is implemented, it may help us get away with a larger MIN_SAMPLE_SIZE which is better for users with larger datasets. It may also be that a combination of both of these limits would provide the best user experience.
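As a rough illustration of the edge-length idea (hypothetical helper, not the actual MstClusterer code):

```php
<?php
// Hypothetical helper, not the actual MstClusterer code: drop MST edges whose
// (mutual reachability) distance exceeds a hard limit, so clusters cannot form
// across sparse regions of the face embedding space.

/**
 * @param array<int, array{from: int, to: int, distance: float}> $mstEdges
 * @return array<int, array{from: int, to: int, distance: float}>
 */
function pruneLongEdges(array $mstEdges, float $maxEdgeLength): array {
    return array_values(array_filter(
        $mstEdges,
        static fn (array $edge): bool => $edge['distance'] <= $maxEdgeLength
    ));
}
```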

IndrekHaav commented 1 year ago

@MB-Finski Thanks for the info, that's an interesting read!

However, coming back to the original issue - irrespective of how the clustering algorithm works, I think there should be a way for the user to review recognised but unclustered faces and, for each one, choose between "not a person" (don't suggest again), "merge with __" (pick an existing cluster), or "new person" (create a new cluster).

Or is that something that should be handled by another app like Photos or Memories?

ced455 commented 1 year ago

I agree with @IndrekHaav, being able to create a new person is missing when trying to filter out false positives. I am in the situation where I need to remove a face from a person and then cannot find it again in the People tab once that's done. Is this face lost forever?

marcelklehr commented 1 year ago

remove them from the cluster completely. I tried the latter, but subsequent recognize:cluster-faces commands only put them back (I was hoping they would at least be put into new clusters).

I cannot reproduce this. For me, removed face detections are not re-added to the same cluster anymore.

IndrekHaav commented 1 year ago

@marcelklehr For me, every time the clustering job ran, the same faces kept getting added to the same person. I ended up deleting the detected faces from the DB, because they were faces I wasn't interested in anyway (random background people, and such).

How would this work anyway? Does the app keep track of clusters that a face has been removed from in some way?

marcelklehr commented 1 year ago

limiting the maximum edge length within a cluster [...] may be a better solution since this will specifically limit forming clusters in sparse areas of the face embedding space

@MB-Finski If you're up for implementing that, I'm happy to merge a pull request (let me know if you need help with git).

marcelklehr commented 1 year ago

How would this work anyway?

When removing a face from a cluster, we store the distance from the cluster centroid along with the face, and in the future only add it to a cluster if the distance to the cluster centroid is smaller.
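Roughly this rule (illustrative sketch, not the actual implementation):

```php
<?php
// Illustrative sketch of the rule, not the actual implementation: a detection that
// was removed from a cluster is only attached to a cluster again if it lies closer
// to that cluster's centroid than it did to the centroid it was removed from.
function shouldAttachToCluster(float $distanceToCentroid, ?float $distanceAtRemoval): bool {
    if ($distanceAtRemoval === null) {
        return true; // never removed from a cluster before
    }
    return $distanceToCentroid < $distanceAtRemoval;
}
```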

marcelklehr commented 1 year ago

@MB-Finski I wonder if it would make sense to fall back to DBSCAN clustering for photo collections smaller than x photos, as HDBSCAN results are pretty wild on smaller collections.