Closed bsaggy closed 1 year ago
Hi @bdevy
Thanks for your feedback, I'll try to help troubleshoot. There could be multiple things wrong here. One thing is the number of clusters which is likely due to the recently increased sensitivity in clustering which will create multiple clusters for the same person if the person looks a bit different on some days (think, new haircut, new glasses, etc.). I'm thinking about exposing the sensitivity here as an admin setting to let people play with it themselves. The second thing that could go wrong is the attribution of who this cluster is about. I take it you're using the "memories" app as the photos app doesn't display the person the cluster is about. When looking at the photos of one cluster or another, can you make out a person that is in all photos of a cluster? This is likely the actual person this cluster is about? If there is no person that is in all photos of a cluster, then the actual clustering is at fault, which is quite weird. Perhaps it has to do with WASM mode not being as accurate as normal tensorflow mode. I'll have to investigate that.
Hi @marcelklehr,
I think the sensitivity is definitely a factor - there are some pictures where the same person looks a bit different being sent to different unique clusters, but then there are also pictures of the same person that were taken just moments apart - the pictures look almost identical, but they are being sent to different unique clusters as well. Very confusing. A knob to tune sensitivity might be helpful in this scenario.
I am using the "Memories" app and it works very well - the Photos app had some severe performance issues as the number of clusters grew (One issue is that the merge person dialog would stop appearing when the number of clusters got to maybe around 100 - the second issue is that the photos page would hang while loading when clicking on People, almost as if it was reclassifying all of the recognized faces in real-time - there would even be a node.js process running on the server during this time, but it got to the point where the page was just unusable. I digress).
When looking at the photos of one cluster or another, can you make out a person that is in all photos of a cluster?
Anyway, no, I cannot make out a person that is in all photos of a given cluster. I do have the "Mark person in preview" enabled so I see the green box around the person - but this box is around different people in the same cluster. Some pictures still have the person I think the cluster is about, but many pictures don't. So my answer to your question has to be "no", and as you say the clustering may be at fault. I'm checking to see if I can enable AVX to utilize normal tensorflow mode instead of WASM mode if you think that might help. Over 10,200 clusters found now (~3k increase from 24 hours ago) and 88k files still in queue.
Thank you for your time in helping to understand these issues!
Brian
the Photos app had some severe performance issues as the number of clusters grew
Yeah, we're aware of that now and I've provided a fix that will be released soon. :)
Anyway, no, I cannot make out a person that is in all photos of a given cluster.
Mmh, that's not good :/
I have the same problem with the same configuration in WASM mode
Anyway, no, I cannot make out a person that is in all photos of a given cluster.
Mmh, that's not good :/
Same here, the people in several clusters seem to be completely random. I also see mixed faces between babies and elderly and also different colored hair.
I tried but I cannot reproduce this. I enabled WASM mode on my dev instance and let it sift through my personal photo collection. The result is a well sorted collection of photos categorized by face. No false positives at all apart from one rubbish cluster with all the outliers.
Same issue here. I'm running in native tensorflow (not WASM). Mine has been running for 1-2 weeks now and still has 53k to go. I see on my dual 14 core server (28 real cores) that recognize keeps only one CPU thread spiked and isn't leveraging all available cores, so the initial run through my 41k picture collection is taking forever. I have cores set to 16 in the recognize settings, but it doesn't seem to do anything. Now Nextcloud is complaining that my cron hasn't run for over 4 hours because each recognize run is taking so long.
Below you can see an example of one of the clusters that is showing inside of Memories. If clustering is going to be this bad it's probably less than useful.
I am also experiencing this problem.
MariaDB [nextcloud]> select count(*) from oc_recognize_face_clusters;
+----------+
| count(*) |
+----------+
|    20020 |
+----------+
1 row in set (0.007 sec)
I tried but I cannot reproduce this. I enabled WASM mode on my dev instance and let it sift through my personal photo collection. The result is a well sorted collection of photos categorized by face. No false positives at all apart from one rubbish cluster with all the outliers.
Hi @marcelklehr,
I enabled CPU pass through on my VM and WASM mode was no longer necessary so I disabled it. However, the issue persisted. I finally disabled Facial Recognition at 20k clusters. I don't see any point in letting it continue in its current state.
I had installed and just started using Recognition v2 on NC 24 shortly before upgrading to NC 25 and Recognition v3. I don't know if that factors into the behavior at all, but thought it was worth mentioning.
@rhatguy's example is pretty spot on to what I've observed as well - clusters with many different people of interest. Is there any more info I can provide you to aid in the troubleshooting of this issue? Is there a way to "reset" Recognize back to its defaults as if it were a fresh install, and if so is it worth attempting that to see if it then behaves as expected?
Thanks, Brian
fwiw, it happens without WASM too
@illnesse Do you experience it, too? Didn't it work for you before?
Hello, same here, without WASM.
I have lots of clusters containing very different people. Sometimes the person in the preview bubble in the Memories app is only in one picture in the cluster. The rest of the faces can be from completely different persons, from babies to grannies, males and females. It's all mixed up.
I have the "mark preview" activated so I see what the model sees, and what it thinks is the same face... and it's absolutely wrong
On 200 clusters only a handful are actually with a single person in it.
EDIT: Recognize v3.2.2, first time I am using this. Only recognizing faces, 100 per cronjob. WASM disabled.
BTW I have an increasing number of faces queued which never get dequeued. Not sure if related or not. I can wait days, it doesn't remove the queued faces. It increases until there are about 12k queued. Then it stops. Cronjob is instant, it seems it does neither crawling nor classifying. When manipulating the clusters, merging faces, removing people, the queue increases a bit more, but doesn't dequeue. Very weird.
EDIT2: I'm just thinking about it, but I have a lot of "asian" faces. These get mixed up much more than caucasian faces where I have very few false positives... hmmmm
FWIW, I'm also experiencing this issue. Tens of different people get assigned to 1-2 mega clusters while quite often photos of the same person (taken only seconds apart) get assigned to separate identities/clusters. I tried manually reassigning all images to correct clusters but still processing more photos will result in new "mega clusters" (and occasionally new miniclusters of existing identities).
For me, the formation of the large clusters with tens of different people is the bigger issue. Basically, I have to manually assign the identity of almost every detected face.
NC 25.0.1 Recognize 3.2.3 (Native TF-mode)
It can happen that 1-2 Mega clusters appear with faces that couldn't be assigned to a different cluster. That's one thing, but if every cluster is a random assortment of people, then something fishy is going on.
Not all clusters are random. It's just that once these mega clusters form, most (something like 80%-90%) of the subsequently detected faces will be assigned to them.
I've previously tried to reset/remove the face tags from the GUI (i.e. from the admin settings), but I started wondering if that also resets all clusters that have been previously detected?
@rhatguy, I wonder whether the CPU being at 100% (https://github.com/nextcloud/recognize/issues/475#issuecomment-1317634158) might be related to my observation described here (in a separate report to avoid stealing the thread subject): https://github.com/nextcloud/recognize/issues/546.
It can happen that 1-2 Mega clusters appear with faces that couldn't be assigned to a different cluster. That's one thing, but if every cluster is a random assortment of people, then something fishy is going on.
Hi @marcelklehr, if there's any testing or data gathering that I can do for you, please let me know.
It's just that once these mega clusters form, most (something like 80%-90%) of the subsequently detected faces will be assigned to them.
That's a good point. Apparently the cluster algorithm is a bit overzealous and over time we get black hole clusters. I was able to mitigate this in my test sample by adding a constraint on the inner cluster distance. If a cluster is larger than what could possibly be the same face, we simply disregard it. I'll run some more tests to verify that there's no negative consequences to this.
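The inner cluster distance constraint described above can be sketched roughly as follows. This is only an illustration in Python, not Recognize's code: the function names, the threshold value of 0.7, the use of Euclidean distance, and the toy 2-D "embeddings" are all assumptions made for the example.

```python
from itertools import combinations
from math import dist

# Hypothetical threshold; the value Recognize uses is not stated in this thread.
MAX_INNER_CLUSTER_DISTANCE = 0.7


def inner_cluster_distance(embeddings):
    """Return the largest pairwise Euclidean distance inside one cluster."""
    if len(embeddings) < 2:
        return 0.0
    return max(dist(a, b) for a, b in combinations(embeddings, 2))


def drop_oversized_clusters(clusters, max_distance=MAX_INNER_CLUSTER_DISTANCE):
    """Keep only clusters small enough to plausibly contain a single face."""
    return {
        cluster_id: embeddings
        for cluster_id, embeddings in clusters.items()
        if inner_cluster_distance(embeddings) <= max_distance
    }


# Toy data: 2-D points stand in for real face embeddings.
clusters = {
    "tight": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "mega": [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)],
}
kept = drop_oversized_clusters(clusters)
print(sorted(kept))
```

In this toy run only the "tight" cluster survives; the "mega" cluster is disregarded because its two most distant members are further apart than the assumed threshold.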
v3.3.3 is out now with the fix. After installing the update, make sure to remove the mega-clusters. Once you add a new face picture the clusters will be recalculated and hopefully the mega-clusters won't come back.
Hey @marcelklehr, that's great news!
Can you let me know the best way to delete thousands of mega-clusters? Would it be via database query?
Or better yet, I would be happy to just re-initialize the Recognize app and its database. I had only just begun using Recognize, so I don't mind wiping its progress and starting it from scratch. How could I do this?
Thanks, Brian
You can run occ recognize:reset-faces which removes all face detections and face clusters from the database. Then you may run recognize:classify or trigger classification in the background by toggling the face recognition setting in the admin settings.
@bdevy
Here's what I did to reset everything.
Admin:
Turn off all of the recognize toggles
OCC commands:
occ recognize:cleanup-tags
occ recognize:reset-tags
occ recognize:reset-faces
SQL:
delete from oc_jobs where class like '%Recognize%';
delete from oc_recognize_queue_faces;
delete from oc_recognize_queue_imagenet;
delete from oc_recognize_queue_landmarks;
delete from oc_recognize_queue_movinet;
delete from oc_recognize_queue_musicnn;
delete from oc_recognize_face_clusters;
delete from oc_recognize_face_detections;
Admin:
Turn on face recognition toggle
OCC command:
occ recognize:classify
Alright, since version v3.3.3 I do not have these big clusters. However... now almost none of my photos get recognized.
With previous version I had around 500 faces recognized.
Now I have about 40.
And I still have 10k queued files according to the admin interface. I can run occ recognize:classify by hand, and it does output lots of things. Most notably "Face score too low".
The queue also keeps increasing after each cron.
It seems I can run occ recognize:classify endlessly.
It seems I can run occ recognize:classify endlessly.
the classify command does not utilize the queue tables in the database, so the queue count in the interface doesn't apply to the command.
Now almost none of my photos gets recognized.
Mh, I also changed the threshold for face detection a bit, maybe that was too overzealous :/
With previous version I had around 500 faces recognized.
500 face clusters or 500 face detections?
With previous version I had around 500 faces recognized.
500 face clusters or 500 face detections?
40 face clusters now, whereas before I had around 500. That is also the total people count shown in Memories > People.
I currently have 4K face detections (oc_recognize_face_detections). Before v3.3.3 I never counted the face detections, though, so I can't compare with the current version.
Similar outcome testing 3.3.3. ~6000 detected faces. Maybe 30 clusters. No more than 13 faces per cluster.
I'm currently experimenting with various inner radius values. As I'm testing anyways, would increasing minimum cluster density make any sense? With a somewhat larger value, outliers would perhaps not creep into clusters as easily and, on the other hand, a little bit larger radius could be used?
Come to think of it, this outcome actually makes sense given the code changes. The mega-clusters created by DBSCAN didn't go anywhere; now they're just not saved as known clusters. This will likely block any face within these clusters from ever being added to any identity.
EDIT: So the solution could perhaps be to have an even stricter min radius and/or larger min density?
A further possible solution just off the top of my head, would be to maybe restrict DBSCAN to run on only smaller batches of images (maybe group the batches by date)? This would reduce the possibility of pure "noise" from connecting unrelated clusters which will eventually lead to these mega clusters given large enough sample sizes.
EDIT2: Currently testing with cluster density 8 and radius 0.25 -- getting pretty good results but I'm sure these are still far from optimal settings (and optimal settings will depend on the source data, of course). I also set MAX_INNER_CLUSTER_DISTANCE = 999.0 and the mega clusters have not made a reappearance.
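To illustrate why a larger minimum density can stop unrelated groups from being chained together, here is a minimal, self-contained DBSCAN sketch in Python. It is not the code Recognize runs; the function name, the toy 2-D points, and the density values 2 and 5 are assumptions chosen for the demo (only the radius 0.25 echoes the value mentioned above).

```python
from math import dist

NOISE = -1


def dbscan(points, radius, min_density):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within `radius` of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= radius]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_density:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_density:  # core point: keep growing
                queue.extend(reachable)
    return labels


# Two tight groups joined by a sparse "bridge" of stray points.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
bridge = [(0.3, 0.0), (0.5, 0.0), (0.7, 0.0)]
group_b = [(0.9, 0.0), (1.0, 0.0), (0.9, 0.1), (1.0, 0.1), (0.95, 0.05)]
points = group_a + bridge + group_b

loose = dbscan(points, radius=0.25, min_density=2)
strict = dbscan(points, radius=0.25, min_density=5)
print(len(set(loose) - {NOISE}), len(set(strict) - {NOISE}))
```

With the low density the bridge points count as core points and both groups merge into one cluster; with the higher density the bridge points are only border points or noise, so the two groups stay separate.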
@MB-Finski feel free to drop by our matrix channel for discussing this, I'm also playing around with this atm :)
Just some feedback: 3.3.3 is significantly improved! Mega clusters no more, and very few errors within clusters.
I'm still re-indexing everything, but so far my impression is face detections are a bit low for my personal preference (I'd rather err on the side of grabbing everyone in the photo, at the expense of more small clusters). +1 for the feature request to expose the detection and clustering parameters in the settings UI so that we can tweak the system to our preference.
@farhills Thanks for the feedback!
I've just released https://github.com/nextcloud/recognize/releases/tag/v3.3.4 which should improve this even more and includes incremental clustering, which should significantly speed up clustering.
Before I install I'll grab some screenshots to compare before/after. Is there a need to fully remove and reinstall, or can I just recrawl to update the classifications on existing photos?
PS thanks for working on this!
@farhills You can use the clear faces command and then toggle the face recognition setting in the admin settings once.
Results are in: substantially faster! I cheated on this go and only did faces (no object detection), but run time was ~24h whereas with 3.3.3 (face and object) it ran for 3-5 days. (No audio/video on either run)
Good news: the number of faces (sum of all clusters) is a lot higher. Only ~200 total faces in 3.3.3, now ~2000. 10x improvement!
Bad news: differentiation between individuals isn't tight enough. In 3.3.3 I had 50 'named' clusters, plus another 15-20 random people grabbed from photo backgrounds. In 3.3.4 it's only found 8 unique faces, lots of mixed clusters (dare I say mega clusters?). From what I've reviewed, there were zero false-positive 'is it a face' detections. The issue is the clusters include too many 'similar' but unique faces.
The 'is it a face' threshold can be lowered, but the 'is this the same face' threshold during the clustering needs to be raised.
The big cluster issue raises a second UI problem that is probably shared jointly between Recognize and photos/memories. In those 8 unique faces, there are many mis-identified individuals. But since those people don't have their own cluster, I don't have a GUI option to assign them anywhere. The only option is 'remove person' which removes them from the cluster, but hides them away forever. To fix:
Recognize v3.3.3 after processing and merging clusters:
Recognize v3.3.4 after processing, partial merging of clusters:
@farhills thank you for reviewing! After evaluating the situation with @MB-Finski on gitter we believe that shit-clusters (as I like to call them) largely result from improper encoding of partially visible faces of the Neural network we employ, so there's nothing in the clustering algorithm we can do. As you've noted, I've tried my best to filter out non-faces from the face detections (by excluding small faces which are often not visible enough to the encoder, and by increasing the face probability threshold, which excludes faces that the network is not too confident about).
Personally, with v3.3.4 I'm seeing the best results yet on my production machine. One shit cluster, but that's to be expected as per above.
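The filtering described above (dropping small faces and low-confidence detections before they reach the encoder and the clustering) could look roughly like this. It is a hedged Python sketch, not Recognize's implementation: the `Detection` class, its field names, and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    file: str
    score: float  # detector confidence that the box contains a face (0..1)
    width: int    # bounding box width in pixels
    height: int   # bounding box height in pixels


# Hypothetical thresholds; the values Recognize uses are not given in this thread.
MIN_FACE_SCORE = 0.9
MIN_FACE_SIZE = 50


def usable_for_clustering(detection):
    """True if the face is confident and large enough to be encoded reliably."""
    return (
        detection.score >= MIN_FACE_SCORE
        and min(detection.width, detection.height) >= MIN_FACE_SIZE
    )


detections = [
    Detection("portrait.jpg", score=0.98, width=220, height=240),
    Detection("crowd.jpg", score=0.95, width=30, height=32),     # too small
    Detection("blurry.jpg", score=0.60, width=180, height=190),  # score too low
]
kept = [d.file for d in detections if usable_for_clustering(d)]
print(kept)
```

Raising either threshold trades fewer junk clusters for fewer detected faces overall, which matches the drop in detections reported earlier in the thread.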
Hi @marcelklehr, thank you for your work on this! v3.3.4 is much improved! "Mega-clusters" seem to be fixed. On my most recent run, there were several "shit clusters" with a dozen or two images in each. I reassigned those images to different clusters or just removed them as needed. While it was a bit of a pain, it was much more manageable than the thousands of clusters in earlier versions.
I have a couple clusters with 1-3k images in them. These were mostly all of the same person, but did exhibit some "shit cluster" attributes by at times including all different people in the cluster. It took some time, but I reassigned/removed those images as needed.
I am overall very impressed at the clustering of images in v3.3.4 - I have pictures of the same people from the past 15 years that were categorized into their respective clusters, whereas in previous versions this seemed to have created many different clusters!
I have a few comments/questions on issues or feature requests which I'll create separate issues on. One of those relates to https://github.com/nextcloud/recognize/issues/442 - I have 22k faces detected, 11k of which are categorized as NULL. I am confident that many of these should be added to existing clusters or warrant a new cluster. Is there any way to debug why a face is not getting categorized at all?
And can you help me understand your comment here?
we believe that shit-clusters (as I like to call them) largely result from improper encoding of partially visible faces of the Neural network we employ, so there's nothing in the clustering algorithm we can do.
I am sure that many of my NULL categorized images are not partially visible faces, but I definitely don't understand the inner workings of the Neural network.
Thanks! Brian
Personally, with v3.3.4 I'm seeing the best results yet on my production machine. One shit cluster, but that's to be expected as per above.
I reset my recognized faces yesterday and initiated a recrawl and the results for now are really great! I can already see a great improvement concerning clustering compared to earlier 3.x versions - thanks for your work @marcelklehr !
With v3.5.0 (out today) we've replaced the clustering algorithm with a better one, which should greatly improve the issues outlined here. After updating to v3.5.0 you can run occ recognize:reset-face-clusters and occ recognize:cluster-faces to re-run clustering. Let me know how it goes! :rocket:
Very very good improvements!
Unfortunately re-run is not working for me:
occ recognize:cluster-faces
Clustering face detections for user user
ClusterDebug: Retrieving face detections for user user
ClusterDebug: Found 2607 unclustered detections. Calculating clusters.
An unhandled exception has been thrown:
Error: Call to undefined method Rubix\ML\Datasets\Labeled::features() in /var/www/nextcloud/apps/recognize/lib/Clustering/DualTreeBall.php:107
Stack trace:
I'm using wasm mode on an x86 machine.
Unfortunately re-run is not working for me:
occ recognize:cluster-faces
Clustering face detections for user user
ClusterDebug: Retrieving face detections for user user
ClusterDebug: Found 2607 unclustered detections. Calculating clusters.
An unhandled exception has been thrown:
Error: Call to undefined method Rubix\ML\Datasets\Labeled::features() in /var/www/nextcloud/apps/recognize/lib/Clustering/DualTreeBall.php:107
Stack trace:
#0 /var/www/nextcloud/apps/recognize/lib/Clustering/MrdBallTree.php(632): OCA\Recognize\Clustering\DualTreeBall::split()
#1 /var/www/nextcloud/apps/recognize/lib/Clustering/MstSolver.php(22): OCA\Recognize\Clustering\MrdBallTree->grow()
#2 /var/www/nextcloud/apps/recognize/lib/Clustering/HDBSCAN.php(90): OCA\Recognize\Clustering\MstSolver->construct()
#3 /var/www/nextcloud/apps/recognize/lib/Service/FaceClusterAnalyzer.php(73): OCA\Recognize\Clustering\HDBSCAN->construct()
#4 /var/www/nextcloud/apps/recognize/lib/Command/ClusterFaces.php(62): OCA\Recognize\Service\FaceClusterAnalyzer->calculateClusters()
#5 /var/www/nextcloud/3rdparty/symfony/console/Command/Command.php(255): OCA\Recognize\Command\ClusterFaces->execute()
#6 /var/www/nextcloud/3rdparty/symfony/console/Application.php(1009): Symfony\Component\Console\Command\Command->run()
#7 /var/www/nextcloud/3rdparty/symfony/console/Application.php(273): Symfony\Component\Console\Application->doRunCommand()
#8 /var/www/nextcloud/3rdparty/symfony/console/Application.php(149): Symfony\Component\Console\Application->doRun()
#9 /var/www/nextcloud/lib/private/Console/Application.php(213): Symfony\Component\Console\Application->run()
#10 /var/www/nextcloud/console.php(100): OC\Console\Application->run()
#11 /var/www/nextcloud/occ(11): require_once('...')
#12 {main}
I'm using wasm mode on an x86 machine.
see #676
the results for now are really great
Very very good improvements!
With this feedback I'm closing this thread for now. We'll continue to look into improving clustering, but I believe we've found a good spot in the solution space. (Thanks to @MB-Finski for working on this!)
Describe the bug I'm running Recognize against ~35k images. It's creating way too many clusters, currently above 7k and growing.
The craziest part is that I'll click on a cluster in the Memories App with Mark Person in Preview enabled, and see multiple different people with the green bounding box around them across all the pictures in that same cluster.
For example, one cluster has myself, my wife, my grandmother, my mother in law, my sister in law, my brother in law, a friend of a different skin color - all as the person of interest in this cluster. Another cluster has my 2 month old son, myself, my wife, my sister in law, my grandfather, etc. all as the person of interest in the cluster.
While I understand there is a margin for error in facial recognition, I have to believe something is wrong here. With over 7,000 clusters and every cluster containing all kinds of people of interest as indicated by the green bounding box, this is pretty much useless to me at this point. ~92k queued files still yet to go.
To Reproduce Steps to reproduce the behavior: I can't say that this is necessarily "reproducible", but this is the evolution of how I have used Recognize thus far.
Expected behavior Facial Recognition to work more accurately, and not identify my whole family including friends as the same person. Create fewer clusters but with more accuracy.
Recognize (please complete the following information):
Server (please complete the following information): System Configuration
Recognize Configuration:
Additional context If there's anything else I can check or do, please let me know.