Closed KoMa1012 closed 1 year ago
This is a phenomenon known as "shit clusters" forming. We've been able to squash it somewhat but it still occurs sometimes, especially with smaller collections. How many images do you have?
Also see #662
I've got a collection of roughly 7500-8000 files (including some videos); this doesn't feel like a smaller collection to me.
I read the title of that bug, but it suggests that faces are found but not being added to a person. Sorry for the duplicate, then.
this doesn't feel like a smaller collection to me
Yep, I agree
Can you double-check whether this happens more often with many "old" pictures? For me it happens mostly with my pictures from 2014, which were taken with very old cameras. If I only add pictures from a later point in time it doesn't happen as often, but I will do more research on this.
I get the same error for almost every scheduled job (every ~5 minutes). Under "administration settings => recognize" the face recognition queue is empty, but the object recognition queue is always busy (~100 to 300 pictures) even though no new images are being uploaded.
I am just guessing that those "unrecognizable images" are queued over and over again due to this bug.
In the end, I had to disable object recognition to give my NAS a rest. :(
I cleared out all of the recognize tables from my database and started completely from scratch. I have 42k pictures in total, but I had only got through scanning 1100 faces (verified in the database) before I already had clusters from the same Christmas party with 3 different people tagged into the same cluster: one man (who is bald) and 2 different women (one middle-aged and one teenager). It gets even worse as I look at the rest of the 38 photos in this cluster. The clustering is so bad it's unusable. I feel like my time would be better spent writing a script to pull the manually curated tags I created in digikam and wrote to the pictures' metadata, and loading those into the recognize tables in the database so that memories could pick them up and display them.
The major issue here is that HDBSCAN, as currently implemented, has no upper limit on how large a cluster can grow in the face embedding space. This is problematic in combination with the incremental processing of new face detections and/or when only a small number of face detections are being clustered (HDBSCAN would work best if all faces were clustered at once).
There's a potential fix in the pipeline: https://github.com/nextcloud/recognize/pull/711
Feedback would be greatly appreciated if anyone's up for testing it!
If you do test this patch, consider setting MIN_SAMPLE_SIZE to about 7 and MIN_CLUSTER_SIZE to about 8. Also, if you do get clusters with multiple identities, you could experiment with decreasing MAX_CLUSTER_EDGE_LENGTH to below 0.5 (say ~0.2-0.3). All these settings can be found in FaceClusterAnalyzer.php.
The above-mentioned hyperparameters (along with this patch) work very well for me, personally. Please report back with any settings that you find work in your libraries.
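To illustrate why capping the edge length helps, here is a toy sketch in Python. It is not the code from the patch and it is not HDBSCAN: it swaps in plain single-linkage grouping with a distance cutoff, and the function name `cluster_with_edge_cap`, the 2-D points, and the thresholds are all made up for the example. It only shows the general effect: with a generous cutoff, a single in-between point chains two separate groups into one cluster, while a tighter cutoff keeps them apart.

```python
import math


def cluster_with_edge_cap(points, max_edge_length):
    """Group points by linking any pair whose distance is <= max_edge_length.

    Hypothetical helper for illustration only (single-linkage with a cutoff,
    implemented with a small union-find); not the recognize implementation.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_edge_length:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up 2-D "embeddings": two tight pairs joined by one stray point in the middle.
points = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (0.9, 0.0), (1.0, 0.0)]

loose = cluster_with_edge_cap(points, 0.5)   # everything chains into one cluster
tight = cluster_with_edge_cap(points, 0.25)  # the two pairs stay separate
print(loose)
print(tight)
```

Real face embeddings have many more dimensions and the patch works inside HDBSCAN, so treat this only as intuition for what lowering MAX_CLUSTER_EDGE_LENGTH is meant to achieve.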
@rhatguy:
As a quick fix you can try clustering all faces at once with occ recognize:reset-face-clusters && occ recognize:cluster-faces
(after scanning in all faces). To get the best mileage, you may have to fiddle with the hyperparameters, as stated above, since I suspect the current defaults (MIN_SAMPLE_SIZE/MIN_CLUSTER_SIZE) are a bit on the low side.
I could do this. Is there a tutorial on how to do this on my Nextcloud instance? I've only used official releases until now.
Great! Here's the official tutorial for testing/developing custom apps: https://docs.nextcloud.com/server/latest/developer_manual/app_development/intro.html#edit-an-existing-app
That being said, for a simple test like this, perhaps the fastest way is to simply overwrite the existing files in your apps folder. This way the next official release of recognize will overwrite these files again and you won't have to revert anything manually. Doing this is relatively safe but, as always, keep a fresh backup/snapshot at hand if this is your "production" instance.
First, make sure you're running the latest official version of recognize. Then, you'll only have to update the changed files in #711 and update the hyperparameters at the top of FaceClusterAnalyzer.php.
v3.7.0 should fix this. Let's continue the conversation about mistakes in face clustering over in the forum: #754
Which version of recognize are you using?
3.6.1
Enabled Modes
Object recognition, Face recognition, Video recognition
TensorFlow mode
Normal mode
Which Nextcloud version do you have installed?
25.0.4
Which Operating system do you have installed?
unraid 6.11.5
Which Docker container are you using to run Nextcloud? (if applicable)
23.0.12-apache
How much RAM does your server have?
32GB
What processor Architecture does your CPU have?
x86_64
Describe the Bug
Pictures are being assigned to a person, but they do not show the same person. E.g. one "Person" contains >379 pictures with different "faces", different genders, different ages, not even the same species (dogs and cats). I've got 8 of these "Persons", which contain 44-379 pictures each. I tried importing a smaller set of pictures (just a hundred or so) for a different user and the same thing happened again.
One Error showing up sometimes, but I'm not sure if this is related to recognize: {"reqId":"iW7janZMC0OHstALrOH5","level":3,"time":"2023-03-08T03:49:09+01:00","remoteAddr":"","user":"--","app":"PHP","method":"","url":"--","message":"imagecreatefromstring(): Data is not in a recognized format at /var/www/html/lib/private/legacy/OC_Image.php#758","userAgent":"--","version":"25.0.4.1","exception":{"Exception":"Error","Message":"imagecreatefromstring(): Data is not in a recognized format at /var/www/html/lib/private/legacy/OC_Image.php#758","Code":0,"Trace":[{"function":"onError","class":"OC\Log\ErrorHandler","type":"::","args":[2,"imagecreatefromstring(): Data is not in a recognized format","/var/www/html/lib/private/legacy/OC_Image.php",758]},{"file":"/var/www/html/lib/private/legacy/OC_Image.php","line":758,"function":"imagecreatefromstring","args":[" sensitive parameters replaced "]},{"file":"/var/www/html/lib/private/Preview/Image.php","line":52,"function":"loadFromFile","class":"OC_Image","type":"->","args":["/var/www/html/data/markus/files/Documents/My Games/Tom Clancy's The 
Division/RogueAccounts/StoreImgs/tctdreward33_thumbnail.png"]},{"file":"/var/www/html/lib/private/Preview/GeneratorHelper.php","line":65,"function":"getThumbnail","class":"OC\Preview\Image","type":"->","args":[["OC\Files\Node\File"],4096,4096]},{"file":"/var/www/html/lib/private/Preview/Generator.php","line":343,"function":"getThumbnail","class":"OC\Preview\GeneratorHelper","type":"->","args":[["OC\Preview\PNG"],["OC\Files\Node\File"],4096,4096]},{"file":"/var/www/html/lib/private/Preview/Generator.php","line":162,"function":"getMaxPreview","class":"OC\Preview\Generator","type":"->","args":[["OC\Files\SimpleFS\SimpleFolder"],["OC\Files\Node\File"],"image/png",""]},{"file":"/var/www/html/lib/private/Preview/Generator.php","line":114,"function":"generatePreviews","class":"OC\Preview\Generator","type":"->","args":[["OC\Files\Node\File"],[[1024,1024,false,"fill"]],"image/png"]},{"file":"/var/www/html/lib/private/PreviewManager.php","line":185,"function":"getPreview","class":"OC\Preview\Generator","type":"->","args":[["OC\Files\Node\File"],1024,1024,false,"fill",null]},{"file":"/var/www/html/custom_apps/recognize/lib/Classifiers/Classifier.php","line":255,"function":"getPreview","class":"OC\PreviewManager","type":"->","args":[["OC\Files\Node\File"],1024,1024]},{"file":"/var/www/html/custom_apps/recognize/lib/Classifiers/Classifier.php","line":84,"function":"getConvertedFilePath","class":"OCA\Recognize\Classifiers\Classifier","type":"->","args":[["OC\Files\Node\File"]]},{"file":"/var/www/html/custom_apps/recognize/lib/Classifiers/Images/ClusteringFaceClassifier.php","line":83,"function":"classifyFiles","class":"OCA\Recognize\Classifiers\Classifier","type":"->","args":["faces",[["OCA\Recognize\Db\QueueFile",164404],["OCA\Recognize\Db\QueueFile",164405],["OCA\Recognize\Db\QueueFile",164406],["OCA\Recognize\Db\QueueFile",164407],["OCA\Recognize\Db\QueueFile",164408],"And 93 more entries, set log level to debug to see all 
entries"],120]},{"file":"/var/www/html/custom_apps/recognize/lib/BackgroundJobs/ClassifyFacesJob.php","line":41,"function":"classify","class":"OCA\Recognize\Classifiers\Images\ClusteringFaceClassifier","type":"->","args":[[["OCA\Recognize\Db\QueueFile",164404],["OCA\Recognize\Db\QueueFile",164405],["OCA\Recognize\Db\QueueFile",164406],["OCA\Recognize\Db\QueueFile",164407],["OCA\Recognize\Db\QueueFile",164408],"And 93 more entries, set log level to debug to see all entries"]]},{"file":"/var/www/html/custom_apps/recognize/lib/BackgroundJobs/ClassifierJob.php","line":70,"function":"classify","class":"OCA\Recognize\BackgroundJobs\ClassifyFacesJob","type":"->","args":[[["OCA\Recognize\Db\QueueFile",164404],["OCA\Recognize\Db\QueueFile",164405],["OCA\Recognize\Db\QueueFile",164406],["OCA\Recognize\Db\QueueFile",164407],["OCA\Recognize\Db\QueueFile",164408],"And 93 more entries, set log level to debug to see all entries"]]},{"file":"/var/www/html/custom_apps/recognize/lib/BackgroundJobs/ClassifyFacesJob.php","line":33,"function":"runClassifier","class":"OCA\Recognize\BackgroundJobs\ClassifierJob","type":"->","args":["faces",[3,266]]},{"file":"/var/www/html/lib/public/BackgroundJob/Job.php","line":78,"function":"run","class":"OCA\Recognize\BackgroundJobs\ClassifyFacesJob","type":"->","args":[[3,266]]},{"file":"/var/www/html/lib/public/BackgroundJob/TimedJob.php","line":103,"function":"start","class":"OCP\BackgroundJob\Job","type":"->","args":[["OC\BackgroundJob\JobList"]]},{"file":"/var/www/html/lib/public/BackgroundJob/TimedJob.php","line":93,"function":"start","class":"OCP\BackgroundJob\TimedJob","type":"->","args":[["OC\BackgroundJob\JobList"]]},{"file":"/var/www/html/cron.php","line":152,"function":"execute","class":"OCP\BackgroundJob\TimedJob","type":"->","args":[["OC\BackgroundJob\JobList"],["OC\Log"]]}],"File":"/var/www/html/lib/private/Log/ErrorHandler.php","Line":92,"CustomMessage":"--"},"id":"640843e8c4956"}
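The log above shows PHP's imagecreatefromstring() rejecting a file with a .png extension. As a rough way to look for similar files, here is a minimal Python sketch that lists .png files whose first 8 bytes are not the standard PNG signature. The function name `find_suspect_pngs` and the demo files are made up for the example, and a wrong signature is only one possible reason for that error, so this is a diagnostic aid under those assumptions rather than a fix.

```python
import tempfile
from pathlib import Path

# The fixed 8-byte signature that every valid PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_suspect_pngs(root):
    """Return names of .png files under root that do not start with the PNG signature.

    Hypothetical helper for illustration; it does not validate the rest of the file.
    """
    suspects = []
    for path in sorted(Path(root).rglob("*.png")):
        if not path.is_file():
            continue
        with open(path, "rb") as fh:
            if fh.read(8) != PNG_SIGNATURE:
                suspects.append(path.name)
    return suspects


# Self-contained demo with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.png").write_bytes(PNG_SIGNATURE + b"rest of file")
    Path(tmp, "bad_thumbnail.png").write_bytes(b"not an image")
    suspects = find_suspect_pngs(tmp)

print(suspects)
```

To use it on a real instance, you would point `find_suspect_pngs` at a copy of the user's files folder instead of the temporary directory.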
Expected Behavior
Do not create "generic" people into which everything (including things that aren't even faces) is sorted.
To Reproduce
Import pictures, then start recognize.
Debug log
No response