tobiasKaminsky opened 4 years ago
Hi @tobiasKaminsky, welcome! 😄
The code that is failing for you is a routine that I would like to optimize at some point, and honestly you are the second person to report this (but that user had 1 GB of RAM and used a minimum confidence of 0.6).
Well, a couple of things to consider:
Without shirking my responsibility, the last point is probably the cause of your inconvenience. Finally, note that when we set the 2 GB memory requirement (now just 1 GB), we believed it was really the absolute minimum. That limitation is mainly imposed by the analysis task, not by the clustering task, but in general, I guess you already discovered that this is a task that needs plenty of resources.
* How many images are you using?
55k :-)
* What parameter are you using as minimum confidence?
0.99, seems it was the default?
* 2 GB may seem like a lot, but how much free memory do you have before executing the command?
I have 10 GB, and currently 8460 MB are free.
Hi @tobiasKaminsky
55k :-)
Ohh... your problem is the number of photos. 🙈 Great collection of photos! 😉
- What parameter are you using as minimum confidence?
0.99, seems it was the default?
You're right, I was referring to sensitivity. 😅 If you increased it, I would recommend reducing it again.
I have 10 GB, and currently 8460 MB are free.
Ohh... thinking about it again, I explained this badly too. The problem here is exceeding the memory_limit imposed by PHP, not the RAM itself. 😅 What you would have to look at is the memory consumed by all your PHP processes, but that is beside the point now. Forget this, I don't even know how to check it. 😅
One last question: did you analyze most of the photos with an earlier version, and it fails after upgrading to the latest?
If so, maybe I can do a patch quickly to help you.
EDIT: I cannot do it. Reviewing the code, the optimization I had in mind was already done! 😞 The only option left for the future is to do this in batches, but that will require significant work first.
The only option left for the future is to do this in batches, but that will require significant work first.
That would be the goal, I think :+1: Thanks for helping :)
Sorry to hijack the conversation with a non-issue question, but reading this thread made me curious about something. I installed face recognition on my server a couple of weeks ago, and so far it has found 210271 images, 95688 faces, and 32389 persons in my collection. It's going well. The app hangs sometimes, but I made a script to restart it, and it's working just fine. Apart from the fact that after each restart it takes more and more time to jump from task 6/10 to 7/10, but I know that's because it only uses one thread for all this work. That said, my question is: how many photos can the app handle? Is there any limitation or recommendation?
Hi @ftrentini As commented, we must remember that it is an intensive task and needs resources, but in general it doesn't seem to be a problem. For collections like yours, 2 GB is probably not enough, but it doesn't seem unmanageable.
Can I propose a little experiment? Let's see the memory consumption with your photos.
diff --git a/lib/BackgroundJob/Tasks/CreateClustersTask.php b/lib/BackgroundJob/Tasks/CreateClustersTask.php
index d2685a4..dd99c9c 100644
--- a/lib/BackgroundJob/Tasks/CreateClustersTask.php
+++ b/lib/BackgroundJob/Tasks/CreateClustersTask.php
@@ -148,15 +148,19 @@ class CreateClustersTask extends FaceRecognitionBackgroundTask {
}
}
+
// Ok. If we are here, the clusters must be recreated.
//
+ $memoryBefore = memory_get_peak_usage(true);
$faces = $this->faceMapper->getFaces($userId, $modelId);
$this->logInfo(count($faces) . ' faces found for clustering');
+ $this->logInfo('Memory (MB) consumed by Faces: ' . (memory_get_peak_usage(true) - $memoryBefore) / 1024 / 1024);
// Cluster is associative array where key is person ID.
// Value is array of face IDs. For old clusters, person IDs are some existing person IDs,
// and for new clusters is whatever chinese whispers decides to identify them.
//
+ $memoryBefore = memory_get_peak_usage(true);
$currentClusters = $this->getCurrentClusters($faces);
$newClusters = $this->getNewClusters($faces);
$this->logInfo(count($newClusters) . ' persons found after clustering');
@@ -165,6 +169,8 @@ class CreateClustersTask extends FaceRecognitionBackgroundTask {
$mergedClusters = $this->mergeClusters($currentClusters, $newClusters);
$this->personMapper->mergeClusterToDatabase($userId, $currentClusters, $mergedClusters);
+ $this->logInfo('Memory (MB) consumed to merge clusters: ' . (memory_get_peak_usage(true) - $memoryBefore) / 1024 / 1024);
+
// Remove all orphaned persons (those without any faces)
// NOTE: we will do this for all models, not just for current one, but this is not problem.
$orphansDeleted = $this->personMapper->deleteOrphaned($userId);
@@ -273,6 +279,7 @@ class CreateClustersTask extends FaceRecognitionBackgroundTask {
// Create edges for chinese whispers
$edges = array();
+ $memoryBefore = memory_get_peak_usage(true);
if (version_compare(phpversion('pdlib'), '1.0.2', '>=')) {
$faces_count = count($faces);
for ($i = 0; $i < $faces_count; $i++) {
@@ -318,7 +325,9 @@ class CreateClustersTask extends FaceRecognitionBackgroundTask {
}
}
}
+ $this->logInfo('Memory (MB) consumed by faces edges: ' . (memory_get_peak_usage(true) - $memoryBefore) / 1024 / 1024);
+ $memoryBefore = memory_get_peak_usage(true);
$newChineseClustersByIndex = dlib_chinese_whispers($edges);
$newClusters = array();
for ($i = 0, $c = count($newChineseClustersByIndex); $i < $c; $i++) {
@@ -327,6 +336,7 @@ class CreateClustersTask extends FaceRecognitionBackgroundTask {
}
$newClusters[$newChineseClustersByIndex[$i]][] = $faces[$i]->id;
}
+ $this->logInfo('Memory (MB) consumed to get new clusters: ' . (memory_get_peak_usage(true) - $memoryBefore) / 1024 / 1024);
return $newClusters;
}
@@ -394,4 +404,17 @@ class CreateClustersTask extends FaceRecognitionBackgroundTask {
}
return $result;
}
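As a side note on the measurement technique: memory_get_peak_usage(true) is a monotonic high-water mark, so the difference between two readings can be 0 when a later step never exceeds an earlier peak (which is why one of the readings below can legitimately show 0). The same delta pattern, with that caveat, can be sketched in Python; this is a hypothetical analogue using tracemalloc, not part of the app:

```python
import tracemalloc

# Hypothetical Python analogue of the memory_get_peak_usage(true) deltas
# used in the patch above. Caveat: the peak is a monotonic high-water
# mark, so a step that never exceeds the earlier peak reports a delta of 0.

tracemalloc.start()

before = tracemalloc.get_traced_memory()[1]          # peak so far
faces = [bytearray(1024) for _ in range(10_000)]     # stand-in for loading faces
after = tracemalloc.get_traced_memory()[1]
print(f"Peak delta for 'faces': {(after - before) / 1024 / 1024:.1f} MB")

del faces                                            # free memory; the peak stays put
before2 = tracemalloc.get_traced_memory()[1]
smaller = [bytearray(1024) for _ in range(1_000)]    # ~1 MB, well below the old peak
after2 = tracemalloc.get_traced_memory()[1]
# This delta is exactly 0: the old high-water mark was never exceeded.
print(f"Peak delta for 'smaller': {(after2 - before2) / 1024 / 1024:.1f} MB")

tracemalloc.stop()
```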
In my case with 12,000 photos and 9000 faces:
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
Found 0 faces without associated persons for user matias and model 4
Clusters already exist, but there was some change that requires recreating the clusters
9824 faces found for clustering
Memory (MB) consumed by Faces: 144
Memory (MB) consumed by faces edges: 6.0078125
Memory (MB) consumed to get new clusters: 0
5440 persons found after clustering
Memory (MB) consumed to merge clusters: 6.0078125
Just 144 megabytes, against the 3 GB used to process each photo, is more than acceptable. :sweat_smile: But I would like to know how it behaves in your cases. :thinking:
Oh, ok!! FYI, my server runs with 48GB of RAM, with 6GB set for php-fpm (and no limit for php-cli). I am aware that your script uses up to 4GB, I read that somewhere, so I think that answers my question after all. I just applied your patch and ran the app. The results:
Found 132 faces without associated persons for user ftrentini and model 1
Face clustering will be recreated with new information or changes
118348 faces found for clustering
Memory (MB) consumed by Faces: 1750.453125
Memory (MB) consumed by faces edges: 9184.0546875
Memory (MB) consumed to get new clusters: 24.0078125
38603 persons found after clustering
Memory (MB) consumed to merge clusters: 9208.0625
Wow... According to this, you used about 20 GB of RAM in the whole process. :open_mouth: In general, the memory recommendation continues to depend directly on the number of photos of the user, but your numbers surprised me.
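A rough intuition for why roughly 12× more faces cost far more than 12× the memory: the edge-building loop in the patch above compares every face with every other face, so the number of candidate pairs grows quadratically with the face count. A back-of-the-envelope sketch (worst-case pair counts, not the actual edges kept):

```python
# Worst-case number of pairwise face comparisons (candidate edges) in
# the clustering step: every face is compared against every other face,
# so the count grows as n*(n-1)/2 -- quadratically in the face count.

def candidate_pairs(n_faces: int) -> int:
    return n_faces * (n_faces - 1) // 2

for n in (9_824, 118_348):  # the two face counts reported in this thread
    print(f"{n:>7} faces -> {candidate_pairs(n):>14,} candidate pairs")

# 118348 is ~12x the faces of 9824, but the pair count is ~145x larger,
# which is consistent with memory jumping from megabytes to gigabytes.
```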
@tobiasKaminsky probably needs less than 3GB to run it. I would really appreciate seeing the test. :smile:
Well, I keep thinking about how to optimize it. In principle, doing it in batches seems interesting, but it can bring more problems, since the direct consequence is that the number of persons would increase dramatically, which would be disappointing. :thinking:
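To illustrate that concern with a toy sketch (hypothetical Python, using simple 1-D threshold linking in place of chinese whispers, so none of these names come from the app): clustering independent batches can split one person's faces across batches, so the person count inflates unless a later merge pass reunites them.

```python
from itertools import combinations

# Toy illustration (NOT the app's real algorithm): 1-D points stand in
# for face descriptors; two faces are linked if they are within THRESHOLD.
THRESHOLD = 0.5

def cluster(points):
    """Group points connected (transitively) by links under THRESHOLD,
    using a small union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= THRESHOLD:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())

points = [0.1, 0.2, 5.0, 5.1, 9.9]  # three "persons": ~0.x, ~5.x, 9.9

whole = cluster(points)                                # everything at once
batches = [cluster(points[:3]), cluster(points[3:])]   # two independent batches

print(len(whole), "persons clustering all at once")            # 3
print(sum(len(b) for b in batches), "persons across batches")  # 4: person ~5.x is split
```

Repairing this would need a merge pass after the batches, e.g. re-linking clusters whose representatives fall within the threshold, which is the "significant work" mentioned earlier.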
On the other hand, I just discovered that we can change the memory limit for this script only:
php -d memory_limit=2048M occ face:back...
Wow. I will have to adjust the documentation and suggest that, if necessary, the limit be increased only for the execution of this task, leaving the server with more conservative values.
Feel free to use me as a tester!!
Look, this is the log of the CPU and memory (taken from the Webmin dashboard) for the last 12h, processing the final 20K images. It's taking about two and a half hours to cluster.
And I took a screenshot during processing: htop says I'm using 13 GB of RAM. (And a small correction: this VM instance has 32 GB, not the 48 GB I said in my last post.)
Hi, how is this issue going? My server does not have that much memory and it cannot continue to analyse faces any more. I have ~100k pictures and after processing ~70k pictures it runs out of memory trying to cluster faces.
I have the same issue with my collection of ~180k photos
Hi everyone, in the last commit I implemented a small(?) memory optimization, which in my tests represents an improvement of 34%.
I don't have as many photos as you, so I can't say how it scales, but I trust it will improve your results. 🤔
I guess the problem remains as the photo library grows; I'm currently sitting at 650k+ photos.
Hi @rarealphacat The proposed improvement does not stretch to such a large number of images. 🙈 You will surely see a significant improvement in memory consumption, but you will certainly still need several GB of RAM. 😥
But could you check the consumption with the memory limit disabled?
[matias@nube nextcloud]$ sudo -u apache /usr/bin/time -f %M php -d memory_limit=-1 occ face:background_job -u user
1/8 - Executing task CheckRequirementsTask (Check all requirements)
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task DisabledUserRemovalTask (Purge all the information of a user when disable the analysis.)
4/8 - Executing task StaleImagesRemovalTask (Crawl for stale images (either missing in filesystem or under .nomedia) and remove them from DB)
5/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
Face clustering will be recreated with new information or changes
6060 faces found for clustering
3287 persons found after clustering
6/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
7/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
8/8 - Executing task ImageProcessingTask (Process all images to extract faces)
NOTE: Starting face recognition. If you experience random crashes after this point, please look FAQ at https://github.com/matiasdelellis/facerecognition/wiki/FAQ
232520
Here: 232520 KB / 1024 = 227.07 MB (the %M format of /usr/bin/time reports the maximum resident set size in kilobytes).
I plan to implement batching for these edge cases, but I'm interested in current consumption for reference. 🤔
Looks like it needs some time to process, I'll report back when finished.
It gives 2884866 after 8/8.
I reset everything and ran the command again for all users; it gives 22529270 KB = 20 GB+? Looks like it varies a lot depending on many factors.
Hi @rarealphacat Honestly, for 650 thousand photos, 20 GB of RAM seems like an acceptable number to me. I would have expected a higher number, and without the patch you probably wouldn't have been able to run it at all... 😅
Beyond that I am personally satisfied, but I am aware that there are surely people with even more photos than you, for whom the current growth of memory consumption is unmanageable. 🤔
So, I reaffirm the need to make a change in the process to work in batches of images.
I hope I can do it soon. Please, a little more patience.
I found this solution: add the parameter “-d memory_limit=8096M”
www-data@a0135ed2509e:~/html$ php -d memory_limit=8096M ./occ face:background_job -u honey -vvv
www-data@a0135ed2509e:~/html$ ./occ face:stats
+---------+--------+-------+----------+---------+
| User    | Images | Faces | Clusters | Persons |
+---------+--------+-------+----------+---------+
| admin   | 9      | 0     | 0        | 0       |
| andy    | 141336 | 14002 | 0        | 0       |
| honey   | 25882  | 5520  | 0        | 0       |
+---------+--------+-------+----------+---------+
Error
www-data@a0135ed2509e:~/html$ ./occ face:background_job -u andy -vvv
1/8 - Executing task CheckRequirementsTask (Check all requirements)
System: Linux
System memory: 33497980928
PHP Memory Limit: 4294967296
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task DisabledUserRemovalTask (Purge all the information of a user when disable the analysis.)
yielding
4/8 - Executing task StaleImagesRemovalTask (Crawl for stale images (either missing in filesystem or under .nomedia) and remove them from DB)
Skipping stale images removal for user andy as there is no need for it
5/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
Face clustering will be created for the first time.
14002 faces found for clustering
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in /var/www/html/custom_apps/facerecognition/lib/BackgroundJob/Tasks/CreateClustersTask.php on line 298
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 20480 bytes) in /var/www/html/lib/private/Log.php on line 211
I found this solution: add the parameter “-d memory_limit=8096M”
www-data@a0135ed2509e:~/html$ php -d memory_limit=8096M ./occ face:background_job -u honey -vvv
1/8 - Executing task CheckRequirementsTask (Check all requirements)
System: Linux
System memory: 33497980928
PHP Memory Limit: 8489271296
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task DisabledUserRemovalTask (Purge all the information of a user when disable the analysis.)
yielding
4/8 - Executing task StaleImagesRemovalTask (Crawl for stale images (either missing in filesystem or under .nomedia) and remove them from DB)
Skipping stale images removal for user honey as there is no need for it
5/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
Found 0 faces without associated persons for user honey and model 3
Clusters already exist, but there was some change that requires recreating the clusters
5520 faces found for clustering
396 persons found after clustering
Deleted 1 persons without faces
yielding
6/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
Skipping full image scan for user honey
7/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
8/8 - Executing task ImageProcessingTask (Process all images to extract faces)
NOTE: Starting face recognition. If you experience random crashes after this point, please look FAQ at https://github.com/matiasdelellis/facerecognition/wiki/FAQ
I have got the solution to complete the clustering: run occ with the PHP parameter. FYI: url
www-data@a0135ed2509e:~/html$ php -d memory_limit=8096M ./occ face:background_job -u honey -vvv
@xwyangjshb it works, I'm creating a cron job with your solution.
PHP Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 268435464 bytes) in /srv/nextcloud/apps/facerecognition/lib/BackgroundJob/Tasks/CreateClustersTask.php on line 293
I can increase the memory since my server is powerful enough, but ideally it should not consume that much memory, should it?