nextcloud / recognize

👁 👂 Smart media tagging for Nextcloud: recognizes faces, objects, landscapes, music genres
https://apps.nextcloud.com/apps/recognize
GNU Affero General Public License v3.0

Recognize runs out of memory when clustering faces #1033

deddc23efb closed this issue 1 year ago

deddc23efb commented 1 year ago

Which version of recognize are you using?

5.0.3

Enabled Modes

Face recognition

TensorFlow mode

Normal mode

Downstream App

Memories App

Which Nextcloud version do you have installed?

27.1.3

Which Operating system do you have installed?

Ubuntu 22.04

Which database are you running Nextcloud on?

MariaDB

Which Docker container are you using to run Nextcloud? (if applicable)

No response

How much RAM does your server have?

16GB

What processor Architecture does your CPU have?

x86_64

Describe the Bug

Recognize eventually consumes all available memory when running occ cluster-faces from the CLI on large photo collections. In my case, approximately 37,000 faces have been identified. The clustering stage always fails, and PHP is killed for running out of memory. I believe the issue is the TensorFlow library being used. I don't know how its API works, but Recognize won't work on large photo sets with very many faces. The solution is not to throw memory at the problem: Recognize should be architected to handle photo libraries of this size rather than simply letting memory usage grow unbounded.
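To illustrate how quickly memory can grow with the number of faces, here is a back-of-the-envelope sketch. It assumes, purely for illustration and without confirmation for Recognize's actual clustering code, that a clustering pass holds a dense pairwise distance matrix of 8-byte floats in memory. The helper names are hypothetical.

```python
def pairwise_matrix_bytes(n_faces: int, bytes_per_value: int = 8) -> int:
    """Bytes for a dense n_faces x n_faces matrix (assumed memory model)."""
    return n_faces * n_faces * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Whole library at once vs. the batch sizes tried later in this thread.
for n in (3_000, 5_000, 37_000):
    print(f"{n:>6} faces -> {to_gib(pairwise_matrix_bytes(n)):6.2f} GiB")
```

Under this model, memory grows quadratically: 37,000 faces need roughly 10.2 GiB for the matrix alone, while a batch of 3,000 needs under 100 MiB. PHP stores values with more overhead than 8 bytes each, so treat these numbers as lower bounds under the stated assumption.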

Expected Behavior

I expect a recommended tool like Recognize to be stable and well-behaved across a wide variety of photo libraries. If the tool cannot handle a large photo set, then it shouldn't be advertised as the facial recognition application for Nextcloud.

To Reproduce

Run Recognize's face clustering against a large dataset of faces (in my case, approximately 37,000).

Debug log

The only debug information showing what is happening is the "Killed" message printed when the php occ process is killed, plus syslog messages showing that php was killed due to OOM:

$ grep Killed /var/log/syslog
Nov 17 05:37:10 cloud kernel: [455477.743345] Out of memory: Killed process 83279 (php) total-vm:7361252kB, anon-rss:2855060kB, file-rss:3060kB, shmem-rss:0kB, UID:33 pgtables:14068kB oom_score_adj:0
Nov 17 06:06:27 cloud kernel: [457235.666042] Out of memory: Killed process 83583 (php) total-vm:7514852kB, anon-rss:3070040kB, file-rss:3604kB, shmem-rss:0kB, UID:33 pgtables:14384kB oom_score_adj:0
Nov 17 14:28:35 cloud kernel: [ 2075.671577] Out of memory: Killed process 1534 (php) total-vm:12587752kB, anon-rss:6328024kB, file-rss:2528kB, shmem-rss:0kB, UID:33 pgtables:24308kB oom_score_adj:0
Nov 17 18:17:15 cloud kernel: [15794.989675] Out of memory: Killed process 3104 (php) total-vm:12861156kB, anon-rss:6557152kB, file-rss:3256kB, shmem-rss:0kB, UID:33 pgtables:24800kB oom_score_adj:0
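The anon-rss field in these kernel messages is the most direct measure of how much memory PHP held when it was killed. A small sketch for pulling that figure out of syslog lines; the regular expression is written against the message format quoted in this issue, and other kernel versions may format OOM-killer messages differently.

```python
import re

# Pattern for kernel OOM-killer lines in the format quoted in this issue.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def oom_kills(lines):
    """Yield (pid, command, anon_rss_gib) for every OOM-killer line found."""
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            # The kernel's "kB" here are KiB, so divide by 2**20 to get GiB.
            yield int(m["pid"]), m["comm"], int(m["anon_rss"]) / 2**20


log_lines = [
    "Nov 17 05:37:10 cloud kernel: [455477.743345] Out of memory: Killed process 83279 (php) total-vm:7361252kB, anon-rss:2855060kB, file-rss:3060kB, shmem-rss:0kB, UID:33 pgtables:14068kB oom_score_adj:0",
    "Nov 17 18:17:15 cloud kernel: [15794.989675] Out of memory: Killed process 3104 (php) total-vm:12861156kB, anon-rss:6557152kB, file-rss:3256kB, shmem-rss:0kB, UID:33 pgtables:24800kB oom_score_adj:0",
]
for pid, comm, gib in oom_kills(log_lines):
    print(f"{comm} (pid {pid}) killed at {gib:.2f} GiB anonymous RSS")
```

For the two lines used here, this reports about 2.72 GiB and 6.25 GiB of anonymous RSS at the moment PHP was killed.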


marcelklehr commented 1 year ago

Have you tried using the batch-size parameter of the cluster-faces command?

deddc23efb commented 1 year ago

No, I didn't see that documented. I'm currently seeing clustering of 38,000 recognized faces fail with 24GB of RAM. I will try the batch-size parameter and report back.

Update: using batch-size gets things working. I can run with a batch size of 3000, which is stable at about 1.1GB of RAM. A batch size of 5000 started a worrying memory climb again.
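The batch-size workaround bounds memory because each pass only works on a fixed number of faces. A minimal sketch of that idea follows, in which `cluster_batch` is a hypothetical stand-in for whatever clustering step Recognize actually runs; its real internals are not described in this thread.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def cluster_in_batches(
    face_ids: Iterable[T],
    batch_size: int,
    cluster_batch: Callable[[List[T]], None],  # hypothetical clustering step
) -> int:
    """Pass face IDs to cluster_batch one batch at a time; return the batch count.

    Any per-batch working memory inside cluster_batch (for example a pairwise
    distance matrix) then scales with batch_size, not with the total face count.
    """
    n_batches = 0
    for batch in batched(face_ids, batch_size):
        cluster_batch(batch)
        n_batches += 1
    return n_batches


sizes: List[int] = []
n = cluster_in_batches(range(38_000), 3_000, lambda b: sizes.append(len(b)))
print(n, sizes[-1])  # 13 batches; the last one holds 2000 faces
```

Python 3.12 also ships `itertools.batched`, which yields tuples; the hand-written version above works on older versions. Independent batches never compare faces across batch boundaries within a single pass, and how Recognize reconciles clusters across repeated runs is not covered in this thread.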

I'm currently running the cluster command in a while loop with a batch size of 3000, and the number of outstanding clusters shown on the Admin > Recognize page is dropping.
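A loop like this can also be written as a small Python driver. This is a sketch under assumptions: the install path, the web server user, the command name `recognize:cluster-faces` and the option `--batch-size` should all be checked against your installation (for example with `occ list recognize`). Because this thread does not say how the command signals that no faces are left, the loop simply stops after a fixed number of passes or on the first non-zero exit status, which should include PHP being killed.

```python
import subprocess
from typing import Callable, Sequence

# Assumptions: Nextcloud lives in /var/www/nextcloud, the web server user is
# www-data, and the command/option names below match your Recognize version
# (check with: sudo -u www-data php occ list recognize).
NEXTCLOUD_DIR = "/var/www/nextcloud"
OCC_CMD = ["sudo", "-u", "www-data", "php", "occ",
           "recognize:cluster-faces", "--batch-size", "3000"]


def run_occ(cmd: Sequence[str] = OCC_CMD, cwd: str = NEXTCLOUD_DIR) -> int:
    """Run one clustering pass and return its exit status."""
    return subprocess.run(list(cmd), cwd=cwd, check=False).returncode


def repeat_passes(run_once: Callable[[], int], max_passes: int) -> int:
    """Call run_once until it returns non-zero or max_passes is reached.

    Returns the number of passes that exited successfully.
    """
    succeeded = 0
    for _ in range(max_passes):
        if run_once() != 0:
            break
        succeeded += 1
    return succeeded


# Example (not run here): up to 20 passes of 3000 faces each.
# repeat_passes(run_occ, max_passes=20)
```

Keeping `run_once` as a parameter separates the loop logic from the actual occ call, so progress can still be watched on the admin page while the driver runs.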

It looks like the tool is doing what it should. It would be nice if the guidance on the Recognize config page suggested using batch-size with the occ cluster-faces command. That might reduce the number of out-of-memory bug reports.

Edit again: I've created a pull request with a small change to the description of the occ cluster-faces command.

marcelklehr commented 1 year ago

Thank you. Closing this for now :)