nextcloud / recognize

👁 👂 Smart media tagging for Nextcloud: recognizes faces, objects, landscapes, music genres
https://apps.nextcloud.com/apps/recognize
GNU Affero General Public License v3.0

Prevent shared pictures from being recognized once per user #1111

Open PeggyFree opened 8 months ago

PeggyFree commented 8 months ago

Describe the feature you'd like to request

My Nextcloud instance is dedicated to my family, and we share a lot of data. There is already a fix that stops pictures in a group folder from being analyzed several times: https://github.com/nextcloud/recognize/pull/939

My concern is related but not the same: we have a "huge" shared photo library of 200k photos. This library belongs to a virtual "Sharing Account" user that shares its data with everyone else. I didn't use the "group folder" feature because the photo library is located on another drive, so I found this workaround in the Nextcloud forum.

Now here is the problem: Recognize is classifying each photo 7 times, once per real user plus once for the virtual "Sharing Account". Not only is this counterproductive, but I don't want (or need) several users to use Recognize, so this is a huge waste of time and energy.
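To put a number on the overhead described above, here is a minimal sketch of the arithmetic. It assumes the reported figures (200k photos, 7 accounts) are accurate and that every account really triggers its own pass, which is what this report claims but has not been confirmed. The function names are hypothetical helpers, not part of Recognize.

```shell
#!/bin/sh
# Hypothetical helper: total classification passes if every account
# that can see the library is scanned separately.
total_passes() {
    photos=$1
    accounts=$2
    echo $((photos * accounts))
}

# Hypothetical helper: passes beyond the single one each photo needs.
redundant_passes() {
    photos=$1
    accounts=$2
    echo $((photos * (accounts - 1)))
}

total_passes 200000 7      # prints 1400000
redundant_passes 200000 7  # prints 1200000
```

Under those assumptions, six out of every seven passes would be repeated work.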

Describe the solution you'd like

Currently the recognize feature is disabled for all users except me but the scan covers all users. As requested below, the scan should only target the enabled users or allow user targeting via a parameter.

https://github.com/nextcloud/recognize/issues/905

Describe alternatives you've considered

Another smart solution would be to allow sharing of the Recognize data, so that everyone on a Nextcloud instance would benefit from a single tagging process and a single manual curation process.

This feature request is described here : https://github.com/nextcloud/recognize/issues/948

github-actions[bot] commented 8 months ago

Hello :wave:

Thank you for taking the time to open this issue with recognize. I know it's frustrating when software causes problems. You have made the right choice to come here and open an issue to make sure your problem gets looked at and if possible solved. I try to answer all issues and if possible fix all bugs here, but it sometimes takes a while until I get to it. Until then, please be patient. Note also that GitHub is a place where people meet to make software better together. Nobody here is under any obligation to help you, solve your problems or deliver on any expectations or demands you may have, but if enough people come together we can collaborate to make this software better. For everyone. Thus, if you can, you could also look at other issues to see whether you can help other people with your knowledge and experience. If you have coding experience it would also be awesome if you could step up to dive into the code and try to fix the odd bug yourself. Everyone will be thankful for extra helping hands! One last word: If you feel, at any point, like you need to vent, this is not the place for it; you can go to the forum, to twitter or somewhere else. But this is a technical issue tracker, so please make sure to focus on the tech and keep your opinions to yourself. (Also see our Code of Conduct. Really.)

I look forward to working with you on this issue Cheers :blue_heart:

marcelklehr commented 8 months ago

> Recognize is classifying each photo 7 times

What leads you to this conclusion? Recognize should not process shared files multiple times per sharee. If this is the case, then this is a bug. Otherwise this is likely a duplicate of https://github.com/nextcloud/recognize/issues/948 (please check it out).

PeggyFree commented 8 months ago

Well, thank you for your fast answer.

I'm struggling to get all the pictures analyzed automatically: the "pending jobs" reported in the "Personal Settings -> Recognize" tab are never triggered. This morning it showed 65000 files to be classified and 125000 files to be clustered. This afternoon I could force the clustering process by using the following command:

while true; do PHP_MEMORY_LIMIT=4096M php occ recognize:cluster-faces --batch-size=1000; done
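The loop above never exits by itself. As a hedged alternative, here is a small sketch of a wrapper that runs a command a fixed number of times and stops early if a run fails. `run_batches` is a hypothetical helper, and whether `recognize:cluster-faces` returns a non-zero exit code when there is nothing left to do is not stated in this thread, so the run limit is the only stop condition you can rely on here.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX_RUNS times,
# stopping early (with status 1) as soon as one run fails.
# Usage: run_batches MAX_RUNS COMMAND [ARGS...]
run_batches() {
    max_runs=$1
    shift
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}
```

With the command from this comment, that could look like `run_batches 20 env PHP_MEMORY_LIMIT=4096M php occ recognize:cluster-faces --batch-size=1000`, where 20 is an arbitrary limit chosen for illustration.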

By doing so, I could see in the console that the job kept cycling through each user, 500 items at a time, for 9 hours, until the end. Now the Recognize tab keeps saying (translated):

Classification : 66578 Files queued, Last classified: 2 days ago, Classification jobs scheduled: 1, Classification job last run: 10 hours ago

Clustering : nothing left

So I have just launched "recognize:classify", and it seems to scan the whole 200k files again. I don't really understand why the remaining job is not processed automatically; the log doesn't show any error. EDIT: after working for 7 hours it stopped. Now I have a remaining job of 66398 faces instead of the previous 66578. Sorry for mixing several subjects. I'm not very experienced with Linux or Nextcloud, and running Nextcloud in Docker also brings additional minor problems.

marcelklehr commented 8 months ago

So, there are background jobs and there are command-line commands. You can run the cluster-faces command to work on unclustered faces, but you cannot run the classify command to work on photos queued for classification. The queue is tied to background jobs. So either you set up background jobs in Nextcloud correctly and then wait for them to finish processing the queued files, or you clear all queues and run the classify command to take care of everything for you. As you are doing it currently, you are doing the same work multiple times.

Generally, if the last background job ran not too long ago and the last classification was not too long ago, the system is working and you will simply have to wait for your 60000 photos to churn through it. Unless you don't have 60000 images at all; that would be a bug again.
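For the first option, a minimal sketch of what "setting up background jobs" usually means on a standard Nextcloud install: a system cron entry that runs `cron.php` every 5 minutes. The install path and the helper name `nextcloud_cron_line` are assumptions for illustration; adjust them to your setup. Docker deployments may need the entry on the host or in a separate cron container instead.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry that runs Nextcloud's
# cron.php every 5 minutes. The default path is an assumption;
# pass your own install directory as the first argument.
nextcloud_cron_line() {
    nc_root=${1:-/var/www/nextcloud}
    printf '%s\n' "*/5 * * * * php -f ${nc_root}/cron.php"
}

nextcloud_cron_line /var/www/html
```

The printed line belongs in the crontab of the web server user (for example via `crontab -u www-data -e`, assuming that user name), and `php occ background:cron` tells Nextcloud to rely on system cron for its background jobs.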

PeggyFree commented 7 months ago

@marcelklehr I followed your advice and added CRON jobs to trigger classify. Now I have this "normal" notification :

Face recognition: 0 Queued files, Last classification: 2 hours ago, Scheduled background jobs: 0, Face clustering: 0 faces left to cluster, Last clustering run: 2 hours ago, Scheduled background jobs: 0,

But there are still behaviours I don't understand. Photos uploaded in the last 15 days were not visible in the assigned faces, nor in the unassigned faces. I just browsed back and forth in the "people" URL, opened several people I expected to be recognized, and suddenly I could see the latest photos in the "unassigned faces" list from Memories (the same menu in the "Photos" module times out). What I don't understand is that the face clustering batch still reports "0 faces left to cluster", while the recent unassigned faces show my main family members, whom I know Recognize is able to cluster. The weird thing is that none of the last 50 pictures were clustered, which makes me think that the clustering process is faulty in some way.

marcelklehr commented 6 months ago

Photos in "Unassigned faces" have all been through clustering already. That means clustering couldn't assign them to a cluster.