matiasdelellis / facerecognition

Nextcloud app that implement a basic facial recognition system.
GNU Affero General Public License v3.0
514 stars 45 forks source link

There is no clustering until I analyze all the photos of all the users. #77

Closed matiasdelellis closed 5 years ago

matiasdelellis commented 5 years ago

I just want to enable a user for tests but until I analyze all the users, grouping is not done... :disappointed:

On the other hand the task CreateClustersTask is executed before ImageProcessingTask, therefore, all the photos must be analyzed, and then launch it again to do the grouping.. :wink:

stalker314314 commented 5 years ago

So, you create a user, add some images to his storage and fire command (with --user_id). And when all images are processed and all faces are created, and you start again command (with --user_id again) and it is not creating clusters? Seems like a bug? Can you take a look at these magic numbers: https://github.com/matiasdelellis/facerecognition/blob/master/lib/BackgroundJob/Tasks/CreateClustersTask.php#L173 I would suspect this "if" is preventing you to continue.

Or, maybe I misunderstood you - you are saying that this command is required to be executed 2 times to get results? If this is case, I would say it is default behavior (by-design). Because I assumed that command will be run from cron (or even with NC scheduled tasks in far future:D) and that it will be time bound. I mean, this was idea? User adds cron (*/15 * * * occ facerecognition:background_job -t 900) and it will continously produce results.

Or you are asking how to create automated test that will trigger cluster creation? I guess, we will need to export all those "magic numbers" to configs to give us ways to influence when tasks are done? Also, mocks for getting any config value will help.

Please, can you clarify?:)

matiasdelellis commented 5 years ago

After 6 hours of analysis of my photos, I can make a summary... :wink:

I start from an empty database. I analyze all the photos of 1 user passing --user_id, I can execute it as many times as you want, there is never grouping. :disappointed:

FacesCount: 923 ImageCount: 1249 -All processed from one user- percentImagesProcessed: 100% of one user...

I have to add 77 faces?. :confused:

Well, then execute the command without --user option one time. Ends with the analysis of all users correctly.

FacesCOunt: 980 ImageCount: 1302 percentageProcessed: 100% on all users..

There is no grouping. According to you, that's how it's designed. It's weird, but Ok. :sweat_smile:

Run again to all users.. and work.. :smile:

        923 faces found for clustering
        186 clusters found for clustering
        57 faces found for clustering
        22 clusters found for clustering

Well, should not be able to group the photos of just one person?. Think backwards. The users should not have the option to deny the analysis of their photos?

stalker314314 commented 5 years ago

No, this is definitevely not OK. Every time you re-run command, you should get output from here: https://github.com/matiasdelellis/facerecognition/blob/master/lib/BackgroundJob/Tasks/CreateClustersTask.php#L174 if you (one user) have either >100 images processed, or >1000 faces processed, or >95% processed, clusters should be created. And this function is called, no matter if you have --user_id or not. And you have >100 images = 1249.

I would say this is a bug. I will take a look.

My current thinking is that your $imageCount is somehow always 0...but not sure

matiasdelellis commented 5 years ago

So, basically, when running with --user_id, you never get clusters? (even if 100% of images are processed)

Yes.. 100% of the user photos.

And you ran command multiple times during those 6 hours?

No.. I run it once and for all photos of a user, it takes 6 hours. Then I run many times, and never does the clustering.

BTW, if you have logs (output from console) when you were running with --user_id that will help!

You're right.... :disappointed: In the next attempt I share it. :wink:

matiasdelellis commented 5 years ago

p.s: I suppose it has to do with the variable FULL_IMAGE_SCAN_DONE_KEY

And now looking for the code, I think you did it deliberately.

https://github.com/matiasdelellis/facerecognition/blob/6b83b55a594fbcee05b2b869846d6971f34b615f/lib/BackgroundJob/Tasks/AddMissingImagesTask.php#L101-L103

stalker314314 commented 5 years ago

Yes, it have. Now looking at checks that are placed in https://github.com/matiasdelellis/facerecognition/blob/6b83b55a594fbcee05b2b869846d6971f34b615f/lib/BackgroundJob/Tasks/CreateClustersTask.php#L165, I am thinking we don't need this guard for FULL_IMAGE_SCAN_DONE_KEY. Idea was that, if FULL_IMAGE_SCAN_DONE_KEY is false => we still don't know image count => $imageCount is wrong=> logic to determine if we should create cluster is wrong => created cluster can be small and unstable.

But, with those checks (>100 images, >1000 faces, >95%), I think we could be fine.

Other option is to place some OR condition there (if FULL_IMAGE_SCAN_DONE_KEY || user_id is set), but this is wrong (gives wrong sense of security). Better is that cluster creation logic is done per user (no matter if all images are inserted).

Other than that, I cannot think of any other reason to add this FULL_IMAGE_SCAN_DONE_KEY check.

Expect some CR tonight