matiasdelellis / facerecognition

Nextcloud app that implements a basic facial recognition system.
GNU Affero General Public License v3.0
514 stars 45 forks

No faces has been recognized - memory settings #97

Closed SlavikCA closed 5 years ago

SlavikCA commented 5 years ago

I ran face recognition on few of my photos here: https://cloud.aidcim.org/index.php/apps/gallery/s/6NFCGmAkoZL6Pc5

That's on

The job completes:

sudo -u www-data php occ -v face:background_job -u ****
1/8 - Executing task CheckRequirementsTask (Check all requirements)
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
        * have 1000 faces already processed (you have 0),
        * have 100 images (you have 67),
        * or you need to have 95% of you images processed (you have 0.00%)
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Finding missing images for user google-106379083885103545634
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180530_090646.jpg
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180531_104813.jpg
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180531_104832.jpg

...

yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Penasco/IMG_20180602_175238.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Penasco/IMG_20180602_104556.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Penasco/IMG_20180602_160705.jpg
8/8 - Executing task UnlockTask (Release obtained lock)

But the "Settings => Face Recognition" page is still empty:

Your friends have not been recognized yet
Please, be patient

When I run it again, the job completes in a second:

sudo -u www-data php occ -v face:background_job -u google-106379083885103545634
1/8 - Executing task CheckRequirementsTask (Check all requirements)
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        0 faces found for clustering
        0 clusters found for clustering
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Skipping full image scan for user google-106379083885103545634
6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
8/8 - Executing task UnlockTask (Release obtained lock)

So, what could be the reason that I don't see any faces? Is there a way to troubleshoot? Is there a command to get additional stats / details?

stalker314314 commented 5 years ago

Do you have any faces found? E.g., do you have any entries in oc_facerecognition_faces? It seems as if the first pass didn't find any faces in those images (which is strange, as there are faces). You can try running the command with -vvv to get more debugging output. If you want to start from scratch, you need to purge some tables: oc_facerecognition_(persons|faces|images).

Additional question: how long does the first pass (the one where it is processing images) take? It should take a substantial amount of time (unless you have a GPU and compiled dlib for it) - 30-120 sec/image. That would help me pinpoint where to put more traces, so we can figure it out! Do you have at least 2GB of RAM, and is the memory limit for PHP (CLI) at least 1GB?

P.S. It is OK to run the command multiple times; it is designed to run continuously from cron.

stalker314314 commented 5 years ago

Closed by mistake

SlavikCA commented 5 years ago

I don't have a GPU. Nextcloud runs in a VM, which has 2 cores of an i7 CPU and 6GB of RAM. PHP is version 7.0 and the CLI has no memory limit:

slavik@ubhome:~$ php -r "phpinfo();" | grep memory
memory_limit => -1 => -1

While running face:background_job, I see each photo takes about 30 seconds of processing time.

SELECT * FROM oc_face_recognition_faces MySQL returned an empty result set (i.e. zero rows).

SELECT * FROM oc_face_recognition_persons MySQL returned an empty result set (i.e. zero rows).

SELECT * FROM oc_face_recognition_images MySQL returned 147 rows

[{"id":"2586","user":"google-106379083885103545634","file":"2539304","model":"1","is_processed":"1",
"error":"std::bad_alloc","last_processed_time":"2019-01-06 09:54:53","processing_duration":"0"},
 {"id":"2587","user":"google-106379083885103545634","file":"2539305","model":"1","is_processed":"1",
"error":"std::bad_alloc","last_processed_time":"2019-01-06 09:38:23","processing_duration":"0"},
 {"id":"2588","user":"google-106379083885103545634","file":"2539306","model":"1","is_processed":"1",
"error":"std::bad_alloc","last_processed_time":"2019-01-06 09:59:05","processing_duration":"0"},
 ...

So, every row has "error":"std::bad_alloc". What does that mean? Why doesn't it show up in any log?

SlavikCA commented 5 years ago

While using -vvv, I don't see any additional output:

root@ubhome:/var/www/nextcloud# sudo -u www-data php /var/www/nextcloud/occ -vvv face:background_job -u google-106379083885103545634
1/8 - Executing task CheckRequirementsTask (Check all requirements)
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        0 faces found for clustering
        0 clusters found for clustering
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Skipping full image scan for user google-106379083885103545634
6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Shemereko reunion/phonePix (3).jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/10/20181013_103640_HDR.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Shemereko reunion/E-cell-cam.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_115950.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Shemereko reunion/phonePix (4).jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09/from Masha.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_163336.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/10/20181010_170911_HDR.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_163417.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Shemereko reunion/phonePix (2).jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_163407.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_161152_HDR.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Shemereko reunion/DSC_2597.JPG
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/10 BC camp/20181006_094015.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_162300.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/10 BC camp/20181005_183018_HDR.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09/20180923_182941.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/10/20181012_160032.jpg
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09/20180923_174923.jpg
8/8 - Executing task UnlockTask (Release obtained lock)

stalker314314 commented 5 years ago

Aha, OK, I see. This is a worse case of #63. You don't have enough memory for dlib, but the PHP process does not crash with an OOM (which would at least tell you something fishy is going on); instead we get a bad_alloc that we don't treat specially (we should abort the current command and tell the user there is not enough memory). We have some checks to add, and in the meantime there is no workaround for you other than increasing VM memory (2GB should be the minimum). If that doesn't help on its own (or you already have that), increase the PHP memory limit to 1GB. You should end up with no errors in the images table (and truncate those 3 tables before retrying!).

SlavikCA commented 5 years ago

I have 6GB RAM on VM.

And I see the php process sometimes takes 3GB+ of it:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
50604 www-data  20   0 4845816 3.731g  51056 R 130.6 64.1  38:35.12 php

I'll try to set the memory limit to 1GB and see if it works more reliably than with no limit.

SlavikCA commented 5 years ago

So, I truncated the 3 tables:

truncate oc_face_recognition_faces;
truncate oc_face_recognition_persons;
truncate oc_face_recognition_images;

However, face:background_job doesn't re-scan the images. It says: Skipping full image scan for user pix

What else do I need to reset to retry?

Here is full output:

sudo -u www-data php /var/www/nextcloud/occ -vvv face:background_job -vvv -u pix
1/8 - Executing task CheckRequirementsTask (Check all requirements)
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
        * have 1000 faces already processed (you have 0),
        * have 100 images (you have 0),
        * or you need to have 95% of you images processed (you have 0.00%)
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Skipping full image scan for user pix
6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
8/8 - Executing task UnlockTask (Release obtained lock)
root@ubhome:/var/www/nextcloud# sudo -u www-data php /var/www/nextcloud/occ -vvv face:background_job -vvv -u pix
1/8 - Executing task CheckRequirementsTask (Check all requirements)
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
        * have 1000 faces already processed (you have 0),
        * have 100 images (you have 0),
        * or you need to have 95% of you images processed (you have 0.00%)
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Skipping full image scan for user pix
6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
8/8 - Executing task UnlockTask (Release obtained lock)

SlavikCA commented 5 years ago

I think this prevented me from re-scanning:

SELECT * FROM oc_preferences WHERE userid="pix" AND configkey="full_image_scan_done"
[{"userid":"pix","appid":"facerecognition","configkey":"full_image_scan_done","configvalue":"true"}]

so, I did: DELETE FROM oc_preferences WHERE userid="pix" AND configkey="full_image_scan_done"

I don't understand the logic for that flag.

Anyway, it would be useful to have a command that forces re-scanning of images for a user.

SlavikCA commented 5 years ago

I set the PHP memory_limit to 4G (the VM has 6GB).

And restarted scanning.

And I still see std::bad_alloc in DB. The PHP log shows no errors.

stalker314314 commented 5 years ago

I forgot about that flag; cool that you figured it out :) Sorry about this. The logic of the flag is to do the full scan only once (after that point, we rely on "hooks"). BTW, on an unrelated note, I have a PR to split it per user.

Regarding bad_alloc, I am clueless. I will add more tracing to determine how much memory the code thinks you have and how much it is resizing images. I guess your VM is not doing anything else? But only once I am back from vacation (~a week) :)

stalker314314 commented 5 years ago

OK, I just pushed a small change to master that adds more traces. Can you pull the latest master and do the usual DB cleanup (along with the full_image_scan_done trick), just so we can see why you are getting OOM? (Run the command with -v, as those are all debug logs.) In task 1/8, you should see how much memory we detect. Also, before images are given to dlib, they are resized; there are logs telling you how much we resize them (feeding bigger images to dlib will result in OOM). The output you are looking for is these lines:

1/8 - Executing task CheckRequirementsTask (Check all requirements)
        Found 10213 MB available to PHP.
        Capping used memory to maximum of 4GB
...
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
yielding
        Processing image /home/branko/server/data/branko/files/20180907_162635.jpg
        Image scaled from 2095x1502 to 2845x2040 (since max image area is 3145728 pixels^2)
        Faces found 3

This will at least tell us what values you are resizing to.

Another option is to tweak the constant 0.75 found here. This is the formula for the maximum image area we can feed to dlib, based on the memory given to PHP (empirically derived from the work done here). Try lowering this value to 0.66 or 0.5 and see if you still get OOM.
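For what it's worth, the "max image area" values that show up in these logs are consistent with a simple formula. The sketch below is a hypothetical Python reconstruction inferred from the logged numbers only (the function name max_image_area is mine; the app's actual PHP code may differ):

```python
# Hypothetical reconstruction of the max-image-area formula, inferred from the
# logged numbers (NOT the app's actual code):
#   area_pixels = constant * memory_in_KB, with PHP memory capped at 4GB.

def max_image_area(memory_mb: int, constant: float = 0.75) -> int:
    """Maximum pixel area fed to dlib, matching the values seen in the logs."""
    capped_mb = min(memory_mb, 4 * 1024)  # "Capping used memory to maximum of 4GB"
    return int(constant * capped_mb * 1024)

print(max_image_area(10213))       # 3145728  (capped to 4GB, constant 0.75)
print(max_image_area(4096, 0.4))   # 1677721
print(max_image_area(4096, 0.2))   # 838860
```

Each lowering of the constant shrinks the maximum area linearly, which is why 0.2 produces much smaller images than 0.75 for the same memory limit.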

Again, your VM is not doing anything else (there is no machine-wide memory pressure)?

Sorry about your bad experience with this app, but I hope that, with your help, we can get to the bottom of this! :)

SlavikCA commented 5 years ago

The VM only runs Nextcloud with a few apps.
There is no significant traffic to it.

Tried latest changes.

Tried with '0.75' and '0.4': same failure - std::bad_alloc

It only started to work with the '0.2' value.

That formula seems strange to me.
I expected it to downscale an image so it fits into memory.
But it often upscales it. Why?

Log for 0.4 value:

sudo -u www-data php /var/www/nextcloud/occ -vvv face:background_job -vvv -u google-****
1/8 - Executing task CheckRequirementsTask (Check all requirements)
        Found 4096 MB available to PHP.
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
        * have 1000 faces already processed (you have 0),
        * have 100 images (you have 0),
        * or you need to have 95% of you images processed (you have 0.00%)
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Finding missing images for user google-106379083885103545634
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180530_090646.jpg
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180531_104813.jpg
...
        Found /google-106379083885103545634/files/Photos/10 BC camp/20181006_094015.jpg
        Found /google-106379083885103545634/files/Photos/20180907_162635.jpg
yielding
6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Shemereko reunion/phonePix (4).jpg
        Image scaled from 1496x1122 to 1280x960 (since max image area is 1677721 pixels^2)
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/08 camp/20180811_162621_HDR.jpg
        Image scaled from 1496x1122 to 4656x3492 (since max image area is 1677721 pixels^2)
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/09 Oceanshore/20180907_102605.jpg
        Image scaled from 1496x1122 to 4160x3120 (since max image area is 1677721 pixels^2)

Log for 0.2 value:

sudo -u www-data php /var/www/nextcloud/occ -vvv face:background_job -vvv -u google-****
1/8 - Executing task CheckRequirementsTask (Check all requirements)
        Found 4096 MB available to PHP.
2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/8 - Executing task LockTask (Acquire lock so that only one background task can run)
4/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
        Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
        * have 1000 faces already processed (you have 0),
        * have 100 images (you have 0),
        * or you need to have 95% of you images processed (you have 0.00%)
5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
        Finding missing images for user google-106379083885103545634
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180530_090646.jpg
        Found /google-106379083885103545634/files/Photos/05 Penasco/IMG_20180531_104813.jpg
...
        Found /google-106379083885103545634/files/Photos/10 BC camp/20181006_094015.jpg
        Found /google-106379083885103545634/files/Photos/20180907_162635.jpg
yielding
6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/08 camp/20180812_100035.jpg
        Image scaled from 1058x793 to 4160x3120 (since max image area is 838860 pixels^2)
        Faces found 5
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Pioneer village/b4.jpg
        Image scaled from 1122x748 to 1280x853 (since max image area is 838860 pixels^2)
        Faces found 7
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Penasco/IMG_20180602_104021.jpg
        Image scaled from 687x1221 to 1080x1920 (since max image area is 838860 pixels^2)
        Faces found 2
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/08 camp/20180812_101745.jpg
        Image scaled from 1058x793 to 4656x3492 (since max image area is 838860 pixels^2)
        Faces found 4
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Penasco/IMG_20180602_105324.jpg
        Image scaled from 1058x793 to 4032x3024 (since max image area is 838860 pixels^2)
        Faces found 1
yielding
        Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/05 Pioneer village/l2.JPG
        Image scaled from 1122x748 to 5472x3648 (since max image area is 838860 pixels^2)
        Faces found 1

stalker314314 commented 5 years ago

OK, I need more help :) Can you tell me (if you still know) what dlib version you have, how you obtained it (how did you compile it?), and how you obtained pdlib? I suspect you may have compiled it on your own and didn't have some optimizations like OpenBLAS or similar...

SlavikCA commented 5 years ago

Originally I installed dlib by following the guide here: https://github.com/goodspb/pdlib

However, at that time it didn't have info about libx11-dev and libopenblas-dev, so I didn't have them on my system. After I figured those dependencies out, I submitted a PR for their README (it's merged now), and I rebuilt my dlib and pdlib. So, I believe I now have OpenBLAS on my Ubuntu 16 and dlib is built with it.

Of course, it would be nice if I could just install dlib via apt install, but I had to compile it myself, per their guide.

Also, is it expected that in some scenarios the image is upscaled? Like this:

Processing image /var/www/nextcloud_data/google-106379083885103545634/files/Photos/08 camp/20180812_101745.jpg
Image scaled from 1058x793 to 4656x3492 (since max image area is 838860 pixels^2)

stalker314314 commented 5 years ago

So, now with OpenBLAS and the other stuff, does it work correctly (does it work if you don't change the 0.75 constant)? Just to confirm that this was causing the issues, so we can write it in an FAQ later?

Of course, it would be nice if I could just install dlib via apt install, but I had to compile it myself, per their guide.

I think #86 will fix this. I am no expert in Debian packaging, but it doesn't make sense to have binaries for pdlib without binaries for dlib :) Maybe @matiasdelellis plans to do some magic with static linking, but this is also good for the end user, who will not have to compile dlib in that case either. So, expect #86 to fix this (current thinking is that it should be a blocker for the initial release, to lower the friction of using this app).

Also, is it expected, that in some scenarios image is upscaled?

Yes, it is expected. Even the dlib CNN example (that dets = cnn_face_detector(img, 1) line) upscales the image once (doubling its size via that 1 argument). It seems the quality of detection and the number of detected faces are far better that way. In our own testing, there is a section with this analysis. If you don't want to read it all, just look at this image: https://github.com/stalker314314/dlib-cnn-face-detection-analysis/blob/master/assets/faces-vs-upsample.png That's why we decided to upscale (if the user has 8GB of RAM and can support images up to 4 Mpixels, and the image is 1 Mpixel, why not double the size :D)

In any case, I think upscaling is good because it increases detection quality. However, it also increases detection time; @matiasdelellis also complained about that in #85. I think the solution is somewhere in between, and I will propose it in a separate issue - upscale, but only up to 2x. If there is room for 10x, just don't do it; the user will probably not benefit from it, as the picture is probably small enough already.
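The "upscale, but only up to 2x" idea above can be sketched like this. This is a hypothetical illustration of the proposal, not code from the app (target_size and max_upscale are my own names):

```python
import math

# Sketch of the proposed scaling policy: fit the image into the allowed pixel
# area, but never upscale by more than max_upscale (2x as proposed above).

def target_size(width: int, height: int, max_area: int,
                max_upscale: float = 2.0) -> tuple:
    """Return (new_width, new_height) after applying the capped scaling."""
    # Linear factor that would make the area exactly max_area.
    ratio = math.sqrt(max_area / (width * height))
    # Downscaling (ratio < 1) is applied as-is; upscaling is capped at 2x.
    ratio = min(ratio, max_upscale)
    return int(width * ratio), int(height * ratio)

# A 1-Mpixel image with room for 16 Mpixels is only doubled, not blown up 4x:
print(target_size(1000, 1000, 16_000_000))  # (2000, 2000)
# A 12-Mpixel image with a 3-Mpixel budget is halved per side:
print(target_size(4000, 3000, 3_000_000))   # (2000, 1500)
```

The cap only changes behavior for small images with a large memory budget; large images are still downscaled exactly as before.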

SlavikCA commented 5 years ago

So, now with OpenBLAS and the other stuff, does it work correctly (does it work if you don't change the 0.75 constant)? Just to confirm that this was causing the issues, so we can write it in an FAQ later?

Without libx11-dev and libopenblas-dev I was unable to install pdlib.

After I installed libx11-dev and libopenblas-dev and recompiled dlib & pdlib, I was still getting std::bad_alloc.

It only started to work after I lowered that constant to 0.2.

So, to summarize:
With libopenblas-dev installed, face recognition with the constant 0.75 doesn't work on a 6GB Ubuntu system.

stalker314314 commented 5 years ago

Unfortunately, I am out of ideas. If we can't figure out why it works for Matias and me but not for you, then we surely can't expect it to work for the general population. We will have to add an additional parameter to control the max image size directly (I planned to write that anyway, but we will have to expose it as a first-class parameter and explain it thoroughly).

The only thing that comes to mind is to try to replicate your setup. If you can share some steps (VM hardware, Ubuntu version, additional packages, procedure to install Nextcloud, dlib...), I promise I will take some time to replicate it. If you can somehow share the whole img/vhd file, even better (my email is in my GitHub profile; change passwords and other stuff, etc.).