Closed EVOTk closed 11 months ago
I've added a few photos. But the process remains at the same stage, even though the jobs in the background are running.
Do you have more than 120 new faces? You can check your database with the following query SELECT cluster_id, COUNT(id) from oc_recognize_face_detections GROUP_BY cluster_id;
Thank you for your reply. I use adminer to manage my database.
The command returns an error, should I execute it differently?
Reading other issues, I have seen other commands that can help diagnosis, but I do not know what they correspond to:
Thx
Hello, I'm add ~100 photos. Same issue.
Values for of commands have not changed
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id IS NULL
COUNT(*)
4122
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id = -1
COUNT(*)
6196
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id > 0
COUNT(*)
19202
Hello I've the same problem with different architecture Ubuntu lxc container on proxmox
ok 200 OK 27.0.2.1 yes yes \OC\Memcache\APCu \OC\Memcache\Redis yes \OC\Memcache\Redis no 3634111709184 0.36474609375 0.4296875 0.42724609375 6291456 3680256 6291456 6027264 57 0 6 2356869 23 1 8 14 21 3 0 18 0 0 17 0 0 1 7 2 2 2 7 nginx/1.25.2 8.2.10 1073741824 3600 10737418240 1 136150120 132285336 0 0 67108864 22731392 44377472 92898 2470 4724 130987 79025987 1693939639 0 0 0 0 2519 0 0 99.996812542553 1 1 5 5 6 268435440 267399360 4099 0 12801907 7802 11663 1009 0 1693939639 478816 mmap 1 33554312 33006456 Core date libxml openssl pcre zlib filter hash json random Reflection SPL session standard sodium cgi-fcgi mysqlnd PDO xml apcu bcmath bz2 calendar ctype curl dom mbstring FFI fileinfo ftp gd gettext gmp iconv igbinary imagick intl exif mysqli pdo_mysql Phar posix readline redis shmop SimpleXML sockets sysvmsg sysvsem sysvshm tokenizer xmlreader xmlwriter xsl zip Zend OPcache mysql 10.6.14 1499897856 2 2 4
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<ocs>
<meta>
<status>ok</status>
<statuscode>200</statuscode>
<message>OK</message>
...
</meta>
<data>
<nextcloud>
<system>
<version>27.0.2.1</version>
<theme/>
<enable_avatars>yes</enable_avatars>
<enable_previews>yes</enable_previews>
<memcache.local>\OC\Memcache\APCu</memcache.local>
<memcache.distributed>\OC\Memcache\Redis</memcache.distributed>
<filelocking.enabled>yes</filelocking.enabled>
<memcache.locking>\OC\Memcache\Redis</memcache.locking>
<debug>no</debug>
<freespace>3634111709184</freespace>
<cpuload>
<element>0.36474609375</element>
<element>0.4296875</element>
<element>0.42724609375</element>
...
</cpuload>
<mem_total>6291456</mem_total>
<mem_free>3680256</mem_free>
<swap_total>6291456</swap_total>
<swap_free>6027264</swap_free>
<apps>
<num_installed>57</num_installed>
<num_updates_available>0</num_updates_available>
<app_updates/>
...
</apps>
...
</system>
<storage>
<num_users>6</num_users>
<num_files>2356869</num_files>
<num_storages>23</num_storages>
<num_storages_local>1</num_storages_local>
<num_storages_home>8</num_storages_home>
<num_storages_other>14</num_storages_other>
...
</storage>
<shares>
<num_shares>21</num_shares>
<num_shares_user>3</num_shares_user>
<num_shares_groups>0</num_shares_groups>
<num_shares_link>18</num_shares_link>
<num_shares_mail>0</num_shares_mail>
<num_shares_room>0</num_shares_room>
<num_shares_link_no_password>17</num_shares_link_no_password>
<num_fed_shares_sent>0</num_fed_shares_sent>
<num_fed_shares_received>0</num_fed_shares_received>
<permissions_0_15>1</permissions_0_15>
<permissions_3_17>7</permissions_3_17>
<permissions_3_19>2</permissions_3_19>
<permissions_3_21>2</permissions_3_21>
<permissions_0_31>2</permissions_0_31>
<permissions_3_31>7</permissions_3_31>
...
</shares>
...
</nextcloud>
<server>
<webserver>nginx/1.25.2</webserver>
<php>
<version>8.2.10</version>
<memory_limit>1073741824</memory_limit>
<max_execution_time>3600</max_execution_time>
<upload_max_filesize>10737418240</upload_max_filesize>
<opcache>
<opcache_enabled>1</opcache_enabled>
<cache_full/>
<restart_pending/>
<restart_in_progress/>
<memory_usage>
<used_memory>136150120</used_memory>
<free_memory>132285336</free_memory>
<wasted_memory>0</wasted_memory>
<current_wasted_percentage>0</current_wasted_percentage>
...
</memory_usage>
<interned_strings_usage>
<buffer_size>67108864</buffer_size>
<used_memory>22731392</used_memory>
<free_memory>44377472</free_memory>
<number_of_strings>92898</number_of_strings>
...
</interned_strings_usage>
<opcache_statistics>
<num_cached_scripts>2470</num_cached_scripts>
<num_cached_keys>4724</num_cached_keys>
<max_cached_keys>130987</max_cached_keys>
<hits>79025987</hits>
<start_time>1693939639</start_time>
<last_restart_time>0</last_restart_time>
<oom_restarts>0</oom_restarts>
<hash_restarts>0</hash_restarts>
<manual_restarts>0</manual_restarts>
<misses>2519</misses>
<blacklist_misses>0</blacklist_misses>
<blacklist_miss_ratio>0</blacklist_miss_ratio>
<opcache_hit_rate>99.996812542553</opcache_hit_rate>
...
</opcache_statistics>
<jit>
<enabled>1</enabled>
<on>1</on>
<kind>5</kind>
<opt_level>5</opt_level>
<opt_flags>6</opt_flags>
<buffer_size>268435440</buffer_size>
<buffer_free>267399360</buffer_free>
...
</jit>
...
</opcache>
<apcu>
<cache>
<num_slots>4099</num_slots>
<ttl>0</ttl>
<num_hits>12801907</num_hits>
<num_misses>7802</num_misses>
<num_inserts>11663</num_inserts>
<num_entries>1009</num_entries>
<expunges>0</expunges>
<start_time>1693939639</start_time>
<mem_size>478816</mem_size>
<memory_type>mmap</memory_type>
...
</cache>
<sma>
<num_seg>1</num_seg>
<seg_size>33554312</seg_size>
<avail_mem>33006456</avail_mem>
...
</sma>
...
</apcu>
<extensions>
<element>Core</element>
<element>date</element>
<element>libxml</element>
<element>openssl</element>
<element>pcre</element>
<element>zlib</element>
<element>filter</element>
<element>hash</element>
<element>json</element>
<element>random</element>
<element>Reflection</element>
<element>SPL</element>
<element>session</element>
<element>standard</element>
<element>sodium</element>
<element>cgi-fcgi</element>
<element>mysqlnd</element>
<element>PDO</element>
<element>xml</element>
<element>apcu</element>
<element>bcmath</element>
<element>bz2</element>
<element>calendar</element>
<element>ctype</element>
<element>curl</element>
<element>dom</element>
<element>mbstring</element>
<element>FFI</element>
<element>fileinfo</element>
<element>ftp</element>
<element>gd</element>
<element>gettext</element>
<element>gmp</element>
<element>iconv</element>
<element>igbinary</element>
<element>imagick</element>
<element>intl</element>
<element>exif</element>
<element>mysqli</element>
<element>pdo_mysql</element>
<element>Phar</element>
<element>posix</element>
<element>readline</element>
<element>redis</element>
<element>shmop</element>
<element>SimpleXML</element>
<element>sockets</element>
<element>sysvmsg</element>
<element>sysvsem</element>
<element>sysvshm</element>
<element>tokenizer</element>
<element>xmlreader</element>
<element>xmlwriter</element>
<element>xsl</element>
<element>zip</element>
<element>Zend OPcache</element>
...
</extensions>
...
</php>
<database>
<type>mysql</type>
<version>10.6.14</version>
<size>1499897856</size>
...
</database>
...
</server>
<activeUsers>
<last5minutes>2</last5minutes>
<last1hour>2</last1hour>
<last24hours>4</last24hours>
...
</activeUsers>
...
</data>
...
</ocs>
I did the complete face reset and there be sure that new faces are recognized
The newly detected faces are not shown in the memories app only in Fotos.
And here i thought, hey maybe i should install recognize again? Let's check github issues if this app is usable now!
Nope, same problems like a year ago
And here i thought, hey maybe i should install recognize again? Let's check github issues if this app is usable now!
Nope, same problems like a year ago
Could you please delete your comment? I don't see the need to post it here, if you also have a problem, I think it would be better to detail it, either in this issue if it's similar, or in a new ticket.
Could you please delete your comment? I don't see the need to post it here, if you also have a problem, I think it would be better to detail it, either in this issue if it's similar, or in a new ticket.
Naw thanks, i opened like 20 issues here already
SQL command values are always identical to the last time. https://github.com/nextcloud/recognize/issues/967#issuecomment-1707022983
So, the issue is that face recognition is stuck. Can you check if there are jobs in the oc_jobs table for recognize and maybe post the list? (something like SELECT * from oc_jobs where app = 'recognize';
)
So, the issue is that face recognition is stuck. Can you check if there are jobs in the oc_jobs table for recognize and maybe post the list? (something like
SELECT * from oc_jobs where app = 'recognize';
)
Thx for your reply
Ah, sorry about the wrong query. It looks like the job is intact and should be running, last_run indicates 'Sun Sep 17 04:58:21 2023 UTC'. Can you find any errors in the nextcloud log related to recognize (try something like cat data/nextcloud.log | grep recognize
)?
Here are the attached logs, thank you
I see errors related to "Welcome App" . An update of Welcome is available since 1h ( 1.0.10 ) , I have just made the update. https://apps.nextcloud.com/apps/welcome https://github.com/nextcloud/welcome/blob/main/CHANGELOG.md Are you thinking of a conflict with Welcome App?
Are you thinking of a conflict with Welcome App?
Nah, I think that's unrelated. It's also just warnings. We can ignore those.
I think the reason for this is that we limit the run time of the job to 5 minutes, but by the time the 5minutes are over it still hasn't finished preparing the previews for classification. :/
okay, do you think I should decrease the value of "The number of files to process per job run"? I'm currently at 50, because I'm in WASM mode.
If anything, try reducing it, but I will publish a new release soon with a proper fix that will hopefully solve this for you
okay !, I've just reduced it to "5" to see if there's any change there.
Thanks, I'll be waiting with impatience for this new version :)
Do let me know if the reduced value changes things, to confirm that I've indentified the bug :pray:
Yes yes, of course, I'll come back here to tell you if it works or not.
Chiming in to say that I've had the same issue, and this seems to have fixed it π (I think?)
Lowered (roughly halving the defaults) the number of files to process and it appears to be working well over the past 24h with "Schedule jobs" now having values some of the time and "Last background job execution" showing values for some π
The oc_jobs
table had several entries, but the last_run
was 0 for all but the top one, and last_checked
for the same were "old" timestamps (was just poking around, didn't keep screenshots/logs, might be mis-remembering the exact column).
Whereas now select * from oc_jobs where class like '%Recog%';
seems to show turnover with the jobs clearing themselves and setting up new ones?
I ended up with a large backlog (don't have original numbers, but "Object recognition" has 22,740 queued files right now) because I disabled the app for a few months while I worked out better hardware.
Cheers ππ»
(Edit: Had to drop object/landmark to ~25 since that seems to re-occur once the face recognition queue clears; and this is in native Tensorflow mode, not WASM mode like the parent)
Great! Thanks for letting me know. I'll soon release v5 of recognize which will fix this permanently, so you don't have to reduce the batch size anymore
Reducing the number of "files to process per job run" solved my problem. New faces are identified! Thanks @marcelklehr
Hello @marcelklehr
After lowering the value, and doing a full indexation. New faces have appeared.
however, it's blocked again :( I lowered the value again, but it doesn't change anything.
Mmmh. Did it resolve in the meantime? Sometimes a 12h wait fixes things, if not, can you share the jobs table again?
Hello
Unfortunately, it's still blocked.
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id IS NULL
COUNT(*)
4134
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id = -1
COUNT(*)
6020
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id > 0
COUNT(*)
19440
Thank you for providing the information! At least it seems like the job still exists and runs every now and then for ~23s. Can you check the nextcloud log for errors?
I'm edit sorry @marcelklehr , wrong command
see attached file for recognize log recognize.txt
Mmh, seems like it's still hanging because of the preview generation taking too long :/ Can you try reducing the batch size to 2 (I know this is suboptimal :D )
Thx, i'm try
Mmh, seems like it's still hanging because of the preview generation taking too long :/ Can you try reducing the batch size to 2 (I know this is suboptimal :D )
That doesn't seem to have solved the problem :(
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id IS NULL
COUNT(*)
4134
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id = -1
COUNT(*)
6020
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id > 0
COUNT(*)
19440
Can you try the latest release (v5). It should hopefully finally fix this :)
My setup has the same symptom of not clearing queues for classifiers, even with v5.0.1 installed. Turning on debug logging shows each of the background jobs running, but then being "stalled", with a log entry like this (one for each background job):
{
"reqId": "Cm4um9yQKwppeCKr7gFX",
"level": 0,
"time": "2023-10-11T19:15:41+00:00",
"remoteAddr": "",
"user": "--",
"app": "recognize",
"method": "",
"url": "--",
"message": "Stalling job OCA\\Recognize\\BackgroundJobs\\ClassifyFacesJob with argument array (\n 'storageId' => 1,\n 'rootId' => 1,\n) because other classifiers are already reserved",
"userAgent": "--",
"version": "27.1.1.0",
"data": {
"app": "recognize"
}
}
Can you try the latest release (v5). It should hopefully finally fix this :)
Thanks, I've updated and restarted a full indexing. It will take some time, I'll come back to update then!
My setup has ... each of the background jobs running, but then being "stalled" ...
After seeing where this log statement comes from in the code, I turned on "unlimited concurrency" and now it's chugging away through the queue! It also isn't choking on the jobs, which is the image that "unlimited concurrency" gives me.
Just wanted to check if there were any other updates on the fix. Still encountering the issue on 5.0.1
@aaronm120 Can you send your nextcloug log ideally with loglevel set to debug?
@marcelklehr
Not sure what that "stalling" is, looks like the same issue that others had above
@aaronm120 Aah, I think I know what's going on... :see_no_evil:
@aaronm120 Can you try updating to see if that fixes it?
@marcelklehr Looks like it is now resolved with 5.0.3. Thanks for the fix
Closing for now. Please shout here if this is still happening for you :v:
For my side, I've restarted indexing, but the system is still blocked.
@EVOTk Which version are you on now?
recognize.txt Hello, NC 27.1.2 Recognize 5.0.3
I cleaned up the jobs several days ago, and restarted a complete indexing ( over 40k files ). It takes a long time. Now it seems to be blocked again, unfortunately.
--
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id IS NULL
COUNT(*)
4134
--
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id = -1
COUNT(*)
6009
--
SELECT COUNT(*) from oc_recognize_face_detections WHERE cluster_id > 0
COUNT(*)
19389
Can you up the number of files to process per run?
Would you like set to 50 as originally proposed?
We can do 500 as recommended in the settings, now
Okay, i'm use WASM mode
Ah, right. then 50
I'm still stuck :( What can I do to help me? Thx
Which version of recognize are you using?
4.3.2
Enabled Modes
Face recognition
TensorFlow mode
WASM mode
Downstream App
Memories App
Which Nextcloud version do you have installed?
27.0.227.1.0Which Operating system do you have installed?
Debian 12
Which database are you running Nextcloud on?
MariaDB
10.11.410.11.5Which Docker container are you using to run Nextcloud? (if applicable)
Linuxserver
27.0.227.1.0 PHP 8.2.10How much RAM does your server have?
32 GB
What processor Architecture does your CPU have?
x86_64
Describe the Bug
For the past 1 month, I've had no new face detection on the photos I've added.
the process appears to be "blocked"
"Last classification: 06/08/2023"
occ recognize:recrawl
orocc recognize:classify
not resolve issueI've also tried "clear the classifier queues and clear all background jobs" and relaunching a complete classification, but it always comes back to the same thing.
I didn't run
occ recognize:reset-face-clusters
andocc recognize:reset-faces
because I don't want to lose the current classification.Thx for your help
Expected Behavior
Face detection on new photos
To Reproduce
Debug log
No response