manticoresoftware / manticoresearch-backup

Repository for Manticore Search backup scripts
GNU General Public License v3.0
3 stars 1 forks source link

manticore-backup Failed to send query to the Manticore Search daemon #116

Open srgtest opened 2 months ago

srgtest commented 2 months ago

Update Jul 9 2024

Likely MRE - https://github.com/manticoresoftware/manticoresearch-backup/issues/116#issuecomment-2216597206


Package versions:

manticore-backup/jammy,now 1.3.8-24052208-57fc406 all [установлен, автоматически]
manticore-buddy/jammy,now 2.3.10-24052208-7612a4f all [установлен, автоматически]
manticore-columnar-lib/jammy,now 2.3.0-24052206-88a01c3 amd64 [установлен]
manticore-common/jammy,now 6.3.0-24052209-1811a9efb all [установлен, автоматически]
manticore-dev/jammy,now 6.3.0-24052209-1811a9efb all [установлен, автоматически]
manticore-executor/jammy,now 1.1.6-24052206-c55bc2b amd64 [установлен]
manticore-galera/jammy,now 3.37 amd64 [установлен]
manticore-icudata-65l/jammy,now 5.0.3-221123-d2d9e5e56 all [установлен, автоматически]
manticore-repo/now 0.0.4 all [установлен, локальный]
manticore-server-core/jammy,now 6.3.0-24052209-1811a9efb amd64 [установлен, автоматически]
manticore-server/jammy,now 6.3.0-24052209-1811a9efb amd64 [установлен, автоматически]
manticore-tools/jammy,now 6.3.0-24052209-1811a9efb amd64 [установлен, автоматически]
manticore-tzdata/jammy,now 1.0.0-240522-a8aa66e all [установлен, автоматически]
manticore/jammy,now 6.3.0-24052209-1811a9efb amd64 [установлен]

When running the manticore-backup command:

manticore-backup --config=/etc/manticoresearch/manticore.conf --backup-dir=/mnt/backup --compress

An error appears in the log:

Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file: /etc/manticoresearch/manticore.conf
Tables to backup: all tables
Backup dir: /mnt/backup

Manticore config
  endpoint =  http://10.9.2.99:9312

Manticore versions:
  manticore: 6.3.0 1811a9efb@24052209
  columnar: 2.3.0 88a01c3@24052206
  secondary: 2.3.0 88a01c3@24052206
  knn: 2.3.0 88a01c3@24052206
  buddy: 2.3.10
2024-06-05 06:31:52 [Info] Starting the backup...
2024-06-05 06:32:06 [Info] Backing up config files...
2024-06-05 06:32:37 [Info]   config files - OK
2024-06-05 06:32:37 [Info] Backing up global state files...
2024-06-05 06:32:37 [Info]   global state files – OK
2024-06-05 06:32:41 [Info] Backing up tables...
2024-06-05 06:32:41 [Info]   bindings (rt) [1.52K]...
2024-06-05 06:32:52 [Info]    OK
2024-06-05 06:32:52 [Info]   categories (rt) [68.683K]...
2024-06-05 06:32:53 [Info]    OK
2024-06-05 06:32:53 [Info]   ekns (rt) [820B]...
2024-06-05 06:33:06 [Info]    OK
2024-06-05 06:33:06 [Info]   literature_work_cycle_volumes (rt) [506.537K]...
2024-06-05 06:33:29 [Info]    OK
2024-06-05 06:33:29 [Info]   literature_work_cycles (rt) [3.749M]...
2024-06-05 06:33:31 [Info]    OK
2024-06-05 06:33:31 [Info]   persons (rt) [29.019M]...
2024-06-05 06:33:32 [Info]    OK
2024-06-05 06:33:32 [Info]   product_collections (rt) [212.906K]...
2024-06-05 06:33:33 [Info]    OK
2024-06-05 06:33:33 [Info]   product_properties_values (rt) [866.989K]...
2024-06-05 06:33:33 [Info]    OK
2024-06-05 06:33:33 [Info]   product_tags (rt) [61.813K]...
2024-06-05 06:33:34 [Info]    OK
2024-06-05 06:33:34 [Info]   products (rt) [2.47G]...
2024-06-05 06:36:07 [Info]    OK
2024-06-05 06:36:07 [Info]   products_for_autocorrect (rt) [288.336M]...
2024-06-05 06:36:46 [Info]    OK
2024-06-05 06:36:46 [Info]   publisher_brands (rt) [220.358K]...
2024-06-05 06:36:47 [Info]    OK
2024-06-05 06:36:47 [Info]   publisher_series (rt) [9.315M]...
2024-06-05 06:36:48 [Info]    OK
2024-06-05 06:36:48 [Info]   publishers (rt) [1.151M]...
2024-06-05 06:36:51 [Info]    OK
2024-06-05 06:36:51 [Info]   retail_products (rt) [36.267G]...
2024-06-05 07:08:34 [Info]    OK
2024-06-05 07:08:34 [Info]   school_properties_values (rt) [258.705K]...
2024-06-05 07:08:53 [Info]    OK
2024-06-05 07:08:53 [Info]   shops (rt) [77.063K]...
2024-06-05 07:09:17 [Info]    OK
2024-06-05 07:09:17 [Info]   tbks (rt) [434.906K]...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info] Running sync
2024-06-05 07:09:47 [Info]  OK
2024-06-05 07:09:47 [Info] You can find backup here: /mnt/backup/backup-20240605063152
2024-06-05 07:09:47 [Info] Elapsed time: 2275.05s
2024-06-05 07:09:47 [Info] Done
2024-06-05 07:09:47 [Info] 
Unfreezing all tables...
2024-06-05 07:09:47 [Info]   bindings...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   categories...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   ekns...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   literature_work_cycle_volumes...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   literature_work_cycles...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   persons...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   product_collections...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   product_properties_values...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   product_tags...
2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   products...

Fatal error: Uncaught Manticoresearch\Backup\Exception\SearchdException: Failed to send query to the Manticore Search daemon. Ensure that it is set up to listen for HTTP or HTTPS connections and has the appropriate certificates in place. Additionally, check the 'max_connections' setting in
 the configuration file to ensure that it has not been exceeded.  in /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php:27
9
Stack trace:
#0 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php(145): Manticoresearch\Backup\Lib\ManticoreClient->execute('UNFREEZE 
produc...')
#1 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php(160): Manticoresearch\Backup\Lib\ManticoreClient->unfreeze(Array)
#2 [internal function]: Manticoresearch\Backup\Lib\ManticoreClient->Manticoresearch\Backup\Lib\{closure}(true, 'products')
#3 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php(157): array_reduce(Array, Object(Closure), true)
#4 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreBackup.php(166): Manticoresearch\Backup\Lib\ManticoreClient->unfreezeAll()
#5 [internal function]: Manticoresearch\Backup\Lib\ManticoreBackup::Manticoresearch\Backup\Lib\{closure}(Object(Manticoresearch\Backup\Lib\Mantic
oreClient))
#6 {main}
  thrown in /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php on line 279

As you can see, the backup itself went through, but at the UNFREEZE stage something went wrong. This happens periodically. At the same time, Manticore itself hangs and accepts more requests, except for a local connection. Solved by:

systemctl stop manticore
systemctl start manticore

searchd.log:

[Wed Jun  5 06:24:00.115 2024] [3060672] rt: table retail_products: diskchunk 50508(83), segments 1  saved in 2.774805 (2.775184) sec, RAM saved/new 33993065/0 ratio 0.900000 (soft limit 120795955, conf limit 134217728)
[Wed Jun  5 06:24:04.316 2024] [3060673] rt: table retail_products: diskchunk 50507(84), segments 5  saved in 13.336653 (13.385911) sec, RAM saved/new 158546614/31004605 ratio 0.836432 (soft limit 112263938, conf limit 134217728)
[Wed Jun  5 06:24:05.555 2024] [3060672] rt: table retail_products: diskchunk 50509(85), segments 1  saved in 2.599324 (2.599727) sec, RAM saved/new 31004605/0 ratio 0.786432 (soft limit 105553051, conf limit 134217728)
[Wed Jun  5 06:24:33.045 2024] [3060673] rt: table retail_products: diskchunk 50512(83), segments 3  saved in 9.493652 (9.494065) sec, RAM saved/new 111616578/43061952 ratio 0.721604 (soft limit 96851990, conf limit 134217728)
[Wed Jun  5 06:24:35.781 2024] [3060671] rt: table retail_products: diskchunk 50513(84), segments 2  saved in 3.717802 (3.718218) sec, RAM saved/new 43061952/0 ratio 0.671604 (soft limit 90141104, conf limit 134217728)
[Wed Jun  5 06:25:00.201 2024] [3060670] rt: table retail_products: diskchunk 50515(85), segments 3  saved in 8.604319 (8.604775) sec, RAM saved/new 101979195/30010024 ratio 0.772633 (soft limit 103701014, conf limit 134217728)
[Wed Jun  5 06:25:23.237 2024] [3060672] rt: table retail_products: diskchunk 50517(84), segments 5  saved in 9.112808 (9.113299) sec, RAM saved/new 106337048/9306 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Wed Jun  5 06:25:30.767 2024] [3060672] WARNING: got ident_all
[Wed Jun  5 06:25:30.870 2024] [3060671] WARNING: rt common merge: table retail_products: failed to merge /var/lib/manticore/retail_products/retail_products.50512 (interrupted because of shutdown)
[Wed Jun  5 06:25:30.870 2024] [3060670] rt: table retail_products: optimized progressive chunk(s) 170 ( left 84 ) in 47m 55.8s
[Wed Jun  5 06:25:30.938 2024] [3060673] rt: table retail_products: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.009 Mb, time delta=7 sec, took=0.068 sec)
[Wed Jun  5 06:25:33.952 2024] [3060674] rt: table products: diskchunk 104352(13), segments 15  saved in 5.550124 (5.550515) sec, RAM saved/new 133089609/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Wed Jun  5 06:25:45.033 2024] [3060671] rt: table products: optimized progressive chunk(s) 13 ( left 12 ) in 11s 78ms
[Wed Jun  5 06:27:36.865 2024] [3060671] rt: table products: diskchunk 104354(13), segments 13  saved in 5.114076 (5.152566) sec, RAM saved/new 134260091/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Wed Jun  5 06:27:44.799 2024] [3060670] rt: table products: optimized progressive chunk(s) 13 ( left 12 ) in 7s 930ms
[Wed Jun  5 06:29:45.454 2024] [3060670] rt: table products: diskchunk 104356(13), segments 13  saved in 5.459832 (5.460300) sec, RAM saved/new 136490137/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Wed Jun  5 06:29:57.752 2024] [3060670] rt: table products: optimized progressive chunk(s) 13 ( left 12 ) in 12s 294ms
[Wed Jun  5 06:31:51.941 2024] [3060673] WARNING: got ident_all
[Wed Jun  5 06:31:54.358 2024] [3060671] rt: table products: diskchunk 104358(13), segments 13  saved in 5.617763 (5.618163) sec, RAM saved/new 134159092/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Wed Jun  5 06:32:06.360 2024] [3060672] rt: table products: optimized progressive chunk(s) 13 ( left 12 ) in 11s 999ms
[Wed Jun  5 06:32:37.823 2024] [3060671] WARNING: got ident_all
[Wed Jun  5 06:32:37.827 2024] [3060671] last message repeated 17 times
[Wed Jun  5 06:32:37.827 2024] [3060671] rt: table bindings: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.001 Mb, time delta=3431 sec, took=0.003 sec)
[Wed Jun  5 06:32:37.828 2024] [3060673] rt: table categories: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.075 Mb, time delta=3431 sec, took=0.000 sec)
[Wed Jun  5 06:32:37.831 2024] [3060673] rt: table ekns: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.000 Mb, time delta=3429 sec, took=0.002 sec)
[Wed Jun  5 06:32:37.846 2024] [3060674] rt: table literature_work_cycle_volumes: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.562 Mb, time delta=3431 sec, took=0.015 sec)
[Wed Jun  5 06:32:37.923 2024] [3060670] rt: table literature_work_cycles: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=3.016 Mb, time delta=3431 sec, took=0.077 sec)
[Wed Jun  5 06:32:38.190 2024] [3060673] rt: table persons: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=31.319 Mb, time delta=3431 sec, took=0.266 sec)
[Wed Jun  5 06:32:38.195 2024] [3060672] rt: table product_collections: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.226 Mb, time delta=3431 sec, took=0.005 sec)
[Wed Jun  5 06:32:38.247 2024] [3060672] rt: table product_properties_values: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.984 Mb, time delta=3431 sec, took=0.052 sec)
[Wed Jun  5 06:32:38.252 2024] [3060672] rt: table product_tags: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.072 Mb, time delta=3432 sec, took=0.003 sec)
[Wed Jun  5 06:32:38.303 2024] [3060672] rt: table products: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=49.014 Mb, time delta=43 sec, took=0.050 sec)
[Wed Jun  5 06:32:40.820 2024] [3060671] rt: table products_for_autocorrect: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=114.741 Mb, time delta=3433 sec, took=2.507 sec)
[Wed Jun  5 06:32:40.843 2024] [3060671] rt: table publisher_brands: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.263 Mb, time delta=3434 sec, took=0.007 sec)
[Wed Jun  5 06:32:40.959 2024] [3060671] rt: table publisher_series: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=10.670 Mb, time delta=3434 sec, took=0.115 sec)
[Wed Jun  5 06:32:40.987 2024] [3060674] rt: table publishers: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=1.327 Mb, time delta=3434 sec, took=0.027 sec)
[Wed Jun  5 06:32:40.997 2024] [3060674] rt: table retail_products: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.009 Mb, time delta=430 sec, took=0.010 sec)
[Wed Jun  5 06:32:41.010 2024] [3060674] rt: table school_properties_values: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.285 Mb, time delta=3434 sec, took=0.007 sec)
[Wed Jun  5 06:32:41.015 2024] [3060674] rt: table shops: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.083 Mb, time delta=3433 sec, took=0.003 sec)
[Wed Jun  5 06:32:41.016 2024] [3060674] rt: table tbks: ramchunk saved ok (mode=forced, last TID=0, current TID=0, ram=0.475 Mb, time delta=3434 sec, took=0.001 sec)
[Wed Jun  5 06:32:41.027 2024] [3060670] WARNING: got ident_all
[Wed Jun  5 06:32:52.853 2024] [3060672] WARNING: got ident_all
[Wed Jun  5 06:32:55.124 2024] [3060669] WARNING: last message repeated 3 times
[Wed Jun  5 06:33:06.826 2024] [3060673] WARNING: got ident_all
[Wed Jun  5 06:33:08.138 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 06:33:29.933 2024] [3060671] WARNING: got ident_all
[Wed Jun  5 06:33:31.080 2024] [3060672] WARNING: last message repeated 1 times
[Wed Jun  5 06:33:31.080 2024] [3060672] WARNING: got ident_all
[Wed Jun  5 06:33:32.165 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 06:33:32.504 2024] [3060674] WARNING: got ident_all
[Wed Jun  5 06:33:35.669 2024] [3060669] WARNING: last message repeated 7 times
[Wed Jun  5 06:36:07.926 2024] [3060670] WARNING: got ident_all
[Wed Jun  5 06:36:09.337 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 06:36:46.719 2024] [3060674] WARNING: got ident_all
[Wed Jun  5 06:36:49.382 2024] [3060669] WARNING: last message repeated 5 times
[Wed Jun  5 06:36:51.291 2024] [3060674] WARNING: got ident_all
[Wed Jun  5 06:36:52.386 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 07:08:34.768 2024] [3060673] WARNING: got ident_all
[Wed Jun  5 07:08:35.976 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 07:08:53.386 2024] [3060673] WARNING: got ident_all
[Wed Jun  5 07:08:54.496 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 07:09:17.988 2024] [3060672] WARNING: got ident_all
[Wed Jun  5 07:09:19.022 2024] [3060669] WARNING: last message repeated 1 times
[Wed Jun  5 07:09:47.364 2024] [3060671] WARNING: got ident_all
[Wed Jun  5 07:09:49.056 2024] [3060669] WARNING: last message repeated 10 times
[Wed Jun  5 07:23:23.759 2024] [3060669] caught SIGTERM, shutting down
[Wed Jun  5 07:23:23.760 2024] [3060672] WARNING: [BUDDY] [19010] error: Interrupted due to shutdown
[Wed Jun  5 07:27:28.616 2024] [3060668] watchdog: main process 3060669 killed cleanly with SIGKILL, shutting down
[Wed Jun  5 07:27:37.061 2024] [3074640] watchdog: main process 3074641 forked ok
[Wed Jun  5 07:27:37.070 2024] [3074641] Using local time zone '/etc/localtime'
[Wed Jun  5 07:27:37.071 2024] [3074641] starting daemon version '6.3.0 1811a9efb@24052209 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)' ...
[Wed Jun  5 07:27:37.071 2024] [3074641] listening on 10.9.2.99:9312 for sphinx and http(s)
[Wed Jun  5 07:27:37.071 2024] [3074641] listening on 10.9.2.99:9306 for mysql
[Wed Jun  5 07:27:37.071 2024] [3074641] listening on 10.9.2.99:9308 for sphinx and http(s)
[Wed Jun  5 07:27:37.220 2024] [3074642] WARNING: table 'publisher_brands': table 'publisher_brands': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.236 2024] [3074645] WARNING: table 'publishers': table 'publishers': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.330 2024] [3074644] WARNING: table 'publisher_series': table 'publisher_series': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.331 2024] [3074642] WARNING: table 'school_properties_values': table 'school_properties_values': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.505 2024] [3074643] WARNING: table 'persons': table 'persons': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.519 2024] [3074645] WARNING: table 'product_properties_values': table 'product_properties_values': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.968 2024] [3074643] WARNING: table 'products': table 'products': morphology option changed from config has no effect, ignoring
[Wed Jun  5 07:27:37.975 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.1032.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:37.978 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.1082.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:37.981 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.904.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:37.985 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.822.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:37.995 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.932.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:38.000 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.928.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:38.005 2024] [3074643] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.1021.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Wed Jun  5 07:27:39.977 2024] [3074645] prereading 18 tables
[Wed Jun  5 07:27:39.987 2024] [3074641] accepting connections
[Wed Jun  5 07:27:40.072 2024] [3074643] [BUDDY] started v2.3.10 '/usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --listen=http://10.9.2.99:9312 --bind=127.0.0.1 --disable-telemetry --threads=5 --skip=manticoresoftware/buddy-plugin-sharding --skip=manticoresoftware/buddy-plugin-queue' at http://127.0.0.1:34621
[Wed Jun  5 07:27:40.087 2024] [3074642] [BUDDY] Loaded plugins:
[Wed Jun  5 07:27:40.087 2024] [3074642] [BUDDY]   core: empty-string, backup, emulate-elastic, create, insert, alias, select, show, cli-table, plugin, test, alter-distributed-table, alter-rename-table, modify-table, knn, replace
[Wed Jun  5 07:27:40.087 2024] [3074642] [BUDDY]   local: 
[Wed Jun  5 07:27:40.087 2024] [3074642] [BUDDY]   extra: 
[Wed Jun  5 07:28:19.136 2024] [3074643] rt: table retail_products: diskchunk 50520(85), segments 1  saved in 2.241569 (2.253052) sec, RAM saved/new 25837171/0 ratio 0.900000 (soft limit 120795955, conf limit 134217728)
[Wed Jun  5 07:30:50.336 2024] [3074645] preread 18 tables in 190.359 sec

Please tell me what is causing this behavior and how to eliminate it?

Ticket on the forum: https://forum.manticoresearch.com/t/manticore-backup-failed-to-send-query-to-the-manticore-search-daemon/1973

tomatolog commented 2 months ago

could you upload your backup of the products index that we could try to restore and investigate the hung?

You could upload your data as explained in the manual

srgtest commented 2 months ago

Uploaded a backup of the product index: issue-116.

srgtest commented 2 months ago

Any information on the problem?

sanikolaev commented 2 months ago

Looks like more tables or additional load are required to reproduce the issue:

snikolaev@dev2:~/issue-116$ manticore-backup --config=manti.conf --backup-dir=backup
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file: /home/snikolaev/issue-116/manti.conf
Tables to backup: all tables
Backup dir: backup

Manticore config
  endpoint =  http://127.0.0.1:9316

Manticore versions:
  manticore: 6.3.1 00e77610d@24052911 dev
  columnar: 2.3.1 4383a90@24052309
  secondary: 2.3.1 4383a90@24052309
  knn: 2.3.1 4383a90@24052309
  buddy: 0.0.0

WARNING: we couldn't fully preserve permissions of the files you've backed up. Be careful when you restore from the backup or re-run the backup as root
2024-06-14 15:50:07 [Info] Starting the backup...
2024-06-14 15:50:07 [Info] Backing up config files...
2024-06-14 15:50:07 [Info]   config files - OK
2024-06-14 15:50:07 [Info] Backing up global state files...
2024-06-14 15:50:07 [Info]   global state files – OK
2024-06-14 15:50:07 [Info] Backing up tables...
2024-06-14 15:50:07 [Info]   products (rt) [2.46G]...
2024-06-14 15:50:18 [Info]    OK
2024-06-14 15:50:18 [Info] Running sync
2024-06-14 15:50:35 [Info]  OK
2024-06-14 15:50:35 [Info] You can find backup here: /home/snikolaev/issue-116/backup/backup-20240614155007
2024-06-14 15:50:35 [Info] Elapsed time: 27.98s
2024-06-14 15:50:36 [Info] Done
2024-06-14 15:50:36 [Info]
Unfreezing all tables...
2024-06-14 15:50:36 [Info]   products...
2024-06-14 15:50:36 [Info]    OK

snikolaev@dev2:~/issue-116$ manticore-backup --config=manti.conf --backup-dir=backup --compress
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file: /home/snikolaev/issue-116/manti.conf
Tables to backup: all tables
Backup dir: backup

Manticore config
  endpoint =  http://127.0.0.1:9316

Manticore versions:
  manticore: 6.3.1 00e77610d@24052911 dev
  columnar: 2.3.1 4383a90@24052309
  secondary: 2.3.1 4383a90@24052309
  knn: 2.3.1 4383a90@24052309
  buddy: 0.0.0

WARNING: we couldn't fully preserve permissions of the files you've backed up. Be careful when you restore from the backup or re-run the backup as root
2024-06-14 15:57:28 [Info] Starting the backup...
2024-06-14 15:57:28 [Info] Backing up config files...
2024-06-14 15:57:28 [Info]   config files - OK
2024-06-14 15:57:28 [Info] Backing up global state files...
2024-06-14 15:57:28 [Info]   global state files – OK
2024-06-14 15:57:28 [Info] Backing up tables...
2024-06-14 15:57:28 [Info]   products (rt) [2.46G]...
2024-06-14 15:57:45 [Info]    OK
2024-06-14 15:57:45 [Info] Running sync
2024-06-14 15:57:51 [Info]  OK
2024-06-14 15:57:51 [Info] You can find backup here: /home/snikolaev/issue-116/backup/backup-20240614155728
2024-06-14 15:57:51 [Info] Elapsed time: 22.55s
2024-06-14 15:57:51 [Info] Done
2024-06-14 15:57:51 [Info]
Unfreezing all tables...
2024-06-14 15:57:51 [Info]   products...
2024-06-14 15:57:51 [Info]    OK
srgtest commented 2 months ago

Yes, without load there are no problems. But when you remove a backup to the product index, there are SELECT and UPDATE. There is an idea that this is due to load and requests. At this moment, the backup is FREEZE before the backup is completed, and requests are still being updated. And apparently he can no longer apply and defrost everything correctly.

How can I check this?

sanikolaev commented 2 months ago

Pls let me know what update and select queries are running during backup. Simple:

snikolaev@dev2:~$ while true; do mysql -P9315 -h0 -e "update products set nds=21 where id=612731;update products set nds=20 where id=612731"; echo -n .; done;

doesn't change anything. Best if you can find something simple which can be run in parallel with manticore-backup which then makes it fail.

srgtest commented 2 months ago

There is a query.log log for this date. In the attachment, there are queries that were executed during the backup period. query.log.2.gz Can be compared by time.

tomatolog commented 2 months ago

from query.log you provided I see the last queries were

/* Wed Jun 05 06:53:18.396 2024 conn 14060 (10.9.2.195:51636) real 0.522 wall 0.523 found 1190452 */ SELECT id,

then there is a gap and next queries with errors seems come after daemon restart

...
/* Wed Jun 05 07:13:10.908 2024 conn 19010 (10.180.70.46:57052) */ /* ApplicationName=DBeaver 24.0.5 - Metadata */ SHOW FULL TABLES FROM

I can not line up query.log with time of the backup

2024-06-05 07:09:47 [Info]    OK
2024-06-05 07:09:47 [Info]   products...

I suggest to set in your config

searchd
{
query_log_commands = 1

that will log the start command then also make an entry then command finished. Then reproduce the case at your side and provide searchd.log query.log along with backup log these we could line up and try to reproduce the case here localy

srgtest commented 2 months ago

When the daemon froze, the requests were no longer received and were not recorded in the logs. I can also attach log files, you can see the time there.

sanikolaev commented 2 months ago

@srgtest what may work out is if you just give us a few typical queries you make and some info about the load like:

srgtest commented 1 month ago

@sanikolaev Ok, I'll try to collect such information and requests.

nxjts commented 1 month ago

I have the same problem. A backup on a loaded database fails with an error and work is blocked.

root@ec2733c78e2d:/tmp# manticore-backup --backup-dir=/tmp
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)

Manticore config file: /etc/manticoresearch/manticore.conf
Tables to backup: all tables
Backup dir: /tmp

Manticore config
  endpoint =  http://127.0.0.1:9308

Manticore versions:
  manticore: 6.3.3 8c855239d@24070613 dev
  columnar: 2.3.1 42f2b06@24070110
  secondary: 2.3.1 42f2b06@24070110
  knn: 2.3.1 42f2b06@24070110
  buddy: 0.0.0
2024-07-08 17:46:19 [Info] Starting the backup...
2024-07-08 17:46:19 [Info] Backing up config files...
2024-07-08 17:46:19 [Info]   config files - OK
2024-07-08 17:46:19 [Info] Backing up global state files...
2024-07-08 17:46:19 [Info]   global state files – OK
2024-07-08 17:46:19 [Info] Backing up tables...
2024-07-08 17:46:19 [Info]   tbl01 (rt) [68.024G]...
2024-07-08 17:51:54 [Info]    OK
2024-07-08 17:52:54 [Error] Failed to send query to the Manticore Search daemon. Ensure that it is set up to listen for HTTP or HTTPS connections and has the appropriate certificates in place. Additionally, check the 'max_connections' setting in the configuration file to ensure that it has not been exceeded.

In searchd.log:

[Mon Jul  8 17:46:19.560 2024] [80] DEBUG: Detached::AddThread called for 'alone_0', tid 80
[Mon Jul  8 17:46:19.561 2024] [80] DEBUG: Detached::RemoveThread called for 80
[Mon Jul  8 17:46:19.561 2024] [80] DEBUG: Terminated thread 80, 'alone_0'
[Mon Jul  8 17:46:19.561 2024] [81] DEBUG: Detached::AddThread called for 'alone_1', tid 81
[Mon Jul  8 17:46:19.562 2024] [81] DEBUG: Detached::RemoveThread called for 81
[Mon Jul  8 17:46:19.562 2024] [81] DEBUG: Terminated thread 81, 'alone_1'
[Mon Jul  8 17:46:19.564 2024] [82] DEBUG: Detached::AddThread called for 'alone_2', tid 82
[Mon Jul  8 17:46:19.565 2024] [82] DEBUG: Detached::RemoveThread called for 82
[Mon Jul  8 17:46:19.565 2024] [82] DEBUG: Terminated thread 82, 'alone_2'
[Mon Jul  8 17:46:19.565 2024] [83] DEBUG: Detached::AddThread called for 'alone_3', tid 83
[Mon Jul  8 17:46:19.565 2024] [83] DEBUG: Detached::RemoveThread called for 83
[Mon Jul  8 17:46:19.565 2024] [83] DEBUG: Terminated thread 83, 'alone_3'
[Mon Jul  8 17:46:19.566 2024] [84] DEBUG: Detached::AddThread called for 'alone_4', tid 84
[Mon Jul  8 17:46:19.566 2024] [84] DEBUG: P01: syntax error, unexpected FREEZE near 'FREEZE tbl01, tbl02'
[Mon Jul  8 17:46:19.566 2024] [84] DEBUG: P03: syntax error, unexpected tablename near 'FREEZE tbl01, tbl02'
[Mon Jul  8 17:46:19.566 2024] [84] WARNING: got ident_all
[Mon Jul  8 17:46:19.667 2024] [15] last message repeated 1 times
[Mon Jul  8 17:46:19.667 2024] [15] rt: table tbl01: ramchunk saved ok (mode=forced, last TID=1657961461, current TID=1658000187, ram=37.783 Mb, time delta=53 sec, took=0.100 sec)
[Mon Jul  8 17:46:19.781 2024] [52] rt: table tbl02: ramchunk saved ok (mode=forced, last TID=94749496, current TID=94749758, ram=14.861 Mb, time delta=54 sec, took=0.040 sec)
[Mon Jul  8 17:46:19.808 2024] [84] DEBUG: Detached::RemoveThread called for 84
[Mon Jul  8 17:46:19.808 2024] [84] DEBUG: Terminated thread 84, 'alone_4'
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: Detached::AddThread called for 'alone_5', tid 85
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: P01: syntax error, unexpected identifier near 'FLUSH ATTRIBUTES'
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: P03: syntax error, unexpected tablename near 'FLUSH ATTRIBUTES'
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: attrflush: forcing check, tag=0
[Mon Jul  8 17:46:19.835 2024] [47] DEBUG: attrflush: doing the check
[Mon Jul  8 17:46:19.835 2024] [47] DEBUG: attrflush: no dirty tables found
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: attrflush: check finished, tag=0
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: Detached::RemoveThread called for 85
[Mon Jul  8 17:46:19.835 2024] [85] DEBUG: Terminated thread 85, 'alone_5'
[Mon Jul  8 17:46:19.835 2024] [86] DEBUG: Detached::AddThread called for 'alone_6', tid 86
[Mon Jul  8 17:46:19.836 2024] [86] DEBUG: P01: syntax error, unexpected FREEZE near 'FREEZE tbl01'
[Mon Jul  8 17:46:19.836 2024] [86] DEBUG: P03: syntax error, unexpected tablename near 'FREEZE tbl01'
[Mon Jul  8 17:46:19.836 2024] [86] WARNING: got ident_all
[Mon Jul  8 17:46:19.855 2024] [86] DEBUG: Detached::RemoveThread called for 86
[Mon Jul  8 17:46:19.855 2024] [86] DEBUG: Terminated thread 86, 'alone_6'
[Mon Jul  8 17:50:45.858 2024] [36] DEBUG: P01: syntax error, unexpected KILL near 'KILL QUERY 229'
[Mon Jul  8 17:50:45.860 2024] [36] DEBUG: P03: syntax error, unexpected tablename near 'KILL QUERY 229'
[Mon Jul  8 17:50:47.034 2024] [11] DEBUG: P01: syntax error, unexpected KILL near 'KILL 229'
[Mon Jul  8 17:50:47.034 2024] [11] DEBUG: P03: syntax error, unexpected tablename near 'KILL 229'
[Mon Jul  8 17:51:54.140 2024] [87] DEBUG: Detached::AddThread called for 'alone_7', tid 87
[Mon Jul  8 17:51:54.140 2024] [87] DEBUG: P01: syntax error, unexpected UNFREEZE near 'UNFREEZE tbl01'
[Mon Jul  8 17:51:54.140 2024] [87] DEBUG: P03: syntax error, unexpected tablename near 'UNFREEZE tbl01'
[Mon Jul  8 17:51:54.140 2024] [87] WARNING: got ident_all
[Mon Jul  8 17:51:58.598 2024] [31] DEBUG: P01: syntax error, unexpected KILL near 'KILL QUERY 234'
[Mon Jul  8 17:51:58.598 2024] [31] DEBUG: P03: syntax error, unexpected tablename near 'KILL QUERY 234'
[Mon Jul  8 17:51:58.923 2024] [54] DEBUG: P01: syntax error, unexpected KILL near 'KILL 234'
[Mon Jul  8 17:51:58.923 2024] [54] DEBUG: P03: syntax error, unexpected tablename near 'KILL 234'
[Mon Jul  8 17:52:07.052 2024] [88] DEBUG: Detached::AddThread called for 'alone_8', tid 88
[Mon Jul  8 17:52:08.066 2024] [88] DEBUG: Detached::RemoveThread called for 88
[Mon Jul  8 17:52:08.066 2024] [88] DEBUG: Terminated thread 88, 'alone_8'

This leaves one process in this state:

strace: Process 221026 attached
pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=154724772}, NULL) = 0 (Timeout)
getpid()                                = 1
gettid()                                = 1
pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=500000000}, NULL) = 0 (Timeout)
...

Average load ~ 2000 requests. Usually there are more updates, but there are also selections. I’ll also note that on Ubuntu, under heavy load, the service crashed; it could only work with max_threads_per_query = 1. We switched to Debian.

sanikolaev commented 1 month ago

Average load ~ 2000 requests. Usually there are more updates, but there are also selections.

2000 requests per second, mostly updates?

sanikolaev commented 1 month ago

I've reproduced smth similar when backing up a few gigabytes table under heavy UPDATE load:

2024-07-09 04:48:02 [Info]   name...

Fatal error: Uncaught Manticoresearch\Backup\Exception\SearchdException: Failed to send query to the Manticore Search daemon. Ensure that it is set up to listen for HTTP or HTTPS connections and has the appropriate certificates in place. Additionally, check the 'max_connections' setting in the configuration file to ensure that it has not been exceeded.  in /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php:279
Stack trace:
#0 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php(145): Manticoresearch\Backup\Lib\ManticoreClient->execute('UNFREEZE name')
#1 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php(160): Manticoresearch\Backup\Lib\ManticoreClient->unfreeze(Array)
#2 [internal function]: Manticoresearch\Backup\Lib\ManticoreClient->Manticoresearch\Backup\Lib\{closure}(true, 'name')
#3 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php(157): array_reduce(Array, Object(Closure), true)
#4 /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreBackup.php(166): Manticoresearch\Backup\Lib\ManticoreClient->unfreezeAll()
#5 [internal function]: Manticoresearch\Backup\Lib\ManticoreBackup::Manticoresearch\Backup\Lib\{closure}(Object(Manticoresearch\Backup\Lib\ManticoreClient))
#6 {main}
  thrown in /usr/share/manticore/modules/manticore-backup/src/Lib/ManticoreClient.php on line 279
mysql> unfreeze name;

is hanging.

Trying to localize.

sanikolaev commented 1 month ago

@srgtest @nxjts

Can you confirm you were updating a string/json attribute when it hanged?

Here's what I've reproduced:

MRE

Config:

snikolaev@dev2:~$ cat 116.conf
searchd {
    listen = 9315:mysql
    listen = 9316
    log = searchd.log
    pid_file = searchd.pid
    data_dir = data
    watchdog = 0
}

Start searchd:

snikolaev@dev2:~$ searchd -c 116.conf
Manticore 6.3.3 f92db4ac1@24070715 dev (columnar 2.3.1 42f2b06@24070110) (secondary 2.3.1 42f2b06@24070110) (knn 2.3.1 42f2b06@24070110)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

[24:48.472] [220367] using config file '/home/snikolaev/116.conf' (140 chars)...
snikolaev@dev2:~$ starting daemon version '6.3.3 f92db4ac1@24070715 dev (columnar 2.3.1 42f2b06@24070110) (secondary 2.3.1 42f2b06@24070110) (knn 2.3.1 42f2b06@24070110)' ...
listening on all interfaces for mysql, port=9315
listening on all interfaces for sphinx and http(s), port=9316

Create a table:

mysql -P9315 -h0 -e "drop table if exists t; create table t(s string); insert into t values(1, 'a'); flush ramchunk t"

Start UPDATEs:

rm -f /tmp/update.sql; for n in `seq 1 10000`; do echo "update t set s='b' where id=1;" >> /tmp/update.sql; done;
while true; do mysql -P9315 -h0 < /tmp/update.sql | break; echo -n .; done;

In another window:

Freeze the table:

mysql -P9315 -h0 -e "freeze t";
+-------------------+-----------------------------------+
| file              | normalized                        |
+-------------------+-----------------------------------+
| data/t/t.0.spa    | /home/snikolaev/data/t/t.0.spa    |
| data/t/t.0.spb    | /home/snikolaev/data/t/t.0.spb    |
| data/t/t.0.spd    | /home/snikolaev/data/t/t.0.spd    |
| data/t/t.0.spe    | /home/snikolaev/data/t/t.0.spe    |
| data/t/t.0.sph    | /home/snikolaev/data/t/t.0.sph    |
| data/t/t.0.sphi   | /home/snikolaev/data/t/t.0.sphi   |
| data/t/t.0.spi    | /home/snikolaev/data/t/t.0.spi    |
| data/t/t.0.spidx  | /home/snikolaev/data/t/t.0.spidx  |
| data/t/t.0.spm    | /home/snikolaev/data/t/t.0.spm    |
| data/t/t.0.spp    | /home/snikolaev/data/t/t.0.spp    |
| data/t/t.0.spt    | /home/snikolaev/data/t/t.0.spt    |
| data/t/t.meta     | /home/snikolaev/data/t/t.meta     |
| data/t/t.ram      | /home/snikolaev/data/t/t.ram      |
| data/t/t.settings | /home/snikolaev/data/t/t.settings |
+-------------------+-----------------------------------+

Attempt unfreezing it:

mysql -P9315 -h0 -e "unfreeze t";
# HANGS here

Notes

nxjts commented 1 month ago

Most often uint, bigint, mva. Less often json.