manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Manticore crashes on query execution and other crashes during cluster recovery #919

Closed pavelnemirovsky closed 1 year ago

pavelnemirovsky commented 1 year ago

Describe the bug
I apologize upfront: this bug description is complex and involves multiple dependencies, and I need your help to formulate it properly. Below I describe the symptoms we experienced.

To Reproduce
Steps to reproduce the behavior:

  1. I executed the following query, and it caused a crash on 3 different nodes (the issue cannot be reproduced with Manticore under Docker; I am still trying to find a recipe to reproduce the problem):
    
    CREATE TABLE fgi_dev (    
    publish_date timestamp,    
    internal_id string attribute,    
    tags_id json,    
    tags_name json,    
    entities_id json,    
    article_body_hash text stored,    
    content text indexed    
    ) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars=' ,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/data/manticore/fgi_dev/en' rt_mem_limit='2147483648'    

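mysql> desc fgi_dev;
+-------------------+-----------+---------------------+
| Field             | Type      | Properties          |
+-------------------+-----------+---------------------+
| id                | bigint    | columnar fast_fetch |
| article_body_hash | text      | stored              |
| content           | text      | indexed             |
| publish_date      | timestamp | columnar fast_fetch |
| internal_id       | string    | columnar fast_fetch |
| tags_id           | json      |                     |
| tags_name         | json      |                     |
| entities_id       | json      |                     |
+-------------------+-----------+---------------------+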

SELECT 0 as shard, weight(), internal_id FROM fgi_dev
WHERE
internal_id IN ('b007dab0ba22c65824aa810aba5ed146523d78c9','5e64698da1674743d74b1b001dccdd430ba88e9d');

  2. After this issue occurred, I decided to shut down all nodes and pick the node to bootstrap from (we have a 3-node testing cluster).
  3. I followed the regular recovery process, but I didn't stop the incoming production traffic, which writes documents into the Manticore cluster through a round-robin load balancer. I then started to observe random crashes that had never occurred before, and only on nodes that were recovering from the donor node. I'll upload the crash dump to your FTP server. I assume the dump will show the cause of these crashes, but it will only be a side-effect symptom.

**Full Crash Dump**    

--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[]","entities_id":"[205904676,202583093]","internal_id":"1cdc2f0fcfca355a5903d8032d1a77b0","article_body_hash":"N/A","tags_id":"[]","publish_date":1666307223,"content":"NORAD F-16 fighter jet intercepted small plane in restricted airspace near... https://t.co/h1zbt3IqqF"},"index":"mc_5e5d7796_workflow_production_1996","id":5829685590507322729}}
--- request dump end ---
--- local index:���f����f����f
Manticore 5.0.2 348514c86@220530 dev
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 13.0.1
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bionic -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore
Host OS is Linux x86_64
Stack bottom = 0x7f66e4023cb0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x7f66408dd0e2)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x7f66408dd0e2, stack=0x7f66e4030000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd[0x5984d1]
/usr/bin/searchd[0x47ac4f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f86e2716980]
/usr/bin/searchd[0x4aa1a5]
/usr/bin/searchd[0x4271b6]
/usr/bin/searchd[0x42655c]
/usr/bin/searchd[0x41c339]
/usr/bin/searchd[0x41c672]
/usr/bin/searchd[0x469b00]
/usr/bin/searchd[0x467498]
/usr/bin/searchd[0x467f85]
/usr/bin/searchd[0xdb1f2c]
/usr/bin/searchd[0xdce92f]
Trying boost backtrace:
0# 0x0000000000598524 in /usr/bin/searchd
1# 0x000000000047AC4F in /usr/bin/searchd
2# 0x00007F86E2716980 in /lib/x86_64-linux-gnu/libpthread.so.0
3# 0x00000000004AA1A5 in /usr/bin/searchd
4# 0x00000000004271B6 in /usr/bin/searchd
5# 0x000000000042655C in /usr/bin/searchd
6# 0x000000000041C339 in /usr/bin/searchd
7# 0x000000000041C672 in /usr/bin/searchd
8# 0x0000000000469B00 in /usr/bin/searchd
9# 0x0000000000467498 in /usr/bin/searchd
10# 0x0000000000467F85 in /usr/bin/searchd
11# 0x0000000000DB1F2C in /usr/bin/searchd
12# 0x0000000000DCE92F in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
[Thu Oct 20 23:25:07.718 2022] [3167] WARNING: Member 0.0 (daemon_11715_DMETRICS_FTS_1) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable
[Thu Oct 20 23:25:07.827 2022] [3167] WARNING: Member 1.0 (daemon_3144_DMETRICS_FTS_1) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable
[Thu Oct 20 23:25:08.720 2022] [3167] WARNING: Member 0.0 (daemon_11715_DMETRICS_FTS_1) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable
[Thu Oct 20 23:25:08.828 2022] [3167] WARNING: Member 1.0 (daemon_3144_DMETRICS_FTS_1) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable
[Thu Oct 20 23:25:09.721 2022] [3167] WARNING: Member 0.0 (daemon_11715_DMETRICS_FTS_1) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable
[Thu Oct 20 23:25:09.830 2022] [3167] WARNING: Member 1.0 (daemon_3144_DMETRICS_FTS_1) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily unavailable
--- active threads ---
thd 0 (work_0), proto http, state net_read, command -
thd 1 (work_7), proto sphinx, state query, command clusterpq
--- Totally 4 threads, and 2 client-working threads ---
------- CRASH DUMP END -------
[Thu Oct 20 23:25:10.301 2022] [2354] watchdog: main process 3141 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:25:10.302 2022] [2354] watchdog: main process 3655 forked ok
[Thu Oct 20 23:25:10.303 2022] [3655] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
[Thu Oct 20 23:25:10.303 2022] [3655] listening on 127.0.0.1:9306 for mysql
[Thu Oct 20 23:25:10.303 2022] [3655] listening on 127.0.0.1:9308 for sphinx and http(s)
[Thu Oct 20 23:25:10.303 2022] [3655] listening on 10.0.82.16:9312 for sphinx and http(s)
[Thu Oct 20 23:25:10.303 2022] [3655] listening on 10.0.82.16:9306 for mysql
[Thu Oct 20 23:25:10.303 2022] [3655] listening on 10.0.82.16:9308 for sphinx and http(s)
[Thu Oct 20 23:25:11.260 2022] [3656] WARNING: index 'fgi_dev': index 'fgi_dev': morphology option changed from config has no effect, ignoring
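The crash handler asks for the searchd binary and symbols because the backtrace above contains only raw addresses. As a minimal sketch (not part of Manticore's own tooling), assuming you have a build of the same version (5.0.2 348514c86) with debug symbols, the helper below collects the `/usr/bin/searchd` frame addresses from the boost backtrace and builds a binutils `addr2line` command for them. The function names and the symbols path are hypothetical. The low addresses (such as 0x598524) suggest a non-PIE binary, in which case they can be passed to `addr2line` directly; for a PIE binary the load base would have to be subtracted first.

```python
import re
import shlex

# Matches boost backtrace frames such as
# "0# 0x0000000000598524 in /usr/bin/searchd" (leading spaces allowed).
FRAME_RE = re.compile(r"^\s*\d+#\s+(0x[0-9A-Fa-f]+)\s+in\s+(\S+)\s*$")


def searchd_addresses(backtrace: str, binary: str = "/usr/bin/searchd") -> list[str]:
    """Return normalized hex addresses of frames that belong to `binary`.

    Frames from other objects (e.g. libpthread) are skipped, because
    addr2line run against searchd cannot resolve them.
    """
    addresses = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(2) == binary:
            # int(..., 16) accepts the "0x" prefix; hex() drops leading zeros.
            addresses.append(hex(int(match.group(1), 16)))
    return addresses


def addr2line_command(addresses: list[str], symbols_binary: str) -> str:
    """Build an addr2line invocation against a binary that has debug symbols.

    -f prints function names, -C demangles C++ names, -e selects the binary.
    """
    return shlex.join(["addr2line", "-f", "-C", "-e", symbols_binary, *addresses])


if __name__ == "__main__":
    sample = """\
 0# 0x0000000000598524 in /usr/bin/searchd
 1# 0x000000000047AC4F in /usr/bin/searchd
 2# 0x00007F86E2716980 in /lib/x86_64-linux-gnu/libpthread.so.0
 3# 0x00000000004AA1A5 in /usr/bin/searchd"""
    # Hypothetical path to a searchd build of the same version with symbols.
    print(addr2line_command(searchd_addresses(sample), "/tmp/searchd-debug"))
```

Running the printed command on the host that has the symbols build would map each frame to a function and, if line info is present, a source line.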


**Frequency of crash events that occurred:**    

--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"1\"]","entities_id":"[19824405,19854814]","internal_id":"c8472ed73f086b19afafcec542dae3dc","article_body_hash":"N/A","tags_id":"[4653]","publish_date":1666263060,"content":"@mikepompeo So how is a global energy crisis due to a war we had nothing to do with, the cause of this admin. Maybe complain about Putin and Russia. You all just love Putin so much it is insanity."},"index":"mc_c9d1fb0a_workflow_production_2075","id":8614399204832304449}}
--- request dump end ---
--
[Thu Oct 20 21:46:53.149 2022] [32587] watchdog: main process 32588 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 21:46:53.149 2022] [32587] watchdog: main process 765 forked ok
[Thu Oct 20 21:46:53.151 2022] [765] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"qwerty\"]","entities_id":"[19850138,208683655]","internal_id":"432a9e5ed124c2081942516c2a5012ba","article_body_hash":"N/A","tags_id":"[13885]","publish_date":1666220292,"content":"🚨🇺🇦Ukraine has shot down 223 Iranian-made drones since mid-September: Officials https://t.co/adapPImCH1"},"index":"mc_7a59a26c_workflow_dev_6025","id":7694347271799426477}}
--- request dump end ---
--
[Thu Oct 20 22:14:28.746 2022] [32587] watchdog: main process 765 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 22:14:28.747 2022] [32587] watchdog: main process 1573 forked ok
[Thu Oct 20 22:14:28.749 2022] [1573] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"asdfgh\"]","entities_id":"[19773300,19854834,19850138]","internal_id":"938181ef49c0167d8dfd97d3ede02272","article_body_hash":"N/A","tags_id":"[13884]","publish_date":1666220285,"content":"@duty2warn We spent 12 trillion on the Bush-Cheney nonsense in Iraq and that mideast BS you got a problem with 13 billion going to Ukraine? Lame AF! #BlueWaveComing"},"index":"mc_7a59a26c_workflow_dev_6025","id":1803530753829142200}}
--- request dump end ---
--
[Thu Oct 20 23:06:24.034 2022] [2354] watchdog: main process 2355 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:06:24.034 2022] [2354] watchdog: main process 3141 forked ok
[Thu Oct 20 23:06:24.035 2022] [3141] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[]","entities_id":"[205904676,202583093]","internal_id":"1cdc2f0fcfca355a5903d8032d1a77b0","article_body_hash":"N/A","tags_id":"[]","publish_date":1666307223,"content":"NORAD F-16 fighter jet intercepted small plane in restricted airspace near... https://t.co/h1zbt3IqqF"},"index":"mc_5e5d7796_workflow_production_1996","id":5829685590507322729}}
--- request dump end ---
--
[Thu Oct 20 23:25:10.301 2022] [2354] watchdog: main process 3141 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:25:10.302 2022] [2354] watchdog: main process 3655 forked ok
[Thu Oct 20 23:25:10.303 2022] [3655] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"1\"]","entities_id":"[]","internal_id":"e144c55ba682b9b1c3fa2dcb5b373ed1","article_body_hash":"N/A","tags_id":"[4653]","publish_date":1666263054,"content":"@DrTedros said, the six million people of Tigray had been \"kept under siege for almost two years\" and he added, \n\"Banking, food, electricity and healthcare are being used as weapons of war,\"\n#EndTigrayGenocide\n#EritreaOutOfTigray\n@JoeBiden @UNGeneva https://t.co/MtsuMZ2BbV"},"index":"mc_c9d1fb0a_workflow_production_2075","id":3583440816303812531}}
--- request dump end ---
--
[Thu Oct 20 23:35:15.129 2022] [2354] watchdog: main process 3655 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:35:15.130 2022] [2354] watchdog: main process 3945 forked ok
[Thu Oct 20 23:35:15.131 2022] [3945] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"1\"]","entities_id":"[]","internal_id":"6b04f54e0a93af19eacb8398695ed1f6","article_body_hash":"N/A","tags_id":"[4653]","publish_date":1666263054,"content":"@Mikkeltron It's the future war story when never got"},"index":"mc_c9d1fb0a_workflow_production_2075","id":3725607958522334040}}
--- request dump end ---
--
[Thu Oct 20 23:39:22.292 2022] [2354] watchdog: main process 3945 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:39:22.292 2022] [2354] watchdog: main process 4072 forked ok
[Thu Oct 20 23:39:22.293 2022] [4072] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"qwer\"]","entities_id":"[200643557]","internal_id":"73181e5ed4366f92debd13d4fc9b60de","article_body_hash":"N/A","tags_id":"[4922]","publish_date":1666220403,"content":"A Golden Age,\nCold War Steve,the best political artist,historian,\n who tells it as it happens,so that future generations will be able to see ,the insane Tory gov that have reigned for over a decade,for exactly what they are,and the damage they have caused this pleasant land, https://t.co/0slDQBLEqn"},"index":"mc_7a59a26c_workflow_production_2106","id":6130441337026784002}}
--- request dump end ---
--
[Thu Oct 20 23:43:01.195 2022] [2354] watchdog: main process 4072 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:43:01.195 2022] [2354] watchdog: main process 4184 forked ok
[Thu Oct 20 23:43:01.196 2022] [4184] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)' ...
--
--- crashed HTTP request dump ---
{"insert":{"cluster":"DMETRICS_FTS_1","doc":{"tags_name":"[\"asdfgh\"]","entities_id":"[19850138]","internal_id":"92c98f3a7f0abd6525b377a4a77a2d3a","article_body_hash":"N/A","tags_id":"[13884]","publish_date":1666220277,"content":"Markets take their cue from hawkish central banks rather than fears over Ukraine fallout"},"index":"mc_7a59a26c_workflow_dev_6025","id":1171928518493367078}}
--- request dump end ---
--
[Thu Oct 20 23:55:00.751 2022] [2354] watchdog: main process 4184 crashed via CRASH_EXIT (exit code 2), will be restarted
[Thu Oct 20 23:55:00.751 2022] [2354] watchdog: main process 6499 forked ok
[Thu Oct 20 23:55:00.752 2022] [6499] starting daemon version '5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)'


**Expected behavior**    
I expect the Manticore service not to crash while executing the following query:

SELECT 0 as shard, weight(), internal_id FROM fgi_dev
WHERE
internal_id IN ('b007dab0ba22c65824aa810aba5ed146523d78c9','5e64698da1674743d74b1b001dccdd430ba88e9d');


**Describe the environment:**    
 - Manticore Search version (top line in output of `bin/searchd -v` or `bin/indexer -v`):

Manticore 5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)

 - OS version (`uname -a` if on a Unix-like system):     

Linux manticore-01.dmetrics.internal 5.4.0-1083-aws #90~18.04.1-Ubuntu SMP Fri Aug 5 08:12:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux



**Messages from log files:**    
None. The query that caused the original crash was not written to query.log, because the whole process froze for a long period of time.

pavelnemirovsky commented 1 year ago

@sanikolaev @tomatolog guys, I am trying to upload the crash dump to your FTP server, but I can't, regardless of whether I use passive or active FTP mode. Is everything OK with your FTP server?

sanikolaev commented 1 year ago

is everything is ok with your FTP server?

Yes, looks ok to me:

➜  ~ lftp -e "set ftp:passive-mode off; mkdir github-issue-1234; mirror -LR ftp/ github-issue-1234/" -u manticorebugs,shithappens dev.manticoresearch.com
mkdir ok, `github-issue-1234' created
New: 1 file, 0 symlinks
lftp manticorebugs@dev.manticoresearch.com:/> quit
pavelnemirovsky commented 1 year ago

@sanikolaev I tried from 3 different servers/computers in the US and Israel:

root@manticore-01:/var/crash# lftp -e "set ftp:passive-mode off; mkdir github-issue-919; cd github-issue-919;" -u manticorebugs,shithappens dev.manticoresearch.com
mkdir: Access failed: 550 github-issue-919: File exists
cd ok, cwd=/github-issue-919
lftp manticorebugs@dev.manticoresearch.com:/github-issue-919> put _usr_bin_searchd.112.crash.gz
`_usr_bin_searchd.112.crash.gz' at 196608 (0%) [Waiting for data connection...]

root@manticore-01:/var/crash# lftp -e "set ftp:passive-mode on; mkdir github-issue-919; cd github-issue-919;" -u manticorebugs,shithappens dev.manticoresearch.com
mkdir: Access failed: 550 github-issue-919: File exists
cd ok, cwd=/github-issue-919
lftp manticorebugs@dev.manticoresearch.com:/github-issue-919> put _usr_bin_searchd.112.crash.gz
`_usr_bin_searchd.112.crash.gz' at 196608 (0%) [Making data connection...]
pavelnemirovsky commented 1 year ago
[40:09.924] [4374] DEBUG: LoopClientMySQL command 3, 'select * from fgi_stage where internal_id='fdsfdsf''
[40:09.924] [4374] DEBUG: Tick coro search
[40:09.924] [4374] DEBUG: Started: 0
[40:09.924] [4374] DEBUG: Started: 1
------- FATAL: CRASH DUMP -------
[Fri Oct 21 15:40:09.783 2022] [ 4367]

--- crashed SphinxQL request dump ---
select * from fgi_stage where internal_id='fdsfdsf'
--- request dump end ---
--- local index:fgi_stage
Manticore 5.0.2 348514c86@220530 dev
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 13.0.1
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bionic -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.20 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore
Host OS is Linux x86_64
Stack bottom = 0x7f6ab684d760, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x7f8a50002a00)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x7f8a50002a00, stack=0x7f6ab6850000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd[0x5984d1]
/usr/bin/searchd[0x47ac4f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f8a73817980]
Trying boost backtrace:
 0# 0x0000000000598524 in /usr/bin/searchd
 1# 0x000000000047AC4F in /usr/bin/searchd
 2# 0x00007F8A73817980 in /lib/x86_64-linux-gnu/libpthread.so.0

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB is not available
--- BT to source lines (depth 3): ---
sanikolaev commented 1 year ago

I tried from US/Israel 3 different servers / computers @sanikolaev

I've removed the directory; it contained an empty file. Please try again.

pavelnemirovsky commented 1 year ago

@sanikolaev will try.

I tried Manticore 5.0.3 (https://repo.manticoresearch.com/repository/manticoresearch_bionic_dev/dists/manticore_5.0.3-221020-cd2335eec_amd64.tgz), but it doesn't seem to work with the installed "manticore-columnar-lib/bionic,now 1.15.4-220522-2fef34e amd64 [installed]":

root@ip-10-0-82-53:/var/lib/data/manticore# su manticore -s /bin/bash -c "/usr/bin/searchd --console"
Manticore 5.0.3 cd2335eec@221020 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)

[07:10.949] [5768] WARNING: Error initializing columnar storage: daemon requires columnar library v16 (trying to load v15)
[07:10.949] [5768] WARNING: Error initializing secondary index: daemon requires secondary library v5 (trying to load v1)
[07:10.949] [5768] using config file '/etc/manticoresearch/manticore.conf' (1527 chars)...
sanikolaev commented 1 year ago

it doesn't work with "manticore-columnar-lib/bionic,now 1.15.4-220522-2fef34e amd64 [installed]"

That's right. 5.0.3 is a dev version, so it requires a dev version of manticore-columnar-lib, e.g. https://repo.manticoresearch.com/repository/manticoresearch_bionic_dev/dists/bionic/main/binary-amd64/manticore-columnar-lib_1.16.1-221004-a372193_amd64.deb

The dev version's changelog can be found at https://manual.manticoresearch.com/dev/Changelog under "Next release".
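A mismatch like this can be caught before switching daemon builds by scanning the `searchd --console` output for the warnings quoted above. This is a minimal sketch, assuming the warning wording stays exactly as in 5.0.3 (`daemon requires <name> library vN (trying to load vM)`); the function name is hypothetical. Note that the numbers in the warning (v15, v16) are not the same strings as the package versions (for example 1.15.4).

```python
import re

# Matches searchd startup warnings such as:
# "daemon requires columnar library v16 (trying to load v15)"
MISMATCH_RE = re.compile(
    r"daemon requires (?P<lib>\w+) library v(?P<required>\d+) "
    r"\(trying to load v(?P<loaded>\d+)\)"
)


def library_mismatches(console_output: str) -> dict[str, tuple[int, int]]:
    """Map library name -> (required version, loaded version)."""
    return {
        m.group("lib"): (int(m.group("required")), int(m.group("loaded")))
        for m in MISMATCH_RE.finditer(console_output)
    }


if __name__ == "__main__":
    output = (
        "WARNING: Error initializing columnar storage: daemon requires columnar library v16 (trying to load v15)\n"
        "WARNING: Error initializing secondary index: daemon requires secondary library v5 (trying to load v1)\n"
    )
    for lib, (required, loaded) in library_mismatches(output).items():
        print(f"{lib}: daemon needs v{required}, loaded v{loaded}")
```

An empty result means none of these warnings appeared; it does not by itself prove that the libraries were loaded successfully.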

pavelnemirovsky commented 1 year ago

This is the best I could produce so far from the crashed process:

(gdb) bt full
#0  0x00007f7b17982d1f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7ffff9c74438) at ../sysdeps/unix/sysv/linux/select.c:41
        resultvar = 18446744073709551102
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000004f51ef in ?? ()
No symbol table info available.
#2  0x00000000004cbce7 in ?? ()
No symbol table info available.
#3  0x00000000004d1e41 in ?? ()
No symbol table info available.
#4  0x00000000004d2d36 in ?? ()
No symbol table info available.
#5  0x00007f7b1788dc87 in __libc_start_main (main=0x4d2cf0, argc=3, argv=0x7ffff9c75038, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffff9c75028) at ../csu/libc-start.c:310
        self = <optimized out>
        __self = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -7775646443038472399, 4237616, 140737383977008, 0, 0, 7775650534184814385, 7701564834601994033}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7f7b195488d3 <_dl_init+259>, 0x7f7b1952b6b8}, data = {prev = 0x0, cleanup = 0x0, canceltype = 424970451}}}
        not_first_call = <optimized out>
#6  0x000000000040a95a in ?? ()
No symbol table info available.
(gdb) info locals
self = <optimized out>
__self = <optimized out>
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -7775646443038472399, 4237616, 140737383977008, 0, 0, 7775650534184814385, 7701564834601994033}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7f7b195488d3 <_dl_init+259>, 0x7f7b1952b6b8}, data = {prev = 0x0, cleanup = 0x0, canceltype = 424970451}}}
not_first_call = <optimized out>
root@ip-10-0-82-53:/var/lib/data/manticore# gdb /usr/bin/searchd  6231
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/searchd...(no debugging symbols found)...done.
Attaching to program: /usr/bin/searchd, process 6231
[New LWP 6232]
[New LWP 6233]
[New LWP 6234]
[New LWP 6235]
[New LWP 6236]
[New LWP 6237]
[New LWP 6238]
[New LWP 6239]
[New LWP 6240]
[New LWP 6241]
[New LWP 6242]
[New LWP 6243]
[New LWP 6244]
[New LWP 6245]
[New LWP 6246]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f7b17982d1f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7ffff9c74438) at ../sysdeps/unix/sysv/linux/select.c:41
41  ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) set pagination off
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f7b19751800 (LWP 6231) "searchd" 0x00007f7b17982d1f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7ffff9c74438) at ../sysdeps/unix/sysv/linux/select.c:41
  2    Thread 0x7f7b19750700 (LWP 6232) "work_0" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7f7b1972f700 (LWP 6233) "work_1" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  4    Thread 0x7f7b1970e700 (LWP 6234) "work_2" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  5    Thread 0x7f7b196ed700 (LWP 6235) "work_3" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  6    Thread 0x7f7b196cc700 (LWP 6236) "work_4" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  7    Thread 0x7f7b196ab700 (LWP 6237) "work_5" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  8    Thread 0x7f7b1968a700 (LWP 6238) "work_6" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  9    Thread 0x7f7b19669700 (LWP 6239) "work_7" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  10   Thread 0x7f7b19648700 (LWP 6240) "TaskSched" 0x00007f7b17c6b065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f7b19647d98, expected=0, futex_word=0x1309c54) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  11   Thread 0x7f7b19627700 (LWP 6241) "TaskW_1" 0x00007f7b17c6b065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f7b19626d68, expected=0, futex_word=0x1309d10) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  12   Thread 0x7f7b19606700 (LWP 6242) "TickPool_0" 0x00007f7b1798d947 in epoll_wait (epfd=17, events=0x25cdee0, maxevents=6, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  13   Thread 0x7f7b15873700 (LWP 6243) "work_3" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b00037c04) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  14   Thread 0x7f7ae77fe700 (LWP 6244) "work_3" 0x00007f7b1798d947 in epoll_wait (epfd=26, events=0x7f7ae77fd430, maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  15   Thread 0x7f7ae6ffd700 (LWP 6245) "work_3" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b001b21b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  16   Thread 0x7f7b195e5700 (LWP 6246) "DMETRICS_FTS_1_" 0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b00087db4) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
(gdb) thread apply all bt

Thread 16 (Thread 0x7f7b195e5700 (LWP 6246)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b00087db4) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b00087d60, cond=0x7f7b00087d88) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f7b00087d88, mutex=0x7f7b00087d60) at pthread_cond_wait.c:655
#3  0x00007f7b16c8d0fe in gu_fifo_get_head () from /usr/share/manticore/modules/libgalera_manticore.so
#4  0x00007f7b16b75ae6 in gcs_recv(gcs_conn*, gcs_action*) () from /usr/share/manticore/modules/libgalera_manticore.so
#5  0x00007f7b16b4c74d in galera::GcsActionSource::process(void*, bool&) () from /usr/share/manticore/modules/libgalera_manticore.so
#6  0x00007f7b16af62f5 in galera::ReplicatorSMM::async_recv(void*) () from /usr/share/manticore/modules/libgalera_manticore.so
#7  0x00007f7b16ae186c in galera_recv () from /usr/share/manticore/modules/libgalera_manticore.so
#8  0x0000000000dc6ef6 in ?? ()
#9  0x0000000000db1f2c in ?? ()
#10 0x0000000000dce92f in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 15 (Thread 0x7f7ae6ffd700 (LWP 6245)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b001b21b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b001b2168, cond=0x7f7b001b2190) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f7b001b2190, mutex=0x7f7b001b2168) at pthread_cond_wait.c:655
#3  0x00007f7b16b890a1 in ?? () from /usr/share/manticore/modules/libgalera_manticore.so
#4  0x00007f7b16b7dc98 in gcs_core_recv(gcs_core*, gcs_act_rcvd*, long long) () from /usr/share/manticore/modules/libgalera_manticore.so
#5  0x00007f7b16b7868d in ?? () from /usr/share/manticore/modules/libgalera_manticore.so
#6  0x00007f7b17c646db in start_thread (arg=0x7f7ae6ffd700) at pthread_create.c:463
#7  0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 14 (Thread 0x7f7ae77fe700 (LWP 6244)):
#0  0x00007f7b1798d947 in epoll_wait (epfd=26, events=0x7f7ae77fd430, maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00007f7b16b57b02 in asio::detail::epoll_reactor::run(bool, asio::detail::op_queue<asio::detail::task_io_service_operation>&) () from /usr/share/manticore/modules/libgalera_manticore.so
#2  0x00007f7b16bdfea4 in gcomm::AsioProtonet::event_loop(gu::datetime::Period const&) () from /usr/share/manticore/modules/libgalera_manticore.so
#3  0x00007f7b16b883e7 in GCommConn::run() () from /usr/share/manticore/modules/libgalera_manticore.so
#4  0x00007f7b16b92799 in GCommConn::run_fn(void*) () from /usr/share/manticore/modules/libgalera_manticore.so
#5  0x00007f7b17c646db in start_thread (arg=0x7f7ae77fe700) at pthread_create.c:463
#6  0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 13 (Thread 0x7f7b15873700 (LWP 6243)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b00037c04) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b00037bb0, cond=0x7f7b00037bd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7f7b00037bd8, mutex=0x7f7b00037bb0) at pthread_cond_wait.c:655
#3  0x00007f7b16b3d967 in galera::ServiceThd::thd_func(void*) () from /usr/share/manticore/modules/libgalera_manticore.so
#4  0x00007f7b17c646db in start_thread (arg=0x7f7b15873700) at pthread_create.c:463
#5  0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 12 (Thread 0x7f7b19606700 (LWP 6242)):
#0  0x00007f7b1798d947 in epoll_wait (epfd=17, events=0x25cdee0, maxevents=6, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000000000414d74 in ?? ()
#2  0x00000000004665df in ?? ()
#3  0x0000000000465f9a in ?? ()
#4  0x00000000004dc4c5 in ?? ()
#5  0x00000000009613f9 in ?? ()
#6  0x00000000009611fe in ?? ()
#7  0x0000000000960fe5 in ?? ()
#8  0x000000000095fcd7 in ?? ()
#9  0x000000000095f1cc in ?? ()
#10 0x000000000095f2b4 in ?? ()
#11 0x00007f7b17c646db in start_thread (arg=0x7f7b19606700) at pthread_create.c:463
#12 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 11 (Thread 0x7f7b19627700 (LWP 6241)):
#0  0x00007f7b17c6b065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f7b19626d68, expected=0, futex_word=0x1309d10) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x7f7b19626d68, mutex=0x1309d18, cond=0x1309ce8) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x1309ce8, mutex=0x1309d18, abstime=0x7f7b19626d68) at pthread_cond_wait.c:667
#3  0x00000000005a257e in ?? ()
#4  0x000000000042be6f in ?? ()
#5  0x000000000042bd4b in ?? ()
#6  0x000000000095f1cc in ?? ()
#7  0x000000000095f2b4 in ?? ()
#8  0x00007f7b17c646db in start_thread (arg=0x7f7b19627700) at pthread_create.c:463
#9  0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 10 (Thread 0x7f7b19648700 (LWP 6240)):
#0  0x00007f7b17c6b065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f7b19647d98, expected=0, futex_word=0x1309c54) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x7f7b19647d98, mutex=0x1309c58, cond=0x1309c28) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x1309c28, mutex=0x1309c58, abstime=0x7f7b19647d98) at pthread_cond_wait.c:667
#3  0x00000000005a26ae in ?? ()
#4  0x000000000042b2d6 in ?? ()
#5  0x000000000042b238 in ?? ()
#6  0x000000000095f1cc in ?? ()
#7  0x000000000095f2b4 in ?? ()
#8  0x00007f7b17c646db in start_thread (arg=0x7f7b19648700) at pthread_create.c:463
#9  0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7f7b19669700 (LWP 6239)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b19669700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f7b1968a700 (LWP 6238)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b1968a700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7f7b196ab700 (LWP 6237)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b196ab700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f7b196cc700 (LWP 6236)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b196cc700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f7b196ed700 (LWP 6235)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b196ed700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f7b1970e700 (LWP 6234)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b1970e700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f7b1972f700 (LWP 6233)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b1972f700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f7b19750700 (LWP 6232)):
#0  0x00007f7b17c6aad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2566b44) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2566ae8, cond=0x2566b18) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x2566b18, mutex=0x2566ae8) at pthread_cond_wait.c:655
#3  0x0000000000961341 in ?? ()
#4  0x00000000009611fe in ?? ()
#5  0x0000000000960fe5 in ?? ()
#6  0x000000000095fcd7 in ?? ()
#7  0x000000000095f1cc in ?? ()
#8  0x000000000095f2b4 in ?? ()
#9  0x00007f7b17c646db in start_thread (arg=0x7f7b19750700) at pthread_create.c:463
#10 0x00007f7b1798d61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f7b19751800 (LWP 6231)):
#0  0x00007f7b17982d1f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7ffff9c74438) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00000000004f51ef in ?? ()
#2  0x00000000004cbce7 in ?? ()
#3  0x00000000004d1e41 in ?? ()
#4  0x00000000004d2d36 in ?? ()
#5  0x00007f7b1788dc87 in __libc_start_main (main=0x4d2cf0, argc=3, argv=0x7ffff9c75038, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffff9c75028) at ../csu/libc-start.c:310
#6  0x000000000040a95a in ?? ()
(gdb)
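Most of this backtrace shows idle threads. Threads 2 through 9 have the same call chain and block on the same condition variable (`cond=0x2566b18`, `mutex=0x2566ae8`). With a dump this long, it helps to collapse identical stacks so that the thread doing something unusual stands out. The sketch below does this for gdb backtrace text in the format pasted above. It is a minimal sketch under assumptions: the helper names `group_threads` and `summarize` are hypothetical and are not part of gdb or Manticore. Unresolved `??` frames are keyed by their address, so two different unsymbolized stacks are not merged.

```python
import re
import sys
from collections import defaultdict

# Header of each thread block, e.g.
#   Thread 14 (Thread 0x7f7ae77fe700 (LWP 6244)):
THREAD_RE = re.compile(r"^Thread (\d+) \(")

# One stack frame, e.g.
#   #3  0x0000000000961341 in ?? ()
#   #1  __pthread_cond_wait_common (abstime=0x0, ...)   (inlined frame: no address)
FRAME_RE = re.compile(r"^#\d+\s+(?:(0x[0-9a-fA-F]+) in )?([^\s(]+)")


def frame_key(address, name):
    """Use the symbol name when gdb resolved it, else fall back to the raw address."""
    if name == "??" and address:
        return address
    return name


def group_threads(bt_text):
    """Return {stack signature (tuple of frame keys, innermost first): [thread numbers]}."""
    stacks = {}
    current = None
    for raw in bt_text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        # Frames that appear before the first thread header are ignored.
        if frame and current is not None:
            stacks[current].append(frame_key(frame.group(1), frame.group(2)))
    groups = defaultdict(list)
    for tid, keys in stacks.items():
        groups[tuple(keys)].append(tid)
    return dict(groups)


def summarize(bt_text):
    """Print groups of identical stacks, largest group first."""
    groups = group_threads(bt_text)
    for keys, tids in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(tids)} thread(s) {sorted(tids)}: {' <- '.join(keys[:4])}")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        summarize(fh.read())
```

You could run it as `python3 group_bt.py backtrace.txt`. Both file names are hypothetical. Keying on function names rather than full frame text means threads that differ only in argument values are grouped together. Thread 4 is an example: its `futex_word` differs from the other workers.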
tomatolog commented 1 year ago

Could you check your fgi_stage index with indextool to make sure it is valid?

pavelnemirovsky commented 1 year ago

@tomatolog Sure

CREATE TABLE fgi_stage (
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json,
article_body_hash text stored,
content text indexed
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/data/manticore/fgi_stage/en' rt_mem_limit='2147483648'

root@ip-10-0-82-53:/var/lib/data/manticore# indextool --check fgi_stage
Manticore 5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)

using config file '/etc/manticoresearch/manticore.conf'...
WARNING: index fgi_stage: index 'fgi_stage': morphology option changed from config has no effect, ignoring
checking index 'fgi_stage'...
checking schema...
checking RT segment 0(25)...
checking rows...
checking dead row map...
checking RT segment 1(25)...
checking rows...
checking dead row map...
checking RT segment 2(25)...
checking rows...
checking dead row map...
checking RT segment 3(25)...
checking rows...
checking dead row map...
checking RT segment 4(25)...
checking rows...
checking dead row map...
checking RT segment 5(25)...
checking rows...
checking dead row map...
checking RT segment 6(25)...
checking rows...
checking dead row map...
checking RT segment 7(25)...
checking rows...
checking dead row map...
checking RT segment 8(25)...
checking rows...
checking dead row map...
checking RT segment 9(25)...
checking rows...
checking dead row map...
checking RT segment 10(25)...
checking rows...
checking dead row map...
checking RT segment 11(25)...
checking rows...
checking dead row map...
checking RT segment 12(25)...
checking rows...
checking dead row map...
checking RT segment 13(25)...
checking rows...
checking dead row map...
checking RT segment 14(25)...
checking rows...
checking dead row map...
checking RT segment 15(25)...
checking rows...
checking dead row map...
checking RT segment 16(25)...
checking rows...
checking dead row map...
checking RT segment 17(25)...
checking rows...
checking dead row map...
checking RT segment 18(25)...
checking rows...
checking dead row map...
checking RT segment 19(25)...
checking rows...
checking dead row map...
checking RT segment 20(25)...
checking rows...
checking dead row map...
checking RT segment 21(25)...
checking rows...
checking dead row map...
checking RT segment 22(25)...
checking rows...
checking dead row map...
checking RT segment 23(25)...
checking rows...
checking dead row map...
checking RT segment 24(25)...
checking rows...
checking dead row map...
checking disk chunk, extension 0, 0(1)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
FAILED, unexpected attribute value (row=126, attr=0, docid=512, block=0, value=0x512, min=0x0, max=0x508)
FAILED, unexpected attribute value (row=126, attr=2, docid=512, block=0, value=0x512, min=0x0, max=0x508)
FAILED, unexpected attribute value (row=127, attr=0, docid=516, block=0, value=0x516, min=0x0, max=0x508)
FAILED, unexpected attribute value (row=127, attr=2, docid=516, block=0, value=0x516, min=0x0, max=0x508)
FAILED, unexpected attribute value (row=254, attr=0, docid=1024, block=1, value=0x1024, min=0x512, max=0x1020)
FAILED, unexpected attribute value (row=254, attr=2, docid=1024, block=1, value=0x1024, min=0x512, max=0x1020)
FAILED, unexpected attribute value (row=255, attr=0, docid=1028, block=1, value=0x1028, min=0x512, max=0x1020)
FAILED, unexpected attribute value (row=255, attr=2, docid=1028, block=1, value=0x1028, min=0x512, max=0x1020)
FAILED, unexpected attribute value (row=382, attr=0, docid=1536, block=2, value=0x1536, min=0x1024, max=0x1532)
FAILED, unexpected attribute value (row=382, attr=2, docid=1536, block=2, value=0x1536, min=0x1024, max=0x1532)
FAILED, unexpected attribute value (row=383, attr=0, docid=1540, block=2, value=0x1540, min=0x1024, max=0x1532)
FAILED, unexpected attribute value (row=383, attr=2, docid=1540, block=2, value=0x1540, min=0x1024, max=0x1532)
FAILED, unexpected attribute value (row=510, attr=0, docid=2048, block=3, value=0x2048, min=0x1536, max=0x2044)
FAILED, unexpected attribute value (row=510, attr=2, docid=2048, block=3, value=0x2048, min=0x1536, max=0x2044)
FAILED, unexpected attribute value (row=511, attr=0, docid=2052, block=3, value=0x2052, min=0x1536, max=0x2044)
FAILED, unexpected attribute value (row=511, attr=2, docid=2052, block=3, value=0x2052, min=0x1536, max=0x2044)
FAILED, unexpected attribute value (row=638, attr=0, docid=2560, block=4, value=0x2560, min=0x2048, max=0x2556)
FAILED, unexpected attribute value (row=638, attr=2, docid=2560, block=4, value=0x2560, min=0x2048, max=0x2556)
FAILED, unexpected attribute value (row=639, attr=0, docid=2564, block=4, value=0x2564, min=0x2048, max=0x2556)
FAILED, unexpected attribute value (row=639, attr=2, docid=2564, block=4, value=0x2564, min=0x2048, max=0x2556)
FAILED, unexpected attribute value (row=766, attr=0, docid=3072, block=5, value=0x3072, min=0x2560, max=0x3068)
FAILED, unexpected attribute value (row=766, attr=2, docid=3072, block=5, value=0x3072, min=0x2560, max=0x3068)
FAILED, unexpected attribute value (row=767, attr=0, docid=3076, block=5, value=0x3076, min=0x2560, max=0x3068)
FAILED, unexpected attribute value (row=767, attr=2, docid=3076, block=5, value=0x3076, min=0x2560, max=0x3068)
FAILED, unexpected attribute value (row=894, attr=0, docid=3584, block=6, value=0x3584, min=0x3072, max=0x3580)
FAILED, unexpected attribute value (row=894, attr=2, docid=3584, block=6, value=0x3584, min=0x3072, max=0x3580)
FAILED, unexpected attribute value (row=895, attr=0, docid=3588, block=6, value=0x3588, min=0x3072, max=0x3580)
FAILED, unexpected attribute value (row=895, attr=2, docid=3588, block=6, value=0x3588, min=0x3072, max=0x3580)
FAILED, unexpected attribute value (row=1022, attr=0, docid=4096, block=7, value=0x4096, min=0x3584, max=0x4092)
FAILED, unexpected attribute value (row=1022, attr=2, docid=4096, block=7, value=0x4096, min=0x3584, max=0x4092)
FAILED, unexpected attribute value (row=1023, attr=0, docid=4100, block=7, value=0x4100, min=0x3584, max=0x4092)
FAILED, unexpected attribute value (row=1023, attr=2, docid=4100, block=7, value=0x4100, min=0x3584, max=0x4092)
FAILED, unexpected attribute value (row=1150, attr=0, docid=4608, block=8, value=0x4608, min=0x4096, max=0x4604)
FAILED, unexpected attribute value (row=1150, attr=2, docid=4608, block=8, value=0x4608, min=0x4096, max=0x4604)
FAILED, unexpected attribute value (row=1151, attr=0, docid=4612, block=8, value=0x4612, min=0x4096, max=0x4604)
FAILED, unexpected attribute value (row=1151, attr=2, docid=4612, block=8, value=0x4612, min=0x4096, max=0x4604)
FAILED, unexpected attribute value (row=1278, attr=0, docid=5120, block=9, value=0x5120, min=0x4608, max=0x5116)
FAILED, unexpected attribute value (row=1278, attr=2, docid=5120, block=9, value=0x5120, min=0x4608, max=0x5116)
FAILED, unexpected attribute value (row=1279, attr=0, docid=5124, block=9, value=0x5124, min=0x4608, max=0x5116)
FAILED, unexpected attribute value (row=1279, attr=2, docid=5124, block=9, value=0x5124, min=0x4608, max=0x5116)
FAILED, unexpected attribute value (row=1406, attr=0, docid=5632, block=10, value=0x5632, min=0x5120, max=0x5628)
FAILED, unexpected attribute value (row=1406, attr=2, docid=5632, block=10, value=0x5632, min=0x5120, max=0x5628)
FAILED, unexpected attribute value (row=1407, attr=0, docid=5636, block=10, value=0x5636, min=0x5120, max=0x5628)
FAILED, unexpected attribute value (row=1407, attr=2, docid=5636, block=10, value=0x5636, min=0x5120, max=0x5628)
FAILED, unexpected attribute value (row=1534, attr=0, docid=6144, block=11, value=0x6144, min=0x5632, max=0x6140)
FAILED, unexpected attribute value (row=1534, attr=2, docid=6144, block=11, value=0x6144, min=0x5632, max=0x6140)
FAILED, unexpected attribute value (row=1535, attr=0, docid=6148, block=11, value=0x6148, min=0x5632, max=0x6140)
FAILED, unexpected attribute value (row=1535, attr=2, docid=6148, block=11, value=0x6148, min=0x5632, max=0x6140)
FAILED, unexpected attribute value (row=1662, attr=0, docid=6656, block=12, value=0x6656, min=0x6144, max=0x6652)
FAILED, unexpected attribute value (row=1662, attr=2, docid=6656, block=12, value=0x6656, min=0x6144, max=0x6652)
FAILED, unexpected attribute value (row=1663, attr=0, docid=6660, block=12, value=0x6660, min=0x6144, max=0x6652)
FAILED, unexpected attribute value (row=1663, attr=2, docid=6660, block=12, value=0x6660, min=0x6144, max=0x6652)
FAILED, unexpected attribute value (row=1790, attr=0, docid=7168, block=13, value=0x7168, min=0x6656, max=0x7164)
FAILED, unexpected attribute value (row=1790, attr=2, docid=7168, block=13, value=0x7168, min=0x6656, max=0x7164)
FAILED, unexpected attribute value (row=1791, attr=0, docid=7172, block=13, value=0x7172, min=0x6656, max=0x7164)
FAILED, unexpected attribute value (row=1791, attr=2, docid=7172, block=13, value=0x7172, min=0x6656, max=0x7164)
FAILED, unexpected attribute value (row=1918, attr=0, docid=7680, block=14, value=0x7680, min=0x7168, max=0x7676)
FAILED, unexpected attribute value (row=1918, attr=2, docid=7680, block=14, value=0x7680, min=0x7168, max=0x7676)
FAILED, unexpected attribute value (row=1919, attr=0, docid=7684, block=14, value=0x7684, min=0x7168, max=0x7676)
FAILED, unexpected attribute value (row=1919, attr=2, docid=7684, block=14, value=0x7684, min=0x7168, max=0x7676)
FAILED, unexpected attribute value (row=2046, attr=0, docid=8192, block=15, value=0x8192, min=0x7680, max=0x8188)
FAILED, unexpected attribute value (row=2046, attr=2, docid=8192, block=15, value=0x8192, min=0x7680, max=0x8188)
FAILED, unexpected attribute value (row=2047, attr=0, docid=8196, block=15, value=0x8196, min=0x7680, max=0x8188)
FAILED, unexpected attribute value (row=2047, attr=2, docid=8196, block=15, value=0x8196, min=0x7680, max=0x8188)
FAILED, unexpected attribute value (row=2174, attr=0, docid=8704, block=16, value=0x8704, min=0x8192, max=0x8700)
FAILED, unexpected attribute value (row=2174, attr=2, docid=8704, block=16, value=0x8704, min=0x8192, max=0x8700)
FAILED, unexpected attribute value (row=2175, attr=0, docid=8708, block=16, value=0x8708, min=0x8192, max=0x8700)
FAILED, unexpected attribute value (row=2175, attr=2, docid=8708, block=16, value=0x8708, min=0x8192, max=0x8700)
FAILED, unexpected attribute value (row=2302, attr=0, docid=9216, block=17, value=0x9216, min=0x8704, max=0x9212)
FAILED, unexpected attribute value (row=2302, attr=2, docid=9216, block=17, value=0x9216, min=0x8704, max=0x9212)
FAILED, unexpected attribute value (row=2303, attr=0, docid=9220, block=17, value=0x9220, min=0x8704, max=0x9212)
FAILED, unexpected attribute value (row=2303, attr=2, docid=9220, block=17, value=0x9220, min=0x8704, max=0x9212)
FAILED, unexpected attribute value (row=2430, attr=0, docid=9728, block=18, value=0x9728, min=0x9216, max=0x9724)
FAILED, unexpected attribute value (row=2430, attr=2, docid=9728, block=18, value=0x9728, min=0x9216, max=0x9724)
FAILED, unexpected attribute value (row=2431, attr=0, docid=9732, block=18, value=0x9732, min=0x9216, max=0x9724)
FAILED, unexpected attribute value (row=2431, attr=2, docid=9732, block=18, value=0x9732, min=0x9216, max=0x9724)
FAILED, unexpected attribute value (row=2558, attr=0, docid=10240, block=19, value=0x10240, min=0x9728, max=0x10236)
FAILED, unexpected attribute value (row=2558, attr=2, docid=10240, block=19, value=0x10240, min=0x9728, max=0x10236)
FAILED, unexpected attribute value (row=2559, attr=0, docid=10244, block=19, value=0x10244, min=0x9728, max=0x10236)
FAILED, unexpected attribute value (row=2559, attr=2, docid=10244, block=19, value=0x10244, min=0x9728, max=0x10236)
FAILED, unexpected attribute value (row=2686, attr=0, docid=10752, block=20, value=0x10752, min=0x10240, max=0x10748)
FAILED, unexpected attribute value (row=2686, attr=2, docid=10752, block=20, value=0x10752, min=0x10240, max=0x10748)
FAILED, unexpected attribute value (row=2687, attr=0, docid=10756, block=20, value=0x10756, min=0x10240, max=0x10748)
FAILED, unexpected attribute value (row=2687, attr=2, docid=10756, block=20, value=0x10756, min=0x10240, max=0x10748)
FAILED, unexpected attribute value (row=2814, attr=0, docid=11264, block=21, value=0x11264, min=0x10752, max=0x11260)
FAILED, unexpected attribute value (row=2814, attr=2, docid=11264, block=21, value=0x11264, min=0x10752, max=0x11260)
FAILED, unexpected attribute value (row=2815, attr=0, docid=11268, block=21, value=0x11268, min=0x10752, max=0x11260)
FAILED, unexpected attribute value (row=2815, attr=2, docid=11268, block=21, value=0x11268, min=0x10752, max=0x11260)
FAILED, unexpected attribute value (row=2942, attr=0, docid=11776, block=22, value=0x11776, min=0x11264, max=0x11772)
FAILED, unexpected attribute value (row=2942, attr=2, docid=11776, block=22, value=0x11776, min=0x11264, max=0x11772)
FAILED, unexpected attribute value (row=2943, attr=0, docid=11780, block=22, value=0x11780, min=0x11264, max=0x11772)
FAILED, unexpected attribute value (row=2943, attr=2, docid=11780, block=22, value=0x11780, min=0x11264, max=0x11772)
FAILED, unexpected attribute value (row=3070, attr=0, docid=12288, block=23, value=0x12288, min=0x11776, max=0x12284)
FAILED, unexpected attribute value (row=3070, attr=2, docid=12288, block=23, value=0x12288, min=0x11776, max=0x12284)
FAILED, unexpected attribute value (row=3071, attr=0, docid=12292, block=23, value=0x12292, min=0x11776, max=0x12284)
FAILED, unexpected attribute value (row=3071, attr=2, docid=12292, block=23, value=0x12292, min=0x11776, max=0x12284)
FAILED, unexpected attribute value (row=3198, attr=0, docid=12800, block=24, value=0x12800, min=0x12288, max=0x12796)
FAILED, unexpected attribute value (row=3198, attr=2, docid=12800, block=24, value=0x12800, min=0x12288, max=0x12796)
checking columnar storage...
    checking attribute 'id'...
    checked 97102/97102 docs
    ok
    checking attribute 'publish_date'...
    checked 97102/97102 docs
    ok
checking kill-list...ibute '$internal_id_HASH'...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 99 of 187706 failures reported, 109.1 sec elapsed
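The attribute-block failures for fgi_stage follow a clear pattern. Every failing row sits at offset 126 or 127 within its block: rows 126/127 in block 0, 254/255 in block 1, and so on up to row 3198 in block 24. In every line, the reported value is above the block's `max`. The row numbers suggest 128 rows per block, but that figure is inferred from this output, not taken from Manticore documentation. The Python sketch below extracts these fields from pasted `indextool --check` output so the pattern can be checked across all indexes. Its helper names (`parse_failures`, `summarize_by_block`) and the `rows_per_block=128` default are hypothetical assumptions.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as:
# FAILED, unexpected attribute value (row=126, attr=0, docid=512, block=0, value=0x512, min=0x0, max=0x508)
FAIL_RE = re.compile(
    r"unexpected attribute value \(row=(\d+), attr=(\d+), docid=(\d+), "
    r"block=(\d+), value=0x(\w+), min=0x(\w+), max=0x(\w+)\)"
)


def parse_failures(text):
    """Extract every 'unexpected attribute value' report as a dict of ints."""
    failures = []
    for m in FAIL_RE.finditer(text):
        row, attr, docid, block = (int(g) for g in m.groups()[:4])
        # Parse the 0x-prefixed fields as hex; for digit-only strings this
        # preserves their relative order whichever base indextool actually used.
        value, lo, hi = (int(g, 16) for g in m.groups()[4:])
        failures.append({"row": row, "attr": attr, "docid": docid, "block": block,
                         "value": value, "min": lo, "max": hi})
    return failures


def summarize_by_block(failures, rows_per_block=128):
    """Return ({block: sorted row offsets inside that block}, all values above max?).

    rows_per_block=128 is an assumption inferred from the row numbers in the output.
    """
    offsets = defaultdict(set)
    all_above_max = True
    for f in failures:
        offsets[f["block"]].add(f["row"] - f["block"] * rows_per_block)
        all_above_max = all_above_max and f["value"] > f["max"]
    return {b: sorted(o) for b, o in sorted(offsets.items())}, all_above_max


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        per_block, above = summarize_by_block(parse_failures(fh.read()))
    for block, rows in per_block.items():
        print(f"block {block}: failing row offsets {rows}")
    print("every reported value exceeds the block max:", above)
```

For the fgi_stage lines above, this should report offsets `[126, 127]` for each block. A block whose failures are spread over many offsets stands out immediately.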
pavelnemirovsky commented 1 year ago

I checked the other large indexes as well, and they show the same issue:

root@ip-10-0-82-53:/var/crash# indextool --check fgi_dev
Manticore 5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)

using config file '/etc/manticoresearch/manticore.conf'...
WARNING: index fgi_dev: index 'fgi_dev': morphology option changed from config has no effect, ignoring
checking index 'fgi_dev'...
WARNING: failed to load RAM chunks, checking only 1 disk chunks
checking schema...
checking RT segment 0(26)...
checking rows...
checking dead row map...
checking RT segment 1(26)...
checking rows...
checking dead row map...
checking RT segment 2(26)...
checking rows...
checking dead row map...
checking RT segment 3(26)...
checking rows...
checking dead row map...
checking RT segment 4(26)...
checking rows...
checking dead row map...
checking RT segment 5(26)...
checking rows...
checking dead row map...
checking RT segment 6(26)...
checking rows...
checking dead row map...
checking RT segment 7(26)...
checking rows...
checking dead row map...
checking RT segment 8(26)...
checking rows...
checking dead row map...
checking RT segment 9(26)...
checking rows...
checking dead row map...
checking RT segment 10(26)...
checking rows...
checking dead row map...
checking RT segment 11(26)...
checking rows...
checking dead row map...
checking RT segment 12(26)...
checking rows...
checking dead row map...
checking RT segment 13(26)...
checking rows...
checking dead row map...
checking RT segment 14(26)...
checking rows...
checking dead row map...
checking RT segment 15(26)...
checking rows...
checking dead row map...
checking RT segment 16(26)...
checking rows...
checking dead row map...
checking RT segment 17(26)...
checking rows...
checking dead row map...
checking RT segment 18(26)...
checking rows...
checking dead row map...
checking RT segment 19(26)...
checking rows...
checking dead row map...
checking RT segment 20(26)...
checking rows...
checking dead row map...
checking RT segment 21(26)...
checking rows...
checking dead row map...
checking RT segment 22(26)...
checking rows...
checking dead row map...
checking RT segment 23(26)...
checking rows...
checking dead row map...
checking RT segment 24(26)...
checking rows...
checking dead row map...
checking RT segment 25(26)...
checking rows...
checking dead row map...
checking disk chunk, extension 0, 0(1)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
FAILED, unexpected attribute value (row=127, attr=0, docid=186263, block=0, value=0x186263, min=0x0, max=0x186255)
FAILED, unexpected attribute value (row=127, attr=2, docid=186263, block=0, value=0x186263, min=0x0, max=0x186255)
FAILED, unexpected attribute value (row=255, attr=0, docid=365824, block=1, value=0x365824, min=0x187516, max=0x365816)
FAILED, unexpected attribute value (row=255, attr=2, docid=365824, block=1, value=0x365824, min=0x187516, max=0x365816)
FAILED, unexpected attribute value (row=383, attr=0, docid=548659, block=2, value=0x548659, min=0x367237, max=0x548651)
FAILED, unexpected attribute value (row=383, attr=2, docid=548659, block=2, value=0x548659, min=0x367237, max=0x548651)
FAILED, unexpected attribute value (row=511, attr=0, docid=726038, block=3, value=0x726038, min=0x550290, max=0x726030)
FAILED, unexpected attribute value (row=511, attr=2, docid=726038, block=3, value=0x726038, min=0x550290, max=0x726030)
FAILED, unexpected attribute value (row=639, attr=0, docid=903976, block=4, value=0x903976, min=0x727508, max=0x903968)
FAILED, unexpected attribute value (row=639, attr=2, docid=903976, block=4, value=0x903976, min=0x727508, max=0x903968)
FAILED, unexpected attribute value (row=767, attr=0, docid=1081197, block=5, value=0x1081197, min=0x905264, max=0x1081189)
FAILED, unexpected attribute value (row=767, attr=2, docid=1081197, block=5, value=0x1081197, min=0x905264, max=0x1081189)
FAILED, unexpected attribute value (row=895, attr=0, docid=1256710, block=6, value=0x1256710, min=0x1082555, max=0x1256702)
FAILED, unexpected attribute value (row=895, attr=2, docid=1256710, block=6, value=0x1256710, min=0x1082555, max=0x1256702)
FAILED, unexpected attribute value (row=1023, attr=0, docid=1433152, block=7, value=0x1433152, min=0x1258105, max=0x1433144)
FAILED, unexpected attribute value (row=1023, attr=2, docid=1433152, block=7, value=0x1433152, min=0x1258105, max=0x1433144)
FAILED, unexpected attribute value (row=1151, attr=0, docid=1620274, block=8, value=0x1620274, min=0x1434665, max=0x1620266)
FAILED, unexpected attribute value (row=1151, attr=2, docid=1620274, block=8, value=0x1620274, min=0x1434665, max=0x1620266)
FAILED, unexpected attribute value (row=1279, attr=0, docid=1800205, block=9, value=0x1800205, min=0x1621738, max=0x1800197)
FAILED, unexpected attribute value (row=1279, attr=2, docid=1800205, block=9, value=0x1800205, min=0x1621738, max=0x1800197)
FAILED, unexpected attribute value (row=1407, attr=0, docid=1979492, block=10, value=0x1979492, min=0x1801430, max=0x1979484)
FAILED, unexpected attribute value (row=1407, attr=2, docid=1979492, block=10, value=0x1979492, min=0x1801430, max=0x1979484)
FAILED, unexpected attribute value (row=1535, attr=0, docid=2157893, block=11, value=0x2157893, min=0x1980787, max=0x2157885)
FAILED, unexpected attribute value (row=1535, attr=2, docid=2157893, block=11, value=0x2157893, min=0x1980787, max=0x2157885)
FAILED, unexpected attribute value (row=1663, attr=0, docid=2334902, block=12, value=0x2334902, min=0x2159330, max=0x2334894)
FAILED, unexpected attribute value (row=1663, attr=2, docid=2334902, block=12, value=0x2334902, min=0x2159330, max=0x2334894)
FAILED, unexpected attribute value (row=1791, attr=0, docid=2522963, block=13, value=0x2522963, min=0x2336436, max=0x2522955)
FAILED, unexpected attribute value (row=1791, attr=2, docid=2522963, block=13, value=0x2522963, min=0x2336436, max=0x2522955)
FAILED, unexpected attribute value (row=1919, attr=0, docid=2710051, block=14, value=0x2710051, min=0x2524360, max=0x2710043)
FAILED, unexpected attribute value (row=1919, attr=2, docid=2710051, block=14, value=0x2710051, min=0x2524360, max=0x2710043)
FAILED, unexpected attribute value (row=2047, attr=0, docid=2886548, block=15, value=0x2886548, min=0x2711308, max=0x2886540)
FAILED, unexpected attribute value (row=2047, attr=2, docid=2886548, block=15, value=0x2886548, min=0x2711308, max=0x2886540)
FAILED, unexpected attribute value (row=2175, attr=0, docid=3064775, block=16, value=0x3064775, min=0x2888049, max=0x3064767)
FAILED, unexpected attribute value (row=2175, attr=2, docid=3064775, block=16, value=0x3064775, min=0x2888049, max=0x3064767)
FAILED, unexpected attribute value (row=2287, attr=0, docid=3226416, block=17, value=0x3226416, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2287, attr=2, docid=3226416, block=17, value=0x3226416, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2288, attr=0, docid=3228161, block=17, value=0x3228161, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2288, attr=2, docid=3228161, block=17, value=0x3228161, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2289, attr=0, docid=3229563, block=17, value=0x3229563, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2289, attr=2, docid=3229563, block=17, value=0x3229563, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2290, attr=0, docid=3230900, block=17, value=0x3230900, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2290, attr=2, docid=3230900, block=17, value=0x3230900, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2291, attr=0, docid=3232284, block=17, value=0x3232284, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2291, attr=2, docid=3232284, block=17, value=0x3232284, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2292, attr=0, docid=3233677, block=17, value=0x3233677, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2292, attr=2, docid=3233677, block=17, value=0x3233677, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2293, attr=0, docid=3235156, block=17, value=0x3235156, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2293, attr=2, docid=3235156, block=17, value=0x3235156, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2294, attr=0, docid=3236577, block=17, value=0x3236577, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2294, attr=2, docid=3236577, block=17, value=0x3236577, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2295, attr=0, docid=3238181, block=17, value=0x3238181, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2295, attr=2, docid=3238181, block=17, value=0x3238181, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2296, attr=0, docid=3239516, block=17, value=0x3239516, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2296, attr=2, docid=3239516, block=17, value=0x3239516, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2297, attr=0, docid=3241054, block=17, value=0x3241054, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2297, attr=2, docid=3241054, block=17, value=0x3241054, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2298, attr=0, docid=3242376, block=17, value=0x3242376, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2298, attr=2, docid=3242376, block=17, value=0x3242376, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2299, attr=0, docid=3243734, block=17, value=0x3243734, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2299, attr=2, docid=3243734, block=17, value=0x3243734, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2300, attr=0, docid=3245328, block=17, value=0x3245328, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2300, attr=2, docid=3245328, block=17, value=0x3245328, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2301, attr=0, docid=3246683, block=17, value=0x3246683, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2301, attr=2, docid=3246683, block=17, value=0x3246683, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2302, attr=0, docid=3248260, block=17, value=0x3248260, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2302, attr=2, docid=3248260, block=17, value=0x3248260, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2303, attr=0, docid=3249861, block=17, value=0x3249861, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2303, attr=2, docid=3249861, block=17, value=0x3249861, min=0x0, max=0x3226408)
FAILED, unexpected attribute value (row=2304, attr=0, docid=3251226, block=18, value=0x3251226, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2304, attr=2, docid=3251226, block=18, value=0x3251226, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2305, attr=0, docid=3252634, block=18, value=0x3252634, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2305, attr=2, docid=3252634, block=18, value=0x3252634, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2306, attr=0, docid=3253884, block=18, value=0x3253884, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2306, attr=2, docid=3253884, block=18, value=0x3253884, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2307, attr=0, docid=3255637, block=18, value=0x3255637, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2307, attr=2, docid=3255637, block=18, value=0x3255637, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2308, attr=0, docid=3257214, block=18, value=0x3257214, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2308, attr=2, docid=3257214, block=18, value=0x3257214, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2309, attr=0, docid=3258366, block=18, value=0x3258366, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2309, attr=2, docid=3258366, block=18, value=0x3258366, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2310, attr=0, docid=3259886, block=18, value=0x3259886, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2310, attr=2, docid=3259886, block=18, value=0x3259886, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2311, attr=0, docid=3261345, block=18, value=0x3261345, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2311, attr=2, docid=3261345, block=18, value=0x3261345, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2312, attr=0, docid=3262891, block=18, value=0x3262891, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2312, attr=2, docid=3262891, block=18, value=0x3262891, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2313, attr=0, docid=3264347, block=18, value=0x3264347, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2313, attr=2, docid=3264347, block=18, value=0x3264347, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2314, attr=0, docid=3265750, block=18, value=0x3265750, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2314, attr=2, docid=3265750, block=18, value=0x3265750, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2315, attr=0, docid=3267012, block=18, value=0x3267012, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2315, attr=2, docid=3267012, block=18, value=0x3267012, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2316, attr=0, docid=3268269, block=18, value=0x3268269, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2316, attr=2, docid=3268269, block=18, value=0x3268269, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2317, attr=0, docid=3269860, block=18, value=0x3269860, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2317, attr=2, docid=3269860, block=18, value=0x3269860, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2318, attr=0, docid=3271180, block=18, value=0x3271180, min=0x23065, max=0x206010)
FAILED, unexpected attribute value (row=2318, attr=2, docid=3271180, block=18, value=0x3271180, min=0x23065, max=0x206010)
checking columnar storage...
    checking attribute 'id'...
    checked 76653/76653 docs
    ok
    checking attribute 'publish_date'...
    checked 76653/76653 docs
    ok
    checking attribute '$internal_id_HASH'...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 99 of 148768 failures reported, 93.2 sec elapsed
check FAILED, 99 of 148768 failures reported, 93.2 sec elapsed
root@ip-10-0-82-53:/var/crash#
root@ip-10-0-82-53:/var/crash# mysql -P9306 -h0
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.0.37 git branch HEAD (no branch)

mysql> select * from fgi_dev where internal_id='80ecccc47e53428412f6fadb48b9e14d7eb194b0';
ERROR 2013 (HY000): Lost connection to MySQL server during query
pavelnemirovsky commented 1 year ago

I can confirm the queries above worked fine in Manticore 4.0.2.

tomatolog commented 1 year ago

If the index is invalid, it may be better to truncate its data and reindex it from scratch.

pavelnemirovsky commented 1 year ago

@sanikolaev I'm out of options: I can't upload the crash dump.

pavelnemirovsky commented 1 year ago

If the index is invalid, it may be better to truncate its data and reindex it from scratch.

@tomatolog I suspect the corruption is caused by the REPLACE operations we perform (we sync documents incrementally). Any thoughts on why the nodes crash during the recovery process? Could it be the binlog max size or something similar?

pavelnemirovsky commented 1 year ago

@tomatolog How should I interpret what is wrong with a given row based on the output below? Is there a way to see what exactly is wrong there?

FAILED, unexpected attribute value (row=2315, attr=2, docid=3267012, block=18, value=0x3267012, min=0x23065, max=0x206010)
pavelnemirovsky commented 1 year ago

If the index is invalid, it may be better to truncate its data and reindex it from scratch.

@tomatolog I have very good news: our Spark code consistently leaves the test index in a corrupted state, and that index contains about 109k documents. I need your help finding the malformed documents, so I get some hints on where to look for the issue.

tomatolog commented 1 year ago

'FAILED, unexpected attribute value' means that the attribute value is outside the bounds stored in the block index. That is wrong and means that the data is inconsistent.

What do you mean by this?

Any thoughts on why the nodes crash during the recovery process?

pavelnemirovsky commented 1 year ago

@tomatolog I am very close to reproducing the issue; I hope to provide the details by EOD. It is related to the index definition.

I reported two problems in this ticket; the second one relates to node instability while the cluster recovery process is running:

pavelnemirovsky commented 1 year ago

'FAILED, unexpected attribute value' means that the attribute value is outside the bounds stored in the block index. That is wrong and means that the data is inconsistent.

If so, why don't you fail the insert request? How is it possible to find what causes the error?

githubmanticore commented 1 year ago

➤ Aleksey N. Vinogradov commented:

That is the so-called 'min-max index'. During indexing we build an inverted index of the full-text fields, which maps each word (token) to the concrete documents that contain it. We also have a storage where all documents are kept with the rest of their (non-full-text) attributes. The first structure (the inverted index, aka dictionary) is used in full-text search (that is, WHERE MATCH('....')). The second is used in a full scan, when we can't use the full-text dictionary, for example if you query something like WHERE int_attr < 1000.

For full-text search we need nothing but the dictionary.

For a full scan we have to iterate over all documents. That is much slower, since we may literally step over gigabytes of disk-stored data.
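A toy Python sketch of the two access paths described above. It assumes a tiny in-memory list of documents with hypothetical `content` and `int_attr` fields and only shows why a MATCH lookup is a single dictionary access while an attribute filter has to visit every row; Manticore's real on-disk structures are much more involved.

```python
from collections import defaultdict

# Hypothetical documents: one full-text field and one integer attribute.
DOCS = [
    {"id": 1, "content": "manticore search engine", "int_attr": 500},
    {"id": 2, "content": "columnar storage engine", "int_attr": 1500},
    {"id": 3, "content": "search cluster recovery", "int_attr": 1200},
]


def build_inverted_index(docs):
    """Map each token to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in doc["content"].split():
            index[token].add(doc["id"])
    return {token: sorted(ids) for token, ids in index.items()}


def match(index, token):
    """Full-text lookup: one dictionary access, no iteration over documents."""
    return index.get(token, [])


def full_scan(docs, predicate):
    """Attribute filter without a full-text part: every document is visited."""
    return [doc["id"] for doc in docs if predicate(doc)]


index = build_inverted_index(DOCS)
print(match(index, "search"))                           # [1, 3]
print(full_scan(DOCS, lambda d: d["int_attr"] < 1000))  # [1]
```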

Each added document has a storage where all its attributes are stored. Fixed-size attributes (that is, non-string, non-JSON, plain int/int64/float) are stored as is: a simple integer occupies 4 bytes, a float also 4 bytes, a double/int64 8 bytes, etc. That is the fixed block. If you have several full-text fields and 6 ints, the fixed block takes 6*4 = 24 bytes per document (the full-text fields don't count toward it), plus 8 bytes for the DocID, 32 bytes in total.
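The arithmetic for six int attributes works out as follows. This is a sketch assuming only the per-type sizes quoted in this comment (4 bytes for int/float, 8 bytes for int64/double and for the DocID); it ignores strings, JSON and any other per-row overhead, whose real layout is not described here.

```python
# Byte sizes of fixed-size attribute types, as quoted in the comment.
ATTR_SIZES = {"int": 4, "float": 4, "int64": 8, "double": 8}
DOCID_SIZE = 8  # every row also carries the 64-bit document id


def fixed_row_size(attr_types):
    """Bytes per row for fixed-size attributes plus the DocID.

    Full-text fields are not part of this row: they live in the inverted index.
    """
    return DOCID_SIZE + sum(ATTR_SIZES[t] for t in attr_types)


print(fixed_row_size(["int"] * 6))  # 8 + 6 * 4 = 32
```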

To speed up full-scan iteration over documents we have the so-called 'min-max index'. On top of all the stored documents we additionally save statistics about blocks: starting from the beginning and taking 128 documents per block, for each block we save first a 'pseudo-doc' with the minimal values of all integer attributes, and then a 'pseudo-doc' with the maximal values of the same attributes. Finally, we save the same pair for the whole index, regardless of its size.
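The block statistics can be sketched in Python like this. It is an illustration only, assuming rows are plain tuples of integer attribute values and using the 128-documents-per-block size mentioned above; it is not Manticore's actual storage format.

```python
BLOCK_SIZE = 128  # documents per block, as described above


def column_min_max(rows):
    """Column-wise (mins, maxs) tuples over a list of equally sized tuples."""
    columns = list(zip(*rows))
    return tuple(min(c) for c in columns), tuple(max(c) for c in columns)


def build_minmax_index(rows, block_size=BLOCK_SIZE):
    """Return the per-block (mins, maxs) pairs and the index-wide pair."""
    blocks = [column_min_max(rows[start:start + block_size])
              for start in range(0, len(rows), block_size)]
    # The index-wide pair covers every row, however many blocks there are.
    return blocks, column_min_max(rows)


# Hypothetical data: 300 rows with two integer attributes each.
rows = [(i, 10 * i) for i in range(300)]
blocks, whole = build_minmax_index(rows)
print(len(blocks))  # 3 (128 + 128 + 44 rows)
print(blocks[2])    # ((256, 2560), (299, 2990))
print(whole)        # ((0, 0), (299, 2990))
```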

When you issue a full-scan query (that is, without the 'where match...' clause) with a condition like 'where budget < 1000000', we first look at the index-wide pair, and if it says that the minimal budget is not below 1000000, we immediately reject the whole query with 'no documents found'.

Otherwise, we step over the blocks and check the minimum of each one. Every block whose minimum is not below 1000000 is rejected immediately.

We end up with only the blocks (128 docs each) that may match your query. So a full-scan query can be executed much faster (up to 128 times) than a real full scan.
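A minimal Python sketch of this pruning for a 'value < threshold' filter, assuming the per-block and index-wide (min, max) pairs of one attribute are already available and using a hypothetical `budgets` column. Only the surviving blocks are scanned row by row, where the real predicate is applied.

```python
BLOCK_SIZE = 128  # documents per block


def candidate_blocks(block_ranges, index_range, threshold):
    """Indices of blocks that may hold rows with value < threshold.

    block_ranges: (min, max) of one attribute for each block.
    index_range:  (min, max) of the same attribute over the whole index.
    """
    # Index-wide check: if even the smallest value is not below the threshold,
    # the query is rejected without touching any block.
    if index_range[0] >= threshold:
        return []
    # Per-block check: keep only blocks whose minimum is below the threshold.
    return [b for b, (lo, _hi) in enumerate(block_ranges) if lo < threshold]


def filter_rows(values, block_ranges, index_range, threshold, block_size=BLOCK_SIZE):
    """Apply the real predicate only to rows of the candidate blocks."""
    matches = []
    for b in candidate_blocks(block_ranges, index_range, threshold):
        first = b * block_size
        for row, value in enumerate(values[first:first + block_size], first):
            if value < threshold:
                matches.append(row)
    return matches


# Hypothetical 'budget' attribute: 384 rows, i.e. three blocks of 128.
budgets = [1000 * i for i in range(384)]
ranges = [(min(budgets[s:s + BLOCK_SIZE]), max(budgets[s:s + BLOCK_SIZE]))
          for s in range(0, len(budgets), BLOCK_SIZE)]
whole = (min(budgets), max(budgets))
print(candidate_blocks(ranges, whole, 130_000))  # [0, 1]; block 2 is skipped
print(len(filter_rows(budgets, ranges, whole, 130_000)))  # 130
```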

Everything above is about checking this final set of min/max pairs for the blocks and the whole index. If the check reports

FAILED, unexpected attribute value (row=2315, attr=2, docid=3267012, block=18, value=0x3267012, min=0x23065, max=0x206010)  

That means the value was expected to be between min and max, but it doesn't fit. At that point we can't say anything definite: either the final 'min-max' block is wrong, or the value itself is stored wrong. But since we store an index sequentially, keeping its structure, we can only say that the index is damaged.
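A minimal sketch of this kind of check, assuming rows are tuples of integer attribute values and the stored (mins, maxs) pairs are given per block. The message is only loosely modeled on the indextool output above (no docid, decimal values), and this is not indextool's actual implementation. As explained, such a check can tell that a value and its stored bounds disagree, but not which of the two is wrong.

```python
BLOCK_SIZE = 128  # documents per block


def check_minmax(rows, block_ranges, block_size=BLOCK_SIZE):
    """Yield a message for every value outside its block's stored [min, max].

    rows:         tuples of integer attribute values, one tuple per row.
    block_ranges: (mins, maxs) tuples as stored in the min-max index.
    """
    for row, values in enumerate(rows):
        block = row // block_size
        mins, maxs = block_ranges[block]
        for attr, value in enumerate(values):
            if not mins[attr] <= value <= maxs[attr]:
                yield (f"FAILED, unexpected attribute value (row={row}, attr={attr}, "
                       f"block={block}, value={value}, min={mins[attr]}, max={maxs[attr]})")


# Hypothetical damage: the stored max of block 0 misses the block's last row.
rows = [(i,) for i in range(BLOCK_SIZE)]
stored = [((0,), (BLOCK_SIZE - 2,))]
for message in check_minmax(rows, stored):
    print(message)
# FAILED, unexpected attribute value (row=127, attr=0, block=0, value=127, min=0, max=126)
```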

pavelnemirovsky commented 1 year ago

Guys,

Here is the script that reproduces the index misbehavior.

Steps to reproduce:

./manticore-bulk-generator.py --mc http://10.0.82.53:9308 --index test1234 
indextool --check test1234

-------
root@ip-10-0-82-53:~# indextool --check test1234 | more
Manticore 5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)

using config file '/etc/manticoresearch/manticore.conf'...
checking index 'test1234'...
WARNING: failed to load RAM chunks, checking only 1 disk chunks
checking schema...
checking disk chunk, extension 0, 0(1)...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
FAILED, unexpected attribute value (row=127, attr=0, docid=5977, block=0, value=0x5977, min=0x0, max=0x5969)
FAILED, unexpected attribute value (row=127, attr=2, docid=5977, block=0, value=0x5977, min=0x0, max=0x5969)
FAILED, unexpected attribute value (row=255, attr=0, docid=11993, block=1, value=0x11993, min=0x6016, max=0x11985)
FAILED, unexpected attribute value (row=255, attr=2, docid=11993, block=1, value=0x11993, min=0x6016, max=0x11985)
FAILED, unexpected attribute value (row=299, attr=0, docid=14061, block=2, value=0x14061, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=299, attr=2, docid=14061, block=2, value=0x14061, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=300, attr=0, docid=14108, block=2, value=0x14108, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=300, attr=2, docid=14108, block=2, value=0x14108, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=301, attr=0, docid=14155, block=2, value=0x14155, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=301, attr=2, docid=14155, block=2, value=0x14155, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=302, attr=0, docid=14202, block=2, value=0x14202, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=302, attr=2, docid=14202, block=2, value=0x14202, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=303, attr=0, docid=14249, block=2, value=0x14249, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=303, attr=2, docid=14249, block=2, value=0x14249, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=304, attr=0, docid=14296, block=2, value=0x14296, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=304, attr=2, docid=14296, block=2, value=0x14296, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=305, attr=0, docid=14343, block=2, value=0x14343, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=305, attr=2, docid=14343, block=2, value=0x14343, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=306, attr=0, docid=14390, block=2, value=0x14390, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=306, attr=2, docid=14390, block=2, value=0x14390, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=307, attr=0, docid=14437, block=2, value=0x14437, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=307, attr=2, docid=14437, block=2, value=0x14437, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=308, attr=0, docid=14484, block=2, value=0x14484, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=308, attr=2, docid=14484, block=2, value=0x14484, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=309, attr=0, docid=14531, block=2, value=0x14531, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=309, attr=2, docid=14531, block=2, value=0x14531, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=310, attr=0, docid=14578, block=2, value=0x14578, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=310, attr=2, docid=14578, block=2, value=0x14578, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=311, attr=0, docid=14625, block=2, value=0x14625, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=311, attr=2, docid=14625, block=2, value=0x14625, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=312, attr=0, docid=14672, block=2, value=0x14672, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=312, attr=2, docid=14672, block=2, value=0x14672, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=313, attr=0, docid=14719, block=2, value=0x14719, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=313, attr=2, docid=14719, block=2, value=0x14719, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=314, attr=0, docid=14766, block=2, value=0x14766, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=314, attr=2, docid=14766, block=2, value=0x14766, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=315, attr=0, docid=14813, block=2, value=0x14813, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=315, attr=2, docid=14813, block=2, value=0x14813, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=316, attr=0, docid=14860, block=2, value=0x14860, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=316, attr=2, docid=14860, block=2, value=0x14860, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=317, attr=0, docid=14907, block=2, value=0x14907, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=317, attr=2, docid=14907, block=2, value=0x14907, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=318, attr=0, docid=14954, block=2, value=0x14954, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=318, attr=2, docid=14954, block=2, value=0x14954, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=319, attr=0, docid=15001, block=2, value=0x15001, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=319, attr=2, docid=15001, block=2, value=0x15001, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=320, attr=0, docid=15048, block=2, value=0x15048, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=320, attr=2, docid=15048, block=2, value=0x15048, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=321, attr=0, docid=15095, block=2, value=0x15095, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=321, attr=2, docid=15095, block=2, value=0x15095, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=322, attr=0, docid=15142, block=2, value=0x15142, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=322, attr=2, docid=15142, block=2, value=0x15142, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=323, attr=0, docid=15189, block=2, value=0x15189, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=323, attr=2, docid=15189, block=2, value=0x15189, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=324, attr=0, docid=15236, block=2, value=0x15236, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=324, attr=2, docid=15236, block=2, value=0x15236, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=325, attr=0, docid=15283, block=2, value=0x15283, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=325, attr=2, docid=15283, block=2, value=0x15283, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=326, attr=0, docid=15330, block=2, value=0x15330, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=326, attr=2, docid=15330, block=2, value=0x15330, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=327, attr=0, docid=15377, block=2, value=0x15377, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=327, attr=2, docid=15377, block=2, value=0x15377, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=328, attr=0, docid=15424, block=2, value=0x15424, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=328, attr=2, docid=15424, block=2, value=0x15424, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=329, attr=0, docid=15471, block=2, value=0x15471, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=329, attr=2, docid=15471, block=2, value=0x15471, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=330, attr=0, docid=15518, block=2, value=0x15518, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=330, attr=2, docid=15518, block=2, value=0x15518, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=331, attr=0, docid=15565, block=2, value=0x15565, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=331, attr=2, docid=15565, block=2, value=0x15565, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=332, attr=0, docid=15612, block=2, value=0x15612, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=332, attr=2, docid=15612, block=2, value=0x15612, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=333, attr=0, docid=15659, block=2, value=0x15659, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=333, attr=2, docid=15659, block=2, value=0x15659, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=334, attr=0, docid=15706, block=2, value=0x15706, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=334, attr=2, docid=15706, block=2, value=0x15706, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=335, attr=0, docid=15753, block=2, value=0x15753, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=335, attr=2, docid=15753, block=2, value=0x15753, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=336, attr=0, docid=15800, block=2, value=0x15800, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=336, attr=2, docid=15800, block=2, value=0x15800, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=337, attr=0, docid=15847, block=2, value=0x15847, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=337, attr=2, docid=15847, block=2, value=0x15847, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=338, attr=0, docid=15894, block=2, value=0x15894, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=338, attr=2, docid=15894, block=2, value=0x15894, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=339, attr=0, docid=15941, block=2, value=0x15941, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=339, attr=2, docid=15941, block=2, value=0x15941, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=340, attr=0, docid=15988, block=2, value=0x15988, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=340, attr=2, docid=15988, block=2, value=0x15988, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=341, attr=0, docid=16035, block=2, value=0x16035, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=341, attr=2, docid=16035, block=2, value=0x16035, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=342, attr=0, docid=16082, block=2, value=0x16082, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=342, attr=2, docid=16082, block=2, value=0x16082, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=343, attr=0, docid=16129, block=2, value=0x16129, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=343, attr=2, docid=16129, block=2, value=0x16129, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=344, attr=0, docid=16176, block=2, value=0x16176, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=344, attr=2, docid=16176, block=2, value=0x16176, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=345, attr=0, docid=16223, block=2, value=0x16223, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=345, attr=2, docid=16223, block=2, value=0x16223, min=0x0, max=0x14053)
FAILED, unexpected attribute value (row=346, attr=0, docid=16270, block=2, value=0x16270, min=0x0, max=0x14053)
checking columnar storage...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 99 of 21207 failures reported, 0.3 sec elapsed
check FAILED, 99 of 21207 failures reported, 0.3 sec elapsed

mysql> select * from test1234;
+---------------------+---------+-----------+---------------------------------+--------------+--------------------------------------+-------------------+
| id                  | tags_id | tags_name | entities_id                     | publish_date | internal_id                          | article_body_hash |
+---------------------+---------+-----------+---------------------------------+--------------+--------------------------------------+-------------------+
| 8256091669212337100 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | 830c4ad1-e5dc-4749-86af-76fc90f019ec | N/A               |
| 4529681125871381758 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 43e3817e-2353-4b39-b3ba-daf40c69e5a0 | N/A               |
| 2536545678129982796 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 6df6180c-fc54-4ff8-85f5-7e57e2795498 | N/A               |
|  585360264228615654 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | 72a7ef9a-dd69-4045-9954-c509f40e6e6b | N/A               |
| 1721164721025165367 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | c15d5319-ec1d-4ea0-9d60-10469f09c0a0 | N/A               |
| 3188457150163686035 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | a3c546a8-9273-4c8b-bacb-ca025ea1ca84 | N/A               |
| 7219897691681808792 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | d3d7baf3-4d11-4e87-986c-afb0ff6d25a9 | N/A               |
| 6081570628706345627 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | 1a123058-3ad5-421d-bf56-a16328f6fd9a | N/A               |
| 6506720913117422833 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | 5ec0d48a-7943-42c9-b5b6-e02d97851bdd | N/A               |
| 8672735710487296954 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 4a68de8b-57d1-40bb-8ef0-f716ff8e3620 | N/A               |
| 6014046905652612197 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 1efed15a-e3a2-474b-8547-7dffee7fbf50 | N/A               |
|   55162792164680494 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | 8128951d-a6b3-461d-ac02-dd02717c0627 | N/A               |
| 4720584237294856983 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | d8a6934a-5030-44e4-850e-f24509866314 | N/A               |
| 7895176696141789854 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | c8412d8d-8053-47a4-bb92-56c8238d3613 | N/A               |
| 7380040428098609807 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 18ec55c1-a8c1-466f-85e9-c628fbb2ea94 | N/A               |
| 4613647174516289942 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | 15ceb885-04ae-4246-ad7a-e33b4351597e | N/A               |
| 6616986837958039913 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | cb91b735-10bd-4b87-86ee-f9f533113bc9 | N/A               |
| 1883974879427602738 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 2f4c88cf-15d3-4e8f-94fd-b4812f07e691 | N/A               |
|  439467183578286677 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554235 | f29c1966-aa14-44ee-bf1b-6c57eb6c5928 | N/A               |
| 7546033059523140910 | [4922]  | ["qwer"]  | [208683754,203521360,206676931] |   1666554234 | 85390881-a05c-4be1-93f7-53c7c950bd5b | N/A               |
+---------------------+---------+-----------+---------------------------------+--------------+--------------------------------------+-------------------+

mysql> select * from test1234 where internal_id='123';
ERROR 2013 (HY000): Lost connection to MySQL server during query

[Sun Oct 23 19:44:27.691 2022] [17434] WARNING: last message repeated 6 times
[Sun Oct 23 19:46:52.736 2022] [17438] rt: index test123: ramchunk saved ok (mode=periodic, last TID=109, current TID=200, ram=35.378 Mb, time delta=300 sec, took=0.017 sec)
------- FATAL: CRASH DUMP -------
[Sun Oct 23 19:47:58.922 2022] [17434]
[Sun Oct 23 19:47:59.213 2022] [17433] watchdog: main process 17434 killed dirtily with signal 11, will be restarted
[Sun Oct 23 19:47:59.213 2022] [17433] watchdog: main process 18567 forked ok

Hope this helps to tackle it.

pavelnemirovsky commented 1 year ago

Aleksey N. Vinogradov

Thank you for the explanation.

tomatolog commented 1 year ago

For me, your script does nothing:

C:\dev\sphinx\build\crash\gh919>python3 --debug --index test123 --mc http://127.0.0.1:25512 ./manticore-bulk-generator.py

It just returns without creating the table or the cluster, and without printing any error messages.

tomatolog commented 1 year ago

I reproduced the crash with this script on a Linux box with daemon 5.0.3 eea294f@221014 dev (columnar 1.16.1 e0697c7@220802). I still need to check whether the recent master of the daemon and the columnar release version behave the same.

The cluster must be created before running the script:

 create cluster DMETRICS_FTS_1;

The crash stack is the following:

(gdb) bt
#0  0x00007fe6dadc2438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fe6dadc403a in __GI_abort () at abort.c:89
#2  0x00007fe6dadbabe7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7fe6daa467af "m_fnProcessSubblock",
    file=file@entry=0x7fe6daa46ac0 "/home/stas/columnar/columnar/accessor/accessorint.cpp", line=line@entry=1146,
    function=function@entry=0x7fe6daa527c1 "virtual bool columnar::Analyzer_INT_T<unsigned long, unsigned long, columnar::ValueInInterval_T<true, true, false, false>>::MoveToBlock(int) [VALUES = unsigned long, ACCESSOR_VALUES = unsigned long, R"...) at assert.c:92
#3  0x00007fe6dadbac92 in __GI___assert_fail (assertion=0x7fe6daa467af "m_fnProcessSubblock",
    file=0x7fe6daa46ac0 "/home/stas/columnar/columnar/accessor/accessorint.cpp", line=1146,
    function=0x7fe6daa527c1 "virtual bool columnar::Analyzer_INT_T<unsigned long, unsigned long, columnar::ValueInInterval_T<true, true, false, false>>::MoveToBlock(int) [VALUES = unsigned long, ACCESSOR_VALUES = unsigned long, R"...) at assert.c:101
#4  0x00007fe6da76aeca in columnar::Analyzer_INT_T<unsigned long, unsigned long, columnar::ValueInInterval_T<true, true, false, false> >::MoveToBlock
    (this=0x7fe6b40572f0, iNextBlock=0) at /home/stas/columnar/columnar/accessor/accessorint.cpp:1146
#5  0x00007fe6da6f9063 in columnar::Analyzer_T<true>::MoveToSubblock (this=0x7fe6b40572f0, iSubblock=0)
    at /home/stas/columnar/columnar/accessor/accessortraits.h:192
#6  0x00007fe6da6f836b in columnar::Analyzer_T<true>::Setup (this=0x7fe6b40572f0, pBlocks=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<columnar::MatchingBlocks_c*, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<columnar::MatchingBlocks_c*, (__gnu_cxx::_Lock_policy)2>'

std::shared_ptr<columnar::MatchingBlocks_c> (use count 2, weak count 0) = {...}, uTotalDocs=10900)
    at /home/stas/columnar/columnar/accessor/accessortraits.h:167
#7  0x00007fe6da65f214 in columnar::Columnar_c::TryToCreateAnalyzers (this=0x7fe6d011b910, dFilters=std::vector of length 1, capacity 1 = {...},
    dDeletedFilters=std::vector of length 0, capacity 0, pMatchingBlocks=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<columnar::MatchingBlocks_c*, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr<columnar::MatchingBlocks_c*, (__gnu_cxx::_Lock_policy)2>'

std::shared_ptr<columnar::MatchingBlocks_c> (use count 2, weak count 0) = {...}) at /home/stas/columnar/columnar/columnar.cpp:705
#8  0x00007fe6da65ee07 in columnar::Columnar_c::CreateAnalyzerOrPrefilter (this=0x7fe6d011b910, dFilters=std::vector of length 1, capacity 1 = {...},
    dDeletedFilters=std::vector of length 0, capacity 0, tBlockTester=...) at /home/stas/columnar/columnar/columnar.cpp:636
#9  0x00000000005caf8e in CSphIndex_VLN::CreateColumnarAnalyzerOrPrefilter (this=0x7fe6d0083dd0, dSIInfo=..., dFilters=..., dFilterTree=...,
    pFilter=0x7fe6b400dbd0, eCollation=SPH_COLLATION_LIBC_CI, tSchema=..., sWarning=...) at /home/stas/manticore/src/sphinx.cpp:7829
#10 0x00000000005cb652 in CSphIndex_VLN::SpawnIterators (this=0x7fe6d0083dd0, tQuery=..., tCtx=..., tFlx=..., tMaxSorterSchema=..., tMeta=...,
    iCutoff=20, dModifiedFilters=...) at /home/stas/manticore/src/sphinx.cpp:7883
#11 0x00000000005cc5fc in CSphIndex_VLN::MultiScan (this=0x7fe6d0083dd0, tResult=..., tQuery=..., dSorters=..., tArgs=..., tmMaxTimer=0)
    at /home/stas/manticore/src/sphinx.cpp:8016
#12 0x00000000005d9b60 in CSphIndex_VLN::MultiQuery (this=0x7fe6d0083dd0, tResult=..., tQuery=..., dAllSorters=..., tArgs=...)
    at /home/stas/manticore/src/sphinx.cpp:10537
#13 0x00000000009dd354 in QueryDiskChunks(CSphQuery const&, CSphQueryResultMeta&, CSphMultiQueryArgs const&, RtGuard_t const&, VecTraits_T<ISphMatchSorter*>&, QueryProfile_c*, bool, CSphOrderedHash<long, CSphString, CSphStrHashFunc, 256> const*, long, char const*, SorterSchemaTransform_c&, long)::$_59::operator()() const (this=0x7fe6b400e350) at /home/stas/manticore/src/sphinxrt.cpp:6846
#14 0x00000000009dcded in std::_Function_handler<void (), QueryDiskChunks(CSphQuery const&, CSphQueryResultMeta&, CSphMultiQueryArgs const&, RtGuard_t const&, VecTraits_T<ISphMatchSorter*>&, QueryProfile_c*, bool, CSphOrderedHash<long, CSphString, CSphStrHashFunc, 256> const*, long, char const*, SorterSchemaTransform_c&, long)::$_59>::_M_invoke(std::_Any_data const&) (__functor=...)
pavelnemirovsky commented 1 year ago

For me, your script does nothing:

C:\dev\sphinx\build\crash\gh919>python3 --debug --index test123 --mc http://127.0.0.1:25512 ./manticore-bulk-generator.py

It just returns. It does not create the table or the cluster, and it does not print any error messages.

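You can't pass arguments that way. Options placed before the script path are handled by python3 itself, not by the script, so they have to go after the script name.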
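As an illustration of why argument order matters, here is a minimal sketch. It assumes the generator parses its options with Python's standard argparse module. The thread does not show the script's real argument handling, so the parser below is hypothetical and only mirrors the flags from the command quoted earlier. argparse reads sys.argv[1:], which holds only the arguments that come after the script path.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring --debug, --index and --mc from the quoted
    # command; the real manticore-bulk-generator.py may define them differently.
    parser = argparse.ArgumentParser(prog="manticore-bulk-generator.py")
    parser.add_argument("--debug", action="store_true", help="verbose output")
    parser.add_argument("--index", required=True, help="target table name")
    parser.add_argument("--mc", required=True, help="Manticore endpoint URL")
    return parser


def main(argv=None) -> argparse.Namespace:
    # With argv=None, argparse falls back to sys.argv[1:], i.e. only the
    # arguments written AFTER the script path on the command line.
    args = build_parser().parse_args(argv)
    if args.debug:
        print(f"index={args.index} endpoint={args.mc}", file=sys.stderr)
    return args
```

With a parser like this, the working invocation puts the script path first and its options after it:

python3 ./manticore-bulk-generator.py --debug --index test123 --mc http://127.0.0.1:25512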

pavelnemirovsky commented 1 year ago

@tomatolog @sanikolaev do you need me to prove that nodes crash during the cluster recovery process (when nodes recover from the donor), or is it clear to you how to reproduce it?

pavelnemirovsky commented 1 year ago

I reproduced the crash with this script on a Linux box with daemon 5.0.3 eea294f@221014 dev (columnar 1.16.1 e0697c7@220802). I still need to check whether the recent master of the daemon and the released columnar version behave the same way.

The cluster must be created before running the script:

 create cluster DMETRICS_FTS_1;

The crash stack is the same as the backtrace above: the assertion "m_fnProcessSubblock" fails in columnar::Analyzer_INT_T::MoveToBlock (accessorint.cpp:1146), reached from CSphIndex_VLN::CreateColumnarAnalyzerOrPrefilter while QueryDiskChunks scans the disk chunks.

This issue didn't occur in 4.0.2.

tomatolog commented 1 year ago

@tomatolog @sanikolaev do you need me to prove that nodes crash during the cluster recovery process (when nodes recover from the donor), or is it clear to you how to reproduce it?

No. It is not clear to me how the current crash on search could cause the daemon to crash during cluster recovery.

It would be better to create a separate ticket with the crash description, the crash log, and the daemon's backtrace at the time of the crash.

pavelnemirovsky commented 1 year ago

@tomatolog will do.

tomatolog commented 1 year ago

The search crash was fixed in the columnar library: https://github.com/manticoresoftware/columnar/commit/2cc9cbeb02acd07b0fadfbffb5571884535c97cb

You need to update the daemon and MCL packages to the latest dev version (the current master head).

indextool --check produced false error messages; I just fixed that in the master version. The index is actually valid and should pass indextool --check.