Open Grabien opened 1 year ago
Could you check your data with the following commands
indextool --check transcripts
indextool --check transcripts_delta
and provide their output?
This command displays some errors for the "transcripts" index:
indextool --check transcripts
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
using config file '/etc/manticoresearch/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
FAILED, wrong word-delta (pos=1, word=, len=0, begin=193, delta=112)
FAILED, empty word in dictionary (pos=1)
FAILED, wrong word-delta (pos=124, word=, len=0, begin=9, delta=1)
FAILED, empty word in dictionary (pos=124)
FAILED, wrong word-delta (pos=132, word=, len=0, begin=10, delta=1)
FAILED, empty word in dictionary (pos=132)
FAILED, wrong word-delta (pos=140, word=, len=0, begin=11, delta=1)
FAILED, empty word in dictionary (pos=140)
FAILED, wrong word-delta (pos=151, word=, len=0, begin=12, delta=2)
FAILED, empty word in dictionary (pos=151)
FAILED, wrong word-delta (pos=160, word=, len=0, begin=14, delta=2)
FAILED, empty word in dictionary (pos=160)
FAILED, wrong word-delta (pos=169, word=, len=0, begin=16, delta=1)
FAILED, empty word in dictionary (pos=169)
FAILED, wrong word-delta (pos=178, word=, len=0, begin=17, delta=1)
FAILED, empty word in dictionary (pos=178)
FAILED, wrong word-delta (pos=187, word=, len=0, begin=18, delta=3)
FAILED, empty word in dictionary (pos=187)
FAILED, wrong word-delta (pos=198, word=, len=0, begin=21, delta=3)
FAILED, empty word in dictionary (pos=198)
FAILED, wrong word-delta (pos=209, word=, len=0, begin=24, delta=2)
FAILED, empty word in dictionary (pos=209)
FAILED, wrong word-delta (pos=219, word=, len=0, begin=26, delta=6)
FAILED, empty word in dictionary (pos=219)
FAILED, wrong word-delta (pos=233, word=, len=0, begin=28, delta=4)
FAILED, empty word in dictionary (pos=233)
FAILED, wrong word-delta (pos=245, word=, len=0, begin=21, delta=1)
FAILED, empty word in dictionary (pos=245)
FAILED, wrong word-delta (pos=254, word=, len=0, begin=20, delta=14)
FAILED, empty word in dictionary (pos=254)
FAILED, wrong word-delta (pos=276, word=, len=0, begin=19, delta=1)
FAILED, empty word in dictionary (pos=276)
FAILED, wrong word-delta (pos=285, word=, len=0, begin=16, delta=17)
FAILED, empty word in dictionary (pos=285)
FAILED, wrong word-delta (pos=310, word=, len=0, begin=15, delta=1)
FAILED, empty word in dictionary (pos=310)
FAILED, wrong word-delta (pos=318, word=, len=0, begin=15, delta=4)
FAILED, empty word in dictionary (pos=318)
FAILED, wrong word-delta (pos=329, word=, len=0, begin=14, delta=2)
FAILED, empty word in dictionary (pos=329)
FAILED, wrong word-delta (pos=338, word=, len=0, begin=13, delta=3)
FAILED, empty word in dictionary (pos=338)
FAILED, wrong word-delta (pos=348, word=, len=0, begin=12, delta=6)
FAILED, empty word in dictionary (pos=348)
FAILED, wrong word-delta (pos=361, word=, len=0, begin=12, delta=1)
FAILED, empty word in dictionary (pos=361)
FAILED, wrong word-delta (pos=369, word=, len=0, begin=12, delta=6)
FAILED, empty word in dictionary (pos=369)
FAILED, wrong word-delta (pos=382, word=, len=0, begin=11, delta=4)
FAILED, empty word in dictionary (pos=382)
FAILED, wrong word-delta (pos=393, word=, len=0, begin=15, delta=4)
FAILED, empty word in dictionary (pos=393)
FAILED, wrong word-delta (pos=404, word=, len=0, begin=11, delta=9)
FAILED, empty word in dictionary (pos=404)
FAILED, wrong word-delta (pos=421, word=, len=0, begin=11, delta=13)
FAILED, empty word in dictionary (pos=421)
FAILED, wrong word-delta (pos=442, word=, len=0, begin=11, delta=17)
FAILED, empty word in dictionary (pos=442)
FAILED, wrong word-delta (pos=467, word=, len=0, begin=10, delta=1)
FAILED, empty word in dictionary (pos=467)
FAILED, wrong word-delta (pos=475, word=, len=0, begin=10, delta=18)
FAILED, empty word in dictionary (pos=475)
FAILED, wrong word-delta (pos=501, word=, len=0, begin=10, delta=12)
FAILED, empty word in dictionary (pos=501)
FAILED, wrong word-delta (pos=521, word=, len=0, begin=10, delta=12)
FAILED, empty word in dictionary (pos=521)
FAILED, wrong word-delta (pos=541, word=, len=0, begin=10, delta=8)
FAILED, empty word in dictionary (pos=541)
FAILED, wrong word-delta (pos=556, word=, len=0, begin=10, delta=5)
FAILED, empty word in dictionary (pos=556)
FAILED, wrong word-delta (pos=568, word=, len=0, begin=9, delta=1)
FAILED, empty word in dictionary (pos=568)
FAILED, wrong word-delta (pos=576, word=, len=0, begin=10, delta=2)
FAILED, empty word in dictionary (pos=576)
FAILED, wrong word-delta (pos=585, word=, len=0, begin=8, delta=1)
FAILED, empty word in dictionary (pos=585)
FAILED, wrong word-delta (pos=593, word=, len=0, begin=8, delta=4)
FAILED, empty word in dictionary (pos=593)
FAILED, wrong word-delta (pos=604, word=, len=0, begin=8, delta=2)
FAILED, empty word in dictionary (pos=604)
FAILED, wrong word-delta (pos=613, word=, len=0, begin=8, delta=1)
FAILED, empty word in dictionary (pos=613)
FAILED, wrong word-delta (pos=621, word=, len=0, begin=8, delta=8)
FAILED, empty word in dictionary (pos=621)
FAILED, wrong word-delta (pos=636, word=, len=0, begin=8, delta=3)
FAILED, empty word in dictionary (pos=636)
FAILED, wrong word-delta (pos=646, word=, len=0, begin=10, delta=4)
FAILED, empty word in dictionary (pos=646)
FAILED, wrong word-delta (pos=657, word=, len=0, begin=8, delta=4)
FAILED, empty word in dictionary (pos=657)
FAILED, wrong word-delta (pos=668, word=, len=0, begin=12, delta=16)
FAILED, empty word in dictionary (pos=668)
FAILED, wrong word-delta (pos=692, word=, len=0, begin=8, delta=8)
FAILED, empty word in dictionary (pos=692)
FAILED, wrong word-delta (pos=707, word=, len=0, begin=10, delta=6)
FAILED, empty word in dictionary (pos=707)
FAILED, wrong word-delta (pos=720, word=, len=0, begin=11, delta=5)
FAILED, empty word in dictionary (pos=720)
FAILED, wrong word-delta (pos=732, word=, len=0, begin=12, delta=4)
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 99 of 166363 failures reported, 716.0 sec elapsed
indextool --check transcripts_delta
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
using config file '/etc/manticoresearch/manticore.conf'...
checking table 'transcripts_delta'...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check passed, 0.9 sec elapsed
It seems your main index is invalid and should be reindexed from scratch.
Yes, after completely rebuilding the "transcripts" index, the merging with the delta index works well for 3-5 days. But then the same issue happens again: the index is broken, merging is crashing, and I have to start over.
Then you need to change your pipeline to back up the indexes before the merge operation, and then issue the merge as
sudo -u manticore indexer --rotate --nohup --merge main delta
indextool --rotate --check main
This way indexer merges the data but does not send a signal to the daemon; indextool then checks the main index and sends the signal to the daemon only if the index is valid.
That way, once main becomes invalid, you can provide the main and delta indexes you backed up, so we can investigate how the merge creates a bad index.
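The backup step suggested above could be sketched as follows. This is only a minimal sketch: backup_table is a hypothetical helper, and the data directory and backup location in the usage comments are assumptions, not something prescribed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy all files of one table into a timestamped
# backup directory and print the directory that was created.
backup_table() {
    local data_dir="$1" table="$2" backup_root="$3"
    local dest
    dest="$backup_root/$table.$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    # "$table". (with the dot) matches transcripts.spa, transcripts.spd, ...
    # but not transcripts_delta.*, so main and delta are backed up separately.
    cp -p "$data_dir/$table".* "$dest"/ || return 1
    printf '%s\n' "$dest"
}

# Example usage (paths are assumptions):
#   backup_table /var/lib/manticore transcripts /backup/manticore
#   backup_table /var/lib/manticore transcripts_delta /backup/manticore
```

Call it once for main and once for delta before running the indexer --merge and indextool --check commands quoted above.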
I'm sorry, I don't quite understand what my merging script should look like. I changed it to the following:
/usr/bin/indexer --merge transcripts transcripts_delta --rotate --nohup
/usr/bin/indextool --rotate --check transcripts
But when I run it, I receive these errors:
WARNING: Index header format is not json, will try it as binary...
WARNING: Unable to load header... Error failed to open /var/lib/manticore/transcripts.tmp.sph: No such file or directory
FATAL: table 'transcripts': prealloc failed: failed to open /var/lib/manticore/transcripts.tmp.sph: No such file or directory
There is an example of --nohup in the indexer CLI section of our manual, and it shows the same command sequence.
I need to check this case myself. I was sure it would work the way it is described in the manual.
Yesterday, merging failed again. Below is the link to the full set of indexes: old main index, delta index, and temp files of an incomplete merge. I hope it will be useful to find the reason for this crashing. Please let me know if I can provide any other additional information. This bug is very annoying.
Thanks. I've started downloading the archive on our dev server.
@PavelShilin89 pls try to reproduce the issue on dev2. Once downloaded (in a couple of hours), the archive will be at /home/snikolaev/indexes.zip.
Below is the link to the full set of indexes: old main index, delta index, and temp files of an incomplete merge
Can we have your config too please?
This is our configuration file for these indexes:
searchd
{
listen = 9312:sphinx
listen = 9306:mysql41
listen = 9308:http
log = /var/log/manticore/searchd.log
query_log = /var/log/manticore/query.log
pid_file = /var/run/manticore/searchd.pid
query_log_format = sphinxql
network_timeout = 30
}
indexer
{
max_file_field_buffer = 16M
mem_limit = 1024M
}
source database
{
type = mysql
sql_host = ...
sql_user = ...
sql_pass = ...
sql_db = ...
sql_query_pre = set names utf8
sql_query_pre = set character set utf8
sql_query_pre = set session long_query_time = 600
sql_query_pre = set session wait_timeout = 600
}
source transcripts : database
{
sql_query_pre = update sm_sphinxcounters set lastid = (select max(id) from sm_transcripts) where indexname = 'transcripts'
sql_query_range = select min(id), max(id) from sm_transcripts
sql_range_step = 5000
sql_file_field = filename
sql_query = select id, id as docid, title, concat('/mnt/mirror/media/transcripts/', lpad(floor(id / 1000), 4, '0'), '/', id, '.', format) as filename from sm_transcripts where id >= $start and id <= $end and status = 'Active'
sql_attr_uint = docid
}
source transcripts_delta : database
{
sql_file_field = filename
sql_query = select id, id as docid, title, concat('/mnt/mirror/media/transcripts/', lpad(floor(id / 1000), 4, '0'), '/', id, '.', format) as filename from sm_transcripts where id > (select lastid from sm_sphinxcounters where indexname = 'transcripts') and status = 'Active'
sql_attr_uint = docid
}
index transcripts
{
source = transcripts
path = /var/lib/manticore/transcripts
}
index transcripts_delta
{
source = transcripts_delta
path = /var/lib/manticore/transcripts_delta
}
The delta table is ok:
snikolaev@dev2:~/115GB$ indextool -c manticore.conf --check transcripts_delta
Manticore 6.2.13 01c4e054a@231103 dev (columnar 2.2.5 b8be4eb@230928) (secondary 2.2.5 b8be4eb@230928)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/snikolaev/115GB/manticore.conf'...
checking table 'transcripts_delta'...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check passed, 2.2 sec elapsed
But the larger one is corrupted:
snikolaev@dev2:~/115GB$ indextool -c manticore.conf --check transcripts
Manticore 6.2.13 01c4e054a@231103 dev (columnar 2.2.5 b8be4eb@230928) (secondary 2.2.5 b8be4eb@230928)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/snikolaev/115GB/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
FAILED, invalid docs/hits (pos=12, word=00, docs=1756020, hits=-2134759148)
checking data...
FAILED, rowid out of bounds (wordid=0(0), rowid=6580323)
FAILED, hit entries sorting order decreased (wordid=0(0), rowid=0, hit=16777592, last=16780657)
FAILED, hit decreased (wordid=0(0), rowid=0, hit=376, last=3441)
FAILED, rowid out of bounds (wordid=0(0), rowid=16843009)
FAILED, hit entries sorting order decreased (wordid=0(0), rowid=16843009, hit=16778765, last=16781723)
FAILED, hit decreased (wordid=0(0), rowid=16843009, hit=1549, last=4507)
FAILED, hit entries sorting order decreased (wordid=0(0), rowid=16843009, hit=16784701, last=16786074)
FAILED, hit decreased (wordid=0(0), rowid=16843009, hit=7485, last=8858)
This is a likely reason why indexer --merge failed. Can you please:
run indextool --check after merging
if indextool --check or indexer --merge fails, provide the tables again?
Okay, I will do it manually every day. If the merging fails, I will send you all the indexes again.
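The daily check could be automated by inspecting the indextool output. A minimal sketch, assuming the output keeps the format shown in the logs above (a FAILED line for each problem and a final "check passed" line on success); check_output_ok is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `indextool --check` output on stdin and
# succeed only if no FAILED line is present and "check passed" is reported.
check_output_ok() {
    local out
    out=$(cat)
    # Any FAILED line (per-item or the final summary) means the table is broken.
    if printf '%s\n' "$out" | grep -q 'FAILED'; then
        return 1
    fi
    # A truncated or missing summary is treated as a failure too.
    printf '%s\n' "$out" | grep -q '^check passed'
}

# Example usage:
#   indextool --check transcripts | check_output_ok \
#       || echo "transcripts is corrupted, keep the files for investigation"
```

Parsing the text is a deliberately conservative choice here, since the thread does not say what exit code indextool returns on a failed check.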
I recreated the main index from scratch and ran indextool afterwards. I immediately see one failed item in the output. Does this mean that the index is already broken?
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
using config file '/etc/manticoresearch/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
FAILED, invalid docs/hits (pos=20, word=00, docs=1769514, hits=-2126401355)
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 1 failures reported, 820.2 sec elapsed
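One way to read the FAILED line above: the negative hits value looks like a counter that wrapped around a signed 32-bit integer. This is an interpretation, not something confirmed in this thread. A quick sketch that reinterprets the reported value as an unsigned 32-bit number (as_uint32 is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: reinterpret a (possibly negative) 32-bit value
# as an unsigned 32-bit integer by keeping only the low 32 bits.
as_uint32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

hits=$(as_uint32 -2126401355)   # value taken from the FAILED line above
echo "$hits"                    # prints 2168565941

# 2147483647 is the largest value a signed 32-bit integer can hold.
if [ "$hits" -gt 2147483647 ]; then
    echo "hit count exceeds the signed 32-bit maximum"
fi
```

Under that reading, the word 00 would have about 2.17 billion hits, slightly above the signed 32-bit maximum of 2147483647.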
Yes, it seems main got broken during indexing.
Could you provide your source data along with the config, so we can reproduce the issue locally?
Could you please send me an e-mail to max@grabien.com? I will send you the links to our data.
You could mail these to dev@manticoresearch.com, or you could upload the data as described in our manual: https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data
@PavelShilin89 pls find @Grabien's email sent to dev@manticoresearch.com and prepare an MRE.
I have run the indexer on both the dev version and release version 6.2.12; the crash does not reproduce. On dev version 6.2.13 there are no errors and no crash, only warnings. On release version 6.2.12 there is no crash, but an error occurs. I also increased the timeouts so that indexing could complete correctly.
Here is my configuration file:
searchd
{
listen = 59312:sphinx
listen = 59306:mysql41
listen = 59308:http
log = /home/pavel/issue-1578/manticore/searchd.log
query_log = /home/pavel/issue-1578/manticore/query.log
pid_file = /home/pavel/issue-1578/manticore/searchd.pid
query_log_format = sphinxql
network_timeout = 600
}
indexer
{
max_file_field_buffer = 16M
mem_limit = 1024M
}
source database
{
type = mysql
sql_host = localhost
sql_user = test
sql_pass =
sql_db = test
sql_query_pre = set names utf8
sql_query_pre = set character set utf8
sql_query_pre = set session long_query_time = 3000
sql_query_pre = set session wait_timeout = 3000
}
source transcripts : database
{
sql_query_pre = update sm_sphinxcounters set lastid = (select max(id) from sm_transcripts) where indexname = 'transcripts'
sql_query_range = select min(id), max(id) from sm_transcripts
sql_range_step = 5000
sql_file_field = filename
sql_query = select id, id as docid, title, concat('/home/pavel/issue-1578/transcripts/', lpad(floor(id / 1000), 4, '0'), '/', id, '.', format) as filename from sm_transcripts where id >= $start and id <= $end and status = 'Active'
sql_attr_uint = docid
}
source transcripts_delta : database
{
sql_file_field = filename
sql_query = select id, id as docid, title, concat('/home/pavel/issue-1578/transcripts/', lpad(floor(id / 1000), 4, '0'), '/', id, '.', format) as filename from sm_transcripts where id > (select lastid from sm_sphinxcounters where indexname = 'transcripts') and status = 'Active'
sql_attr_uint = docid
}
index transcripts
{
source = transcripts
path = /home/pavel/issue-1578/transcripts/transcripts
}
index transcripts_delta
{
source = transcripts_delta
path = /home/pavel/issue-1578/transcripts/transcripts_delta
}
pavel@dev2:~/issue-1578$ ./indexer -c manticore.conf --all
Manticore 6.2.12 dc5144d35@230822
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
WARNING: Error initializing columnar storage: daemon requires columnar library v21 (trying to load v24)
WARNING: Error initializing secondary index: daemon requires secondary library v10 (trying to load v13)
using config file '/home/pavel/issue-1578/manticore.conf'...
indexing table 'transcripts'...
ERROR: table 'transcripts': sql_fetch_row: Lost connection to MySQL server during query.
total 442699 docs, 41599553655 bytes
total 7692.191 sec, 5408023 bytes/sec, 57.55 docs/sec
indexing table 'transcripts_delta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 4.738 sec, 0 bytes/sec, 0.00 docs/sec
total 442699 reads, 5505.621 sec, 91.7 kb/call avg, 12.4 msec/call avg
total 126230 writes, 40.445 sec, 333.9 kb/call avg, 0.3 msec/call avg
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728613.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728617.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728619.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728621.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728623.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728625.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728627.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728629.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728631.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728633.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728635.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728637.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728639.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728641.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728643.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728645.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728647.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728649.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728651.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728653.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728655.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728657.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728659.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728661.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728663.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728665.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728667.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728669.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728671.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728673.srt: No such file or directory
WARNING: failed to open /home/pavel/issue-1578/transcripts/3728/3728675.srt: No such file or directory
collected 1774552 docs, 96019.7 MB
creating secondary index
creating lookup: 1774.5 Kdocs, 100.0% done
sorted 20280.6 Mhits, 100.0% done
WARNING: table 'transcripts': failed to open /home/pavel/issue-1578/transcripts/3728/3728675.srt: No such file or directory.
total 1774552 docs, 96019756610 bytes
total 24149.946 sec, 3975982 bytes/sec, 73.48 docs/sec
indexing table 'transcripts_delta'...
collected 0 docs, 0.0 MB
creating secondary index
total 0 docs, 0 bytes
total 0.532 sec, 0 bytes/sec, 0.00 docs/sec
total 1803818 reads, 15395.620 sec, 77.4 kb/call avg, 8.5 msec/call avg
total 347138 writes, 183.746 sec, 437.2 kb/call avg, 0.5 msec/call avg
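The missing-file warnings above follow from how sql_query builds each file name: the base directory, then floor(id / 1000) zero-padded to 4 digits, then the id with the format as its extension. A small sketch that reproduces that path in shell, to check by hand whether a given transcript file exists (transcript_path is a hypothetical helper; unlike MySQL's lpad, it does not truncate values longer than 4 digits):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the file path that sql_query produces
# for one row, given the base directory, the row id and the format.
transcript_path() {
    local base="$1" id="$2" format="$3"
    # Integer division mirrors floor(id / 1000); %04d mirrors lpad(..., 4, '0').
    printf '%s/%04d/%d.%s\n' "$base" "$(( id / 1000 ))" "$id" "$format"
}

# Example usage: report whether the file for one id is present.
#   f=$(transcript_path /home/pavel/issue-1578/transcripts 3728613 srt)
#   [ -f "$f" ] || echo "missing: $f"
```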
I tried increasing the timeouts as in the config provided above, but unfortunately nothing changed. The same crash occurs after 2-3 days.
@PavelShilin89 looks like you reproduced the issue but didn't notice it, because you didn't run indextool --check:
snikolaev@dev2:/home/pavel/issue-1578$ indextool -c manticore.conf --check transcripts
Manticore 6.2.13 e80d505b9@240103 dev (columnar 2.2.5 1d1e432@231204) (secondary 2.2.5 1d1e432@231204)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/pavel/issue-1578/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
FAILED, invalid docs/hits (pos=20, word=00, docs=1773341, hits=-2123101169)
checking data...
Please try to localize it now.
@sanikolaev The problem really is reproduced only on the full data volume; when the data volume is reduced, everything is correct. Also, after running the indexer you need to run indextool -c manticore.conf --check transcripts.
Logs:
pavel@dev2:~/issue-1578$ indextool -c manticore.conf --check transcripts
Manticore 6.2.13 978d5656c@24012517 dev (columnar 2.2.5 214ce90@240115) (secondary 2.2.5 214ce90@240115)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/pavel/issue-1578/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
FAILED, invalid docs/hits (pos=20, word=00, docs=1760111, hits=-2131936544)
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 1 failures reported, 2358.5 sec elapsed
I have verified that the problem is reproducible, but only with large amounts of data. When checking a certain part of the data or reducing the volume, everything works correctly.
Steps to reproduce:
ssh {yourname}@dev2.manticoresearch.com
cd /home/pavel/issue-1578
screen -x {name}
indexer -c manticore.conf --all
indextool -c manticore.conf --check transcripts
It seems that it is not related to the amount of data. Lately, the same issue is happening with another index, which is almost 40 times smaller. If it would simplify testing, I can also send all the files of this index.
@Grabien it would be very helpful. Please do
@sanikolaev I have just sent all the information to your email.
@Grabien, I am unable to reproduce the issue with the new data files/tables. Running indextool -c manti.conf --check transcripts does not reveal any corruption in the tables, either before or after merging them. Could you provide more detailed instructions on how to reproduce the issue using the new files?
@sanikolaev It's strange, but I was also not able to reproduce the issue with this data anymore. I will keep testing and let you know once I have any new information.
Tried several times with different builds and hardware:
6.2.12 (release, fresh build)
6.2.12 (release, copied file which was reported as reproducible)
rev ddd7c3ed (master)
on x86_64
on M2 (arm64)
No corruption revealed; nothing to fix. Used test data from https://github.com/manticoresoftware/manticoresearch/issues/1578#issuecomment-1926477102
Tried current master, with the following results:
pavel@dev2:~/issue-1578$ time indexer -c manticore.conf transcripts
Manticore 6.2.13 7ecf541ab@24041615 dev (columnar 2.2.5 b4f7386@240405) (secondary 2.2.5 b4f7386@240405) (knn 2.2.5 b4f7386@240405)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/pavel/issue-1578/manticore.conf'...
indexing table 'transcripts'...
collected 1409687 docs, 85066.8 MB
creating secondary index
creating lookup: 1409.6 Kdocs, 100.0% done
sorted 17942.5 Mhits, 100.0% done
total 1409687 docs, 85066857598 bytes
total 21945.242 sec, 3876323 bytes/sec, 64.23 docs/sec
total 1433399 reads, 13545.208 sec, 86.0 kb/call avg, 9.4 msec/call avg
total 91279 writes, 217.079 sec, 1463.8 kb/call avg, 2.3 msec/call avg
real 365m45.647s
user 132m1.250s
sys 3m49.842s
pavel@dev2:~/issue-1578$ indextool --check transcripts
Manticore 6.2.13 7ecf541ab@24041615 dev (columnar 2.2.5 b4f7386@240405) (secondary 2.2.5 b4f7386@240405)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/pavel/issue-1578/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check passed, 863.4 sec elapsed
The problem is still not reproduced, so the MRE no longer seems valid.
BTW, for a real run it is necessary to run 'su pavel' first; otherwise the reported MRE fails during indexing because it can't access the .spl file.
The problem is still not reproduced, so the MRE no longer seems valid.
@PavelShilin89 pls prepare a better MRE or confirm the problem is solved.
I've started one more run (still in progress) to be doubly sure. On the original setup, on dev2.
Comparing the two reports, I see that the original check ran for >2000 s, but my latest one on the same hardware took <900 s. Maybe this means the system was busy and the overall load somehow affected the result, but this is just a guess.
Last control check: indexing done (took ~6 hours), checking done, no problems revealed.
pavel@dev2:~/issue-1578$ indextool -c manticore.conf --check transcripts
Manticore 6.2.13 7ecf541ab@24041615 dev (columnar 2.2.5 b4f7386@240405) (secondary 2.2.5 b4f7386@240405)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
using config file '/home/pavel/issue-1578/manticore.conf'...
checking table 'transcripts'...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check passed, 918.1 sec elapsed
One extra thing: hashes of the index files (may be faster to compare than running indextool)
a172ffca50a20b20c4a76364913403a8 *transcripts.spa
1cd74f69280aa5aa7ef6476f62e88402 *transcripts.spd
80dd94cae7da4c1efbc10a549a3a52f1 *transcripts.spds
cc4ea1666238c44060f5f1f1452e4d53 *transcripts.spe
39d268632ac3fa39cc32c628e68978f5 *transcripts.sphi
609b897389df7708d58d63649646b2c6 *transcripts.spi
a31d1ae0a56473f262ebe1b5ef4e0bb6 *transcripts.spidx
b5a3c96a4ce2d9a9bba4945490730f9a *transcripts.spm
acd9f1dd531d59ae9ae77011881e03c7 *transcripts.spp
04845ecb3a17d4af6694c0116096dad7 *transcripts.spt
(.sph is excluded, since it has a timestamp inside, so its hash will be different on each attempt). These hashes stay the same across all kinds of indexing runs (on dev, on another Intel host, on a Mac M2).
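A listing like the one above could be produced with a small loop. This is a sketch assuming GNU md5sum is available (on macOS the md5 tool takes different options); hash_table_files is a hypothetical helper that skips the .sph header for the reason given above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print md5 hashes of all files of one table,
# skipping the .sph header because it embeds a timestamp.
hash_table_files() {
    local dir="$1" table="$2"
    (
        cd "$dir" || exit 1
        for f in "$table".*; do
            [ -f "$f" ] || continue                  # no matching files
            [ "$f" = "$table.sph" ] && continue      # header differs on every build
            md5sum -b "$f"
        done
    )
}

# Example usage: compare two builds of the same table.
#   diff <(hash_table_files /path/to/build1 transcripts) \
#        <(hash_table_files /path/to/build2 transcripts)
```

The helper changes into the directory first so that the output contains bare file names and two builds in different directories can be compared line by line.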
@sanikolaev I tried for a long time to reproduce the process on dev2, in order to get an error, but never reproduced it.
I compared the hashes of the index files many times, and they always matched too.
As an experiment, I tried changing the range step (sql_range_step) to 50, 150, 200, 250, and 300; this did not produce the error either. I have no more ideas on how to deliberately affect the process and get the error.
changing the step range to 50, 150, 200, 250, 300, this also did not give the error. I have no more ideas
You originally reproduced it with step 5000 here https://github.com/manticoresoftware/manticoresearch/issues/1578#issuecomment-1869033529 (rel. comment https://github.com/manticoresoftware/manticoresearch/issues/1578#issuecomment-1882264575)
Try to replicate exactly what you did back then.
@sanikolaev I have tested all ways to get this error, but have never been able to reproduce it. Maybe there are other ideas how to reproduce this error?
Describe the bug
I have two indexes: main, which contains data from a large number of transcripts, and delta, which only contains fresh data from the last 24 hours. A cron script merges the data from the delta index into the main index every night. Sometimes the merging process stops working because the indexer crashes during it. To fix this issue, I have to completely rebuild the main index. However, after 3-5 days, the crash occurs again.
To Reproduce
Steps to reproduce the behavior:
/usr/bin/indexer --merge transcripts transcripts_delta --rotate
Describe the environment:
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Messages from log files:
UPDATE 2024 Feb 13
MRE is here https://github.com/manticoresoftware/manticoresearch/issues/1578#issuecomment-1926477102