Open alexiv1965 opened 2 weeks ago
The related message in TG - https://t.me/manticore_chat/7200/16974
Thereafter it'll produce a very unclear read error.
Please provide more details about it. What was the error in your case?
This is the error in the log: "global IDF unavailable - IGNORING". It applies only to my case: I tried to set up a very large global.idf file for the index, but the file cannot be read. In other situations there will be other error messages.
Can you share this idf file by uploading it to our write-only S3 storage? https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data
Done: uploaded to manticore/write-only/issue-2739/global_su.idf.gz. Please run gzip -d on it before use.
Thanks for providing the file.
Unfortunately, I can't reproduce "global IDF unavailable - IGNORING" with this file:
snikolaev@dev2:~$ ls -la global_su.idf
-rw-r----- 1 snikolaev snikolaev 377581400 Nov 14 03:04 global_su.idf
snikolaev@dev2:~$ md5sum global_su.idf
39bb7f06e4b0fffa412f561f207dd56e global_su.idf
, this config:
snikolaev@dev2:~$ cat min_global_idf.conf
source src {
type = csvpipe
csvpipe_command = echo "1,ab"
csvpipe_field = f
}
index idx {
path = idx
source = src
global_idf = global_su.idf
}
searchd {
listen = 9315:mysql
log = searchd.log
pid_file = 9315.pid
binlog_path =
}
and this script:
snikolaev@dev2:~$ rm searchd.log
snikolaev@dev2:~$ searchd -c min_global_idf.conf
Manticore 6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
[10:01.300] [1358347] using config file '/home/snikolaev/min_global_idf.conf' (270 chars)...
starting daemon version '6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)' ...
listening on all interfaces for mysql, port=9315
precaching table 'idx'
precached 1 tables in 0.228 sec
snikolaev@dev2:~$ mysql -P9315 -h0 -e "select * from idx where match('ab')"
+------+------+
| id | f |
+------+------+
| 1 | ab |
+------+------+
snikolaev@dev2:~$ cat searchd.log
[Thu Nov 14 03:10:01.303 2024] [1358349] watchdog: main process 1358350 forked ok
[Thu Nov 14 03:10:01.304 2024] [1358350] Using local time zone '/etc/localtime'
[Thu Nov 14 03:10:01.306 2024] [1358350] starting daemon version '6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)' ...
[Thu Nov 14 03:10:01.306 2024] [1358350] listening on all interfaces for mysql, port=9315
[Thu Nov 14 03:10:01.581 2024] [1358355] prereading 1 tables
[Thu Nov 14 03:10:01.581 2024] [1358350] WARNING: [BUDDY] no SPHINX or HTTP listeners found, disabled
[Thu Nov 14 03:10:01.582 2024] [1358350] accepting connections
[Thu Nov 14 03:10:01.582 2024] [1358355] preread 1 tables in 0.001 sec
If I specify a file that doesn't exist, I can reproduce the warning without issues.
snikolaev@dev2:~$ searchd -c min_global_idf.conf
Manticore 6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
[13:28.967] [1358711] using config file '/home/snikolaev/min_global_idf.conf' (271 chars)...
starting daemon version '6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)' ...
listening on all interfaces for mysql, port=9315
precaching table 'idx'
WARNING: table 'idx': global IDF unavailable - IGNORING
precached 1 tables in 0.001 sec
Could you please provide more details so we can reproduce the issue?
Oh, sorry, my mistake. That was the "small" file; its size is less than 2GB. I've now uploaded the "large" one to reproduce the problem: manticore/write-only/issue-2739/global_large.idf.gz
I actually ran into the problem with an RT table (the scenario from issue 1111), but I don't think that matters in this case.
Thanks. I confirm the warning:
snikolaev@dev2:~$ searchd -c min_global_idf.conf
Manticore 6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
[02:58.348] [1420987] using config file '/home/snikolaev/min_global_idf.conf' (273 chars)...
starting daemon version '6.3.7 e2c80bb93@24111308 dev (columnar 2.3.1 30ad2d6@24100914) (secondary 2.3.1 30ad2d6@24100914) (knn 2.3.1 30ad2d6@24100914)' ...
listening on all interfaces for mysql, port=9315
precaching table 'idx'
WARNING: table 'idx': global IDF unavailable - IGNORING
precached 1 tables in 0.001 sec
@alexiv1965 Since you've identified the issue, would you like to create a PR to fix it?
I can propose the solution in a PR (it is rather simple: just disallow the zero value for the global variable), but I cannot write unit tests for it, since this solution will affect almost all read and write operations in Manticore. That's why I think this solution needs the core team's attention.
We'll at least be able to see if any tests fail, so please go ahead with a PR.
Proposal:
Almost all read operations in searchd go through sphReadThrottled() (the same is true for writes). Note that the Linux read() function, which is called further down, cannot read a single chunk of 2GB or more: one call transfers at most 0x7ffff000 (2,147,479,552) bytes.
Here is the selection of the "step size" for a single read operation. The rt_merge_maxiosize config option, if set, limits the size of every read/write operation to its value. But this option is not set by default, and very few admins will take it into account. So the most common scenario is a zero value for the g_iMaxIOSize global variable, and hence the whole size is read in one chunk. This then produces a very unclear read error.
I propose setting a strict, non-zero upper limit for the g_iMaxIOSize value to make sure that Linux read() can do its job without error. This value should be a little less than 2GB; see the read() man page for details.
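To make the idea concrete, here is a minimal sketch of a bounded-step read loop. It is not the actual Manticore code: ClampIOStep(), ReadInSteps() and kHardMaxIOSize are hypothetical names, and the 1 GiB cap is an arbitrary value chosen only because it is safely below the per-call limit of read().

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// Hypothetical hard cap for one read() call. Any value at or below
// 0x7ffff000 would do; 1 GiB is an arbitrary choice for this sketch.
constexpr int64_t kHardMaxIOSize = 1LL << 30;

// Hypothetical helper: size of the next I/O step.
// configuredMax <= 0 means "option not set", so only the hard cap applies.
inline int64_t ClampIOStep(int64_t remaining, int64_t configuredMax)
{
    assert(remaining >= 0);
    const int64_t limit = (configuredMax > 0)
        ? std::min(configuredMax, kHardMaxIOSize)
        : kHardMaxIOSize;
    return std::min(remaining, limit);
}

// Reads up to `length` bytes from fd in bounded steps.
// Returns the number of bytes read (less than `length` only at EOF),
// or -1 on a read error.
inline int64_t ReadInSteps(int fd, void* buffer, int64_t length, int64_t configuredMax)
{
    auto* out = static_cast<unsigned char*>(buffer);
    int64_t total = 0;
    while (total < length)
    {
        const int64_t step = ClampIOStep(length - total, configuredMax);
        const ssize_t got = ::read(fd, out + total, static_cast<size_t>(step));
        if (got < 0)
        {
            if (errno == EINTR)
                continue; // interrupted by a signal, retry the same step
            return -1;
        }
        if (got == 0)
            break; // EOF
        total += got;
    }
    return total;
}
```

With this shape an unset option no longer means "read everything in one call"; it falls back to the hard cap, and short reads are handled by the loop instead of being reported as errors.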
By the way, the subsequent call to sphRead() has a very rough fix for a Windows warning: the read chunk size is cast to int. Without the solution described above, this leads to even more unclear read errors: a large positive value is cast to a negative value, and then in the call to read() it is cast to a very large positive value. I suppose this should also be fixed.
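The arithmetic of that cast can be shown in isolation. This is only an illustration, not the sphRead() code itself: LengthAfterIntCast() and ClampToInt() are hypothetical helpers, and the values in the comments assume a typical 64-bit Linux build (32-bit int, 64-bit size_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Reproduces the narrowing chain: 64-bit length -> int -> size_t.
inline size_t LengthAfterIntCast(int64_t length)
{
    // Wraps modulo 2^32 on common compilers (guaranteed since C++20):
    // 3000000000 becomes -1294967296.
    const int narrowed = static_cast<int>(length);
    // A negative int converted to size_t becomes a huge unsigned value:
    // -1294967296 becomes 18446744072414584320 with a 64-bit size_t.
    return static_cast<size_t>(narrowed);
}

// Hypothetical safer variant: clamp first, so the value always fits in int.
inline int ClampToInt(int64_t length)
{
    assert(length >= 0);
    const int64_t maxInt = std::numeric_limits<int>::max();
    return static_cast<int>(length < maxInt ? length : maxInt);
}
```

For a 3,000,000,000-byte request, LengthAfterIntCast() yields a count far larger than the buffer, while ClampToInt() yields INT_MAX and leaves the remainder to the next loop iteration.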
Once again: the same option and global variable govern both read and write operations.