Open daikoz opened 4 months ago
Thanks @daikoz
We'll look into it.
I see the master daemon crashed on merging result sets either with or without persistent_connections_limit option
the issue was fixed at https://github.com/manticoresoftware/manticoresearch/commit/950e16a69ad7d084d21fe6d91d84984b42b5a4d2 you need to update package from the dev repository after CI finished and package will be published to get issue fixed
Reopening to complete the checklist items
we need to add to clt test the case for the query into 2 indexes with OPTION max_query_time
, like
select id from idx_small, idx_large ... OPTION max_query_time=100
should crash daemon or produces internal error message
The indexes sizes should be:
idx_small
should finish in 10 - 50 msidx_large
should finish in 1 sec - 2 secThe case is the result set from idx_small
returns but searching idx_large
exits by timeout without any result set that causes daemon to crash or produces internal error message
We need:
clt-tests\comparison-overhead-json-sql\test-comparison-overhead-json-sql.rec
but post into idx_small
1000 documents and into idx_large
all documentsselect id from idx_small where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*')
finish in 10 - 50 ms
-- select id from idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*')
finish in 1 - 2 secondsselect id from idx_small, idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=100
I wrote a Clt-test and tested different variants of OPTION max_query_time
, however, even in the case of getting warning
the daemon failure is not confirmed.
––– input –––
rm -f /var/log/manticore/searchd.log; searchd --stopwait > /dev/null; searchd; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
Manticore %{SEMVER} %{COMMITDATE}#!/(\sdev)?\s/!#(columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE}) (knn %{SEMVER} %{COMMITDATE})
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-%{YEAR}, Manticore Software LTD (https://manticoresearch.com)
[#!/[0-9]{2}:[0-9]{2}.[0-9]{3}/!#] [#!/[0-9]+/!#] using config file '/etc/manticoresearch/manticore.conf' (%{NUMBER} chars)...
starting daemon version '%{SEMVER} %{COMMITDATE}#!/(\sdev)?\s/!#(columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE}) (knn %{SEMVER} %{COMMITDATE})' ...
listening on %{IPADDR}:9312 for sphinx and http(s)
listening on %{IPADDR}:9306 for mysql
listening on %{IPADDR}:9308 for sphinx and http(s)
Buddy started!
––– input –––
mysql -h0 -P9306 -e "SET GLOBAL qcache_max_bytes=0;"
––– output –––
––– input –––
php -d memory_limit=-1 ./test/clt-tests/comparison-overhead-json-sql/load_us_names.php 100 1 1000 1 > /dev/null
––– output –––
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE name RENAME idx_small;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "flush ramchunk idx_small;"
––– output –––
––– input –––
php -d memory_limit=-1 ./test/clt-tests/comparison-overhead-json-sql/load_us_names.php 100 1 1000000 1 > /dev/null
––– output –––
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE name RENAME idx_large;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "flush ramchunk idx_large;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM idx_small;"
––– output –––
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM idx_large;"
––– output –––
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_small where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=10; show warnings;"
––– output –––
+------+
| id |
+------+
| 112 |
| 380 |
| 707 |
| 822 |
| 311 |
| 323 |
| 570 |
| 630 |
| 950 |
| 62 |
| 115 |
| 189 |
| 191 |
| 226 |
| 273 |
| 291 |
| 346 |
| 392 |
| 396 |
| 397 |
+------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=10; show warnings;"
––– output –––
+---------+------+------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------+
| warning | 1000 | query time exceeded max_query_time |
+---------+------+------------------------------------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_small, idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=5;"
––– output –––
+------+
| id |
+------+
| 112 |
| 380 |
| 707 |
| 822 |
| 311 |
| 323 |
| 570 |
| 630 |
| 950 |
| 62 |
| 115 |
| 189 |
| 191 |
| 226 |
| 273 |
| 291 |
| 346 |
| 392 |
| 396 |
| 397 |
+------+
Hi, 6.3.4 is out today. I don't see this fix. Is this an oversight in the changelog ? thx
6.3.4 is service release and does not include all fixes from the master. I sure this fix along with all other fixes will be in the next major release.
You could use package from the dev repository to get this crash fixed.
The fix has been included in 6.3.6.
After test with 6.3.6, I always the issue when I add:
persistent_connections_limit = 32
in searchd section
@daikoz I can't reproduce the issue in 6.3.7:
snikolaev@dev2:~/issue-2352$ while IFS= read -r ligne; do mysql -h0 -P29306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
SELECT 1 2 manticore.conf query.log FROM AdvertisementsLBLB WHERE MATCH('maison') LIMIT 0,100; /bin /boot /dev /etc /home /installimage.conf /installimage.debug /lib /lib32 /lib64 /libx32 /lost+found /manticore /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /usr /var /work agents=(0.070) 1/ 2/
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
SELECT 1 2 manticore.conf query.log FROM AdvertisementsLBLB WHERE MATCH('maison') LIMIT 0,100; /bin /boot /dev /etc /home /installimage.conf /installimage.debug /lib /lib32 /lib64 /libx32 /lost+found /manticore /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /usr /var /work agents=(0.019) 1/ 2/
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
...
and so on and no error like:
ERROR 1064 (42000) at line 1: internal error: column 'keywords/keywords' not found in result set schema
My config is:
mysql> desc AdvertisementsLB;
+--------------------------------+-------------------+
| Agent | Type |
+--------------------------------+-------------------+
| localhost:29312:Advertisements | remote_1_mirror_1 |
| localhost:39312:Advertisements | remote_1_mirror_2 |
+--------------------------------+-------------------+
2 rows in set (0.00 sec)
mysql> desc SearchKeywordsLB;
+--------------------------------+-------------------+
| Agent | Type |
+--------------------------------+-------------------+
| localhost:29312:SearchKeywords | remote_1_mirror_1 |
| localhost:39312:SearchKeywords | remote_1_mirror_2 |
+--------------------------------+-------------------+
2 rows in set (0.00 sec)
mysql> show tables;
+---------------------+-------------+
| Index | Type |
+---------------------+-------------+
| Advertisements | distributed |
| AdvertisementsDelta | local |
| AdvertisementsLB | distributed |
| AdvertisementsMain | local |
| SearchKeywords | local |
| SearchKeywordsLB | distributed |
+---------------------+-------------+
6 rows in set (0.00 sec)
on each node.
The mentioned setting is uncommented on both nodes:
snikolaev@dev2:~/issue-2352$ grep pers 1/manticore.conf 2/manticore.conf
1/manticore.conf: persistent_connections_limit=10
1/manticore.conf: agent_persistent = localhost:29312|localhost:39312:Advertisements
1/manticore.conf: agent_persistent = localhost:29312|localhost:39312:SearchKeywords
2/manticore.conf: persistent_connections_limit=10
2/manticore.conf: agent_persistent = localhost:29312|localhost:39312:Advertisements
2/manticore.conf: agent_persistent = localhost:29312|localhost:39312:SearchKeywords
The Manticore version is:
snikolaev@dev2:~/issue-2352$ searchd -v
Manticore 6.3.7 0ebcf8c82@24080614 dev (columnar 2.3.1 eb62283@24072718) (secondary 2.3.1 eb62283@24072718) (knn 2.3.1 eb62283@24072718)
Please check if the issue persists in this (or newer) version for you (https://mnt.cr/dev/nightly) and if it does - provide an instruction how to reproduce it.
I tested on 6.3.7. I can reproduce this issue.
# searchd -v
Manticore 6.3.7 66626616f@24082118 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)
In the scenario, I launch the 2 command while on same server.
We can do a zoom, I will share my screen.
Same issue on production when I try to activate persistent_connections_limit. Without persistent_connections_limit, no issue now.
could you create docker-compose file with indexes and configs we could run locally and catch this issue?
Bug Description:
I upload data and configuration on s3 (like https://manual.manticoresearch.com/Reporting_bugs#How-do-I-install-debug-symbols?)
In summary, I have 2 servers: SERVER1 SERVER2 with same manticore configuration.
I use a distributed index like:
Scenario: 1/ Configure SERVER1 SERVER2 and modify the ip of agent_persistent 2/ Launch manticore on 2 servers:
searchd --console --config manticore.conf
manticore.conf is provided on s3
3/ on server1, Launch 2 times:
while IFS= read -r ligne; do mariadb -h0 -P9306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"
=> All is OK; No errors
4/ Now, uncomment persistent_connections_limit in manticore.conf, restart manticore on 2 servers and launch 2 times on server1:
while IFS= read -r ligne; do mariadb -h0 -P9306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"
=> ERRORS:
ERROR 1064 (42000) at line 1: internal error: column 'keywords/keywords' not found in result set schema
Manticore Search Version:
Manticore 6.3.2 c296dc7c8@24062606
Operating System Version:
Debian 12 bookworm uptodate
Have you tried the latest development version?
No
Internal Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.