manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
9.05k stars 509 forks source link

Activate persistent_connections_limit generate query error #2352

Open daikoz opened 4 months ago

daikoz commented 4 months ago

Bug Description:

I upload data and configuration on s3 (like https://manual.manticoresearch.com/Reporting_bugs#How-do-I-install-debug-symbols?)

In summary, I have 2 servers: SERVER1 SERVER2 with same manticore configuration.

I use a distributed index like:

index XXXX
{
     type             = distributed
     agent_persistent = 192.168.2.12:9312|192.168.2.13:9312:Advertisements
     agent_retry_count = 1
     ha_strategy      = nodeads
}

Scenario: 1/ Configure SERVER1 SERVER2 and modify the ip of agent_persistent 2/ Launch manticore on 2 servers:

searchd --console --config manticore.conf

manticore.conf is provided on s3

3/ on server1, Launch 2 times:

while IFS= read -r ligne; do mariadb -h0 -P9306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"

=> All is OK; No errors

4/ Now, uncomment persistent_connections_limit in manticore.conf, restart manticore on 2 servers and launch 2 times on server1:

while IFS= read -r ligne; do mariadb -h0 -P9306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"

=> ERRORS:

ERROR 1064 (42000) at line 1: internal error: column 'keywords/keywords' not found in result set schema

Manticore Search Version:

Manticore 6.3.2 c296dc7c8@24062606

Operating System Version:

Debian 12 bookworm uptodate

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [x] Implementation completed - [x] Tests developed - [x] Documentation updated - [x] Documentation reviewed - [x] Changelog updated
sanikolaev commented 4 months ago

Thanks @daikoz

We'll look into it.

tomatolog commented 4 months ago

I see the master daemon crashed on merging result sets either with or without persistent_connections_limit option

tomatolog commented 4 months ago

the issue was fixed at https://github.com/manticoresoftware/manticoresearch/commit/950e16a69ad7d084d21fe6d91d84984b42b5a4d2 you need to update package from the dev repository after CI finished and package will be published to get issue fixed

sanikolaev commented 4 months ago

Reopening to complete the checklist items

tomatolog commented 4 months ago

we need to add to clt test the case for the query into 2 indexes with OPTION max_query_time, like

select id from idx_small, idx_large ... OPTION max_query_time=100

should crash daemon or produces internal error message

The indexes sizes should be:

The case is the result set from idx_small returns but searching idx_large exits by timeout without any result set that causes daemon to crash or produces internal error message

We need:

PavelShilin89 commented 4 months ago

I wrote a Clt-test and tested different variants of OPTION max_query_time, however, even in the case of getting warning the daemon failure is not confirmed.

––– input –––
rm -f /var/log/manticore/searchd.log; searchd --stopwait > /dev/null; searchd; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
Manticore %{SEMVER} %{COMMITDATE}#!/(\sdev)?\s/!#(columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE}) (knn %{SEMVER} %{COMMITDATE})
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-%{YEAR}, Manticore Software LTD (https://manticoresearch.com)
[#!/[0-9]{2}:[0-9]{2}.[0-9]{3}/!#] [#!/[0-9]+/!#] using config file '/etc/manticoresearch/manticore.conf' (%{NUMBER} chars)...
starting daemon version '%{SEMVER} %{COMMITDATE}#!/(\sdev)?\s/!#(columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE}) (knn %{SEMVER} %{COMMITDATE})' ...
listening on %{IPADDR}:9312 for sphinx and http(s)
listening on %{IPADDR}:9306 for mysql
listening on %{IPADDR}:9308 for sphinx and http(s)
Buddy started!
––– input –––
mysql -h0 -P9306 -e "SET GLOBAL qcache_max_bytes=0;"
––– output –––
––– input –––
php -d memory_limit=-1 ./test/clt-tests/comparison-overhead-json-sql/load_us_names.php 100 1 1000 1 > /dev/null
––– output –––
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE name RENAME idx_small;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "flush ramchunk idx_small;"
––– output –––
––– input –––
php -d memory_limit=-1 ./test/clt-tests/comparison-overhead-json-sql/load_us_names.php 100 1 1000000 1 > /dev/null
––– output –––
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE name RENAME idx_large;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "flush ramchunk idx_large;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM idx_small;"
––– output –––
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM idx_large;"
––– output –––
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_small where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=10; show warnings;"
––– output –––
+------+
| id   |
+------+
|  112 |
|  380 |
|  707 |
|  822 |
|  311 |
|  323 |
|  570 |
|  630 |
|  950 |
|   62 |
|  115 |
|  189 |
|  191 |
|  226 |
|  273 |
|  291 |
|  346 |
|  392 |
|  396 |
|  397 |
+------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=10; show warnings;"
––– output –––
+---------+------+------------------------------------+
| Level   | Code | Message                            |
+---------+------+------------------------------------+
| warning | 1000 | query time exceeded max_query_time |
+---------+------+------------------------------------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_small, idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=5;"
––– output –––
+------+
| id   |
+------+
|  112 |
|  380 |
|  707 |
|  822 |
|  311 |
|  323 |
|  570 |
|  630 |
|  950 |
|   62 |
|  115 |
|  189 |
|  191 |
|  226 |
|  273 |
|  291 |
|  346 |
|  392 |
|  396 |
|  397 |
+------+
daikoz commented 3 months ago

Hi, 6.3.4 is out today. I don't see this fix. Is this an oversight in the changelog ? thx

tomatolog commented 3 months ago

6.3.4 is service release and does not include all fixes from the master. I sure this fix along with all other fixes will be in the next major release.

You could use package from the dev repository to get this crash fixed.

sanikolaev commented 3 months ago

The fix has been included in 6.3.6.

daikoz commented 3 months ago

After test with 6.3.6, I always the issue when I add:

persistent_connections_limit = 32

in searchd section

sanikolaev commented 2 months ago

@daikoz I can't reproduce the issue in 6.3.7:

snikolaev@dev2:~/issue-2352$ while IFS= read -r ligne; do mysql -h0 -P29306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
SELECT 1 2 manticore.conf query.log FROM AdvertisementsLBLB WHERE MATCH('maison') LIMIT 0,100; /bin /boot /dev /etc /home /installimage.conf /installimage.debug /lib /lib32 /lib64 /libx32 /lost+found /manticore /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /usr /var /work agents=(0.070) 1/ 2/
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
SELECT 1 2 manticore.conf query.log FROM AdvertisementsLBLB WHERE MATCH('maison') LIMIT 0,100; /bin /boot /dev /etc /home /installimage.conf /installimage.debug /lib /lib32 /lib64 /libx32 /lost+found /manticore /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /usr /var /work agents=(0.019) 1/ 2/
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
...

and so on and no error like:

ERROR 1064 (42000) at line 1: internal error: column 'keywords/keywords' not found in result set schema

My config is:

mysql> desc AdvertisementsLB;
+--------------------------------+-------------------+
| Agent                          | Type              |
+--------------------------------+-------------------+
| localhost:29312:Advertisements | remote_1_mirror_1 |
| localhost:39312:Advertisements | remote_1_mirror_2 |
+--------------------------------+-------------------+
2 rows in set (0.00 sec)

mysql> desc SearchKeywordsLB;
+--------------------------------+-------------------+
| Agent                          | Type              |
+--------------------------------+-------------------+
| localhost:29312:SearchKeywords | remote_1_mirror_1 |
| localhost:39312:SearchKeywords | remote_1_mirror_2 |
+--------------------------------+-------------------+
2 rows in set (0.00 sec)

mysql> show tables;
+---------------------+-------------+
| Index               | Type        |
+---------------------+-------------+
| Advertisements      | distributed |
| AdvertisementsDelta | local       |
| AdvertisementsLB    | distributed |
| AdvertisementsMain  | local       |
| SearchKeywords      | local       |
| SearchKeywordsLB    | distributed |
+---------------------+-------------+
6 rows in set (0.00 sec)

on each node.

The mentioned setting is uncommented on both nodes:

snikolaev@dev2:~/issue-2352$ grep pers 1/manticore.conf 2/manticore.conf
1/manticore.conf:    persistent_connections_limit=10
1/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:Advertisements
1/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:SearchKeywords
2/manticore.conf:    persistent_connections_limit=10
2/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:Advertisements
2/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:SearchKeywords

The Manticore version is:

snikolaev@dev2:~/issue-2352$ searchd -v
Manticore 6.3.7 0ebcf8c82@24080614 dev (columnar 2.3.1 eb62283@24072718) (secondary 2.3.1 eb62283@24072718) (knn 2.3.1 eb62283@24072718)

Please check if the issue persists in this (or newer) version for you (https://mnt.cr/dev/nightly) and if it does - provide an instruction how to reproduce it.

daikoz commented 2 months ago

I tested on 6.3.7. I can reproduce this issue.

# searchd -v
Manticore 6.3.7 66626616f@24082118 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

In the scenario, I launch the 2 command while on same server.

We can do a zoom, I will share my screen.

Same issue on production when I try to activate persistent_connections_limit. Without persistent_connections_limit, no issue now.

tomatolog commented 2 months ago

could you create docker-compose file with indexes and configs we could run locally and catch this issue?