manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.97k stars 498 forks source link

Query performance is dependent on the order of keywords in match #2446

Open starinacool opened 2 months ago

starinacool commented 2 months ago

Bug Description:

Consider the query:

select id,name  FROM listing2 WHERE match(' @n (  (5 марок unc)  )') ORDER by id asc LIMIT 0,20 OPTION ranker=none, cutoff=500000, max_predicted_time=100; SHOW META;

Meta Result:

+---------------------+------------------------------------------------------------------+
| Variable_name       | Value                                                            |
+---------------------+------------------------------------------------------------------+
| warning             | table listing2: predicted query time exceeded max_predicted_time |
| total               | 20                                                               |
| total_found         | 549                                                              |
| total_relation      | eq                                                               |
| time                | 0.129                                                            |
| local_fetched_docs  | 4879631                                                          |
| local_fetched_hits  | 0                                                                |
| local_fetched_skips | 239401                                                           |
| predicted_time      | 802                                                              |
| keyword[0]          | 5                                                                |
| docs[0]             | 1394039                                                          |
| hits[0]             | 1595430                                                          |
| keyword[1]          | =марок                                                           |
| docs[1]             | 164728                                                           |
| hits[1]             | 170966                                                           |
| keyword[2]          | unc                                                              |
| docs[2]             | 329515                                                           |
| hits[2]             | 332916                                                           |
| keyword[3]          | марка                                                            |
| docs[3]             | 3947825                                                          |
| hits[3]             | 4777475                                                          |
| keyword[4]          | маркий                                                           |
| docs[4]             | 3942153                                                          |
| hits[4]             | 4743454                                                          |
+---------------------+------------------------------------------------------------------+

And this:

select id,name  FROM listing2 WHERE match(' @n (  (unc 5 марок)  )') ORDER by id asc LIMIT 0,20 OPTION ranker=none, cutoff=500000, max_predicted_time=100; SHOW META;

Meta:

+---------------------+--------------+
| Variable_name       | Value        |
+---------------------+--------------+
| total               | 20           |
| total_found         | 1038         |
| total_relation      | eq           |
| time                | 0.019        |
| local_fetched_docs  | 2937496      |
| local_fetched_hits  | 0            |
| local_fetched_skips | 140524       |
| predicted_time      | 475          |
| keyword[0]          | 5            |
| docs[0]             | 1394047      |
| hits[0]             | 1595439      |
| keyword[1]          | =марок       |
| docs[1]             | 164729       |
| hits[1]             | 170967       |
| keyword[2]          | unc          |
| docs[2]             | 329519       |
| hits[2]             | 332920       |
| keyword[3]          | марка        |
| docs[3]             | 3947845      |
| hits[3]             | 4777502      |
| keyword[4]          | маркий       |
| docs[4]             | 3942173      |
| hits[4]             | 4743480      |
+---------------------+--------------+

The difference in queries is only the order of match parameters. But first query is 10x slower then the second and hits the max_predicted_time limiter.

The solution I see: Optimiser should sort keywords in "AND" match blocks in order of number of docs in the index first. Then start searching.

It looks like it is the same issue as in 1641.

Manticore Search Version:

6.3.0

Operating System Version:

Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed - [ ] Tests developed - [ ] Documentation updated - [ ] Documentation reviewed - [ ] Changelog updated
starinacool commented 2 months ago

1641

sanikolaev commented 2 months ago

I can reproduce the issue in the latest dev version on dev2 (in /home/snikolaev/issue-1641):

snikolaev@dev2:~/issue-1641$ rm /tmp/1; for n in `seq 1 1000`; do echo "select id  FROM listing2 WHERE match(' @n (  (5 марок unc)  )');" >> /tmp/1; done; time mysql -P9315 -h0 < /tmp/1 > /dev/null

real    0m28.920s
user    0m0.017s
sys 0m0.065s

snikolaev@dev2:~/issue-1641$ rm /tmp/2; for n in `seq 1 1000`; do echo "select id  FROM listing2 WHERE match(' @n (  (unc 5 марок)  )');" >> /tmp/2; done; time mysql -P9315 -h0 < /tmp/2 > /dev/null

real    0m13.169s
user    0m0.047s
sys 0m0.036s

Query details:

``` mysql> select id FROM listing2 WHERE match(' @n ( (5 марок unc) )'); SHOW META; +-----------+ | id | +-----------+ | 38867052 | | 303005086 | | 283207653 | | 303012522 | | 303006756 | | 303006314 | | 303005955 | | 303005799 | | 303005768 | | 303336917 | | 303098944 | | 303092288 | | 303092800 | | 303094309 | | 230253739 | | 293791820 | | 267009727 | | 275710727 | | 293828205 | | 298788392 | +-----------+ 20 rows in set (0.03 sec) --- 20 out of 700 results in 30ms --- +----------------+--------------+ | Variable_name | Value | +----------------+--------------+ | total | 20 | | total_found | 700 | | total_relation | eq | | time | 0.030 | | keyword[0] | 5 | | docs[0] | 1286113 | | hits[0] | 1464194 | | keyword[1] | =марок | | docs[1] | 139832 | | hits[1] | 145181 | | keyword[2] | unc | | docs[2] | 263203 | | hits[2] | 264194 | | keyword[3] | марка | | docs[3] | 3622880 | | hits[3] | 4319542 | | keyword[4] | маркий | | docs[4] | 3617138 | | hits[4] | 4287922 | +----------------+--------------+ 19 rows in set (0.00 sec) mysql> select id FROM listing2 WHERE match(' @n ( (unc 5 марок) )'); SHOW META; +-----------+ | id | +-----------+ | 283207653 | | 303012522 | | 303006756 | | 303105314 | | 303105310 | | 303105309 | | 303105308 | | 303105306 | | 303105305 | | 303105302 | | 303105301 | | 303105299 | | 303105298 | | 303105297 | | 303105295 | | 303105294 | | 303105292 | | 303105291 | | 303644091 | | 303644090 | +-----------+ 20 rows in set (0.02 sec) --- 20 out of 700 results in 16ms --- +----------------+--------------+ | Variable_name | Value | +----------------+--------------+ | total | 20 | | total_found | 700 | | total_relation | eq | | time | 0.016 | | keyword[0] | 5 | | docs[0] | 1286113 | | hits[0] | 1464194 | | keyword[1] | =марок | | docs[1] | 139832 | | hits[1] | 145181 | | keyword[2] | unc | | docs[2] | 263203 | | hits[2] | 264194 | | keyword[3] | марка | | docs[3] | 3622880 | | hits[3] | 4319542 | | keyword[4] | маркий | | docs[4] | 3617138 | | hits[4] | 4287922 | +----------------+--------------+ 19 rows in set (0.00 sec) ```