Research "max_matches" in Elasticsearch and whether it can be automated in Manticore #1355

Open githubmanticore opened 1 year ago

githubmanticore commented 1 year ago

We have max_matches, which sometimes makes no sense: raising it changes nothing in the returned rows and only increases latency, e.g.:

mysql> select * from lj where match('a') limit 5 option max_matches=20; show meta; 
+------+-----------+-----------+--------------+ 
| id   | published | author_id | num_comments | 
+------+-----------+-----------+--------------+ 
|   55 |      2005 |         0 |            0 | 
|  314 |      2005 |         0 |            0 | 
| 3353 |      2004 |         0 |            0 | 
| 3922 |      2005 |         0 |            0 | 
| 3931 |      2005 |         0 |            0 | 
+------+-----------+-----------+--------------+ 
5 rows in set (0.20 sec) 

+---------------+----------+ 
| Variable_name | Value    | 
+---------------+----------+ 
| total         | 20       | 
| total_found   | 3592491  | 
| time          | 0.205    | 
| keyword[0]    | a        | 
| docs[0]       | 3592491  | 
| hits[0]       | 22055534 | 
+---------------+----------+ 
6 rows in set (0.00 sec) 

mysql> select * from lj where match('a') limit 5 option max_matches=100000; show meta; 
+------+-----------+-----------+--------------+ 
| id   | published | author_id | num_comments | 
+------+-----------+-----------+--------------+ 
|   55 |      2005 |         0 |            0 | 
|  314 |      2005 |         0 |            0 | 
| 3353 |      2004 |         0 |            0 | 
| 3922 |      2005 |         0 |            0 | 
| 3931 |      2005 |         0 |            0 | 
+------+-----------+-----------+--------------+ 
5 rows in set (0.34 sec) 

+---------------+----------+ 
| Variable_name | Value    | 
+---------------+----------+ 
| total         | 100000   | 
| total_found   | 3592491  | 
| time          | 0.341    | 
| keyword[0]    | a        | 
| docs[0]       | 3592491  | 
| hits[0]       | 22055534 | 
+---------------+----------+ 
6 rows in set (0.00 sec) 
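
One way to picture why both queries return the same five rows: if the server keeps only the best max_matches candidates in a bounded buffer and then returns the first LIMIT of them, any max_matches at or above LIMIT gives identical rows, while a larger buffer costs more to maintain. Below is a minimal Python sketch under that assumption. The function `top_matches` and the random scores are hypothetical illustrations, not Manticore's actual sorter.

```python
import heapq
import random


def top_matches(scores, max_matches, limit):
    """Keep at most max_matches best documents, then return the first `limit` ids."""
    heap = []  # min-heap: heap[0] is the worst match currently kept
    for doc_id, score in enumerate(scores):
        item = (score, -doc_id)  # on equal score, the lower doc_id ranks higher
        if len(heap) < max_matches:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the worst kept match
    ranked = sorted(heap, reverse=True)  # best match first
    return [-neg_id for _, neg_id in ranked[:limit]]


random.seed(0)
scores = [random.random() for _ in range(50_000)]  # stand-in for match weights

small = top_matches(scores, max_matches=20, limit=5)
large = top_matches(scores, max_matches=10_000, limit=5)
print(small == large)
```

In this toy model the larger buffer only adds heap and sort work for rows that the LIMIT then discards.
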

The same effect shows up under load (10000 queries per run):

max_matches=10 : 
FINISHED. Total time: 8.242 sec, throughput: 1213 rps 
Latency stats: 
        count: 10000 latencies analyzed 
        avg: 3.837 ms 
        median: 0.378 ms 
        95p: 12.879 ms 
        99p: 66.461 ms 

max_matches=100000 : 
FINISHED. Total time: 16.147 sec, throughput: 619 rps 
Latency stats: 
        count: 10000 latencies analyzed 
        avg: 7.625 ms 
        median: 1.637 ms 
        95p: 27.812 ms 
        99p: 122.74 ms 
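
To put the two runs side by side, here is a small Python sketch that computes the relative change from the figures above. The dictionary layout and key names are just an illustration. With these numbers, throughput roughly halves, average latency roughly doubles, and the median grows more than fourfold.

```python
# Figures copied from the two benchmark runs above, keyed by max_matches.
runs = {
    10: {"rps": 1213, "avg_ms": 3.837, "median_ms": 0.378,
         "p95_ms": 12.879, "p99_ms": 66.461},
    100000: {"rps": 619, "avg_ms": 7.625, "median_ms": 1.637,
             "p95_ms": 27.812, "p99_ms": 122.74},
}

base, big = runs[10], runs[100000]

# Fraction of throughput lost when max_matches goes from 10 to 100000.
throughput_drop = 1 - big["rps"] / base["rps"]

# Latency ratio (large / small) for each reported statistic.
slowdown = {key: big[key] / base[key] for key in base if key != "rps"}

print(f"throughput drop: {throughput_drop:.1%}")
for key, ratio in slowdown.items():
    print(f"{key}: x{ratio:.2f}")
```
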

The queries are simple:

snikolaev@dev:~$ head -10 /work/dist/testdata/lj/ljquerylog.txt 
Cincinnati 
Radio Nueva Vida 
strange bedfellows ron hermione 
Channel 
thumbsucker 
Ayu 
дизайн интерфейсов 
Poetry 
Официальный Сайт Министерства Имущества России 
Books 

Elasticsearch has no max_matches. There are a few other settings:

The task is to: