Closed sanikolaev closed 8 months ago
Same in manticoresearch/manticore:4.0.2
, i.e. not a new regression. Didn't check earlier versions.
Interesting that if I disable auto-expansion, then in case of the on*
query, the response time is the same:
snikolaev@dev2:~$ time curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on*"} }}'
{"took":2,"timed_out":false,"hits":{"total":17482,"total_relation":"eq","hits":[{"_id":"1559568","_score":1670,"_source":{"username":"ONIE ONEAL"}},{"_id":"8901204","_score":1660,"_source":{"username":"ONEIDA ONG"}},{"_id":"9558471","_score":1660,"_source":{"username":"ONA ONEAL"}},{"_id":"860282","_score":1624,"_source":{"username":"DENAE ONEIL"}},{"_id":"860385","_score":1624,"_source":{"username":"ONITA TIERNEY"}},{"_id":"60418","_score":1624,"_source":{"username":"ONIE GUFFEY"}},{"_id":"860591","_score":1624,"_source":{"username":"MICHA ONEIL"}},{"_id":"860642","_score":1624,"_source":{"username":"ONITA CLEGG"}},{"_id":"860969","_score":1624,"_source":{"username":"KAROLE ONEILL"}},{"_id":"861828","_score":1624,"_source":{"username":"PHILLIP ONEAL"}},{"_id":"62159","_score":1624,"_source":{"username":"ONEIDA BRILL"}},{"_id":"862854","_score":1624,"_source":{"username":"ONITA OLMSTEAD"}},{"_id":"63206","_score":1624,"_source":{"username":"EULAH ONTIVEROS"}},{"_id":"63252","_score":1624,"_source":{"username":"RAUL ONEILL"}},{"_id":"863311","_score":1624,"_source":{"username":"ONA COBB"}},{"_id":"63396","_score":1624,"_source":{"username":"JESUS ONG"}},{"_id":"63751","_score":1624,"_source":{"username":"LUCIANA ONEAL"}},{"_id":"863775","_score":1624,"_source":{"username":"ONEIDA ROARK"}},{"_id":"864149","_score":1624,"_source":{"username":"KEELEY ONEAL"}},{"_id":"864572","_score":1624,"_source":{"username":"ONIE PUCKETT"}}]}}
real 0m0.008s
user 0m0.000s
sys 0m0.006s
snikolaev@dev2:~$ time mysql -P9306 -h0 -e "select * from name where match('on*'); show meta;"
+---------+-----------------+
| id | username |
+---------+-----------------+
| 1559568 | ONIE ONEAL |
| 8901204 | ONEIDA ONG |
| 9558471 | ONA ONEAL |
| 860282 | DENAE ONEIL |
| 860385 | ONITA TIERNEY |
| 60418 | ONIE GUFFEY |
| 860591 | MICHA ONEIL |
| 860642 | ONITA CLEGG |
| 860969 | KAROLE ONEILL |
| 861828 | PHILLIP ONEAL |
| 62159 | ONEIDA BRILL |
| 862854 | ONITA OLMSTEAD |
| 63206 | EULAH ONTIVEROS |
| 63252 | RAUL ONEILL |
| 863311 | ONA COBB |
| 63396 | JESUS ONG |
| 63751 | LUCIANA ONEAL |
| 863775 | ONEIDA ROARK |
| 864149 | KEELEY ONEAL |
| 864572 | ONIE PUCKETT |
+---------+-----------------+
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 20 |
| total_found | 17482 |
| total_relation | eq |
| time | 0.002 |
| keyword[0] | on* |
| docs[0] | 17493 |
| hits[0] | 17493 |
+----------------+-------+
real 0m0.008s
user 0m0.006s
sys 0m0.000s
but *on*
is still slow.
So it looks like the issue has smth do with infix search.
could be different default values for query via SQL and JSON endpoints
Another example with a different keyword which makes the difference much bigger: 49 sec vs 671 ms
wget https://raw.githubusercontent.com/Raven-SL/ru-pnames-list/master/lists/male_names_rus.txt
wget https://raw.githubusercontent.com/Raven-SL/ru-pnames-list/master/lists/male_surnames_rus.txt
wget https://gist.githubusercontent.com/sanikolaev/5f4cea361b6869ee9ca7eb815be4fb76/raw/c203ad550ede2a0e1b5dab19a780e833b2367827/load_names.php -O load_names.php
php load_names.php 10000 10 10000000 1
mysql -P9306 -h0 -e "optimize table name option sync=1, cutoff=1;"
echo 'eyJpbmRleCIgOiAibmFtZSIsICJxdWVyeSIgOiB7Im1hdGNoIiA6IHsiX2FsbCI6ICLQvtCyIn19
fQo='|base64 -d > /tmp/query
snikolaev@dev2:~$ curl -vsX POST http://localhost:9308/search -d @/tmp/query
* Trying 127.0.0.1:9308...
* Connected to localhost (127.0.0.1) port 9308 (#0)
> POST /search HTTP/1.1
> Host: localhost:9308
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Length: 58
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: 6.2.13 4950f3d02@24020213 dev (columnar 2.2.5 214ce90@240115) (secondary 2.2.5 214ce90@240115) (knn 2.2.5 214ce90@240115) (buddy v2.1.3)
< Content-Type: application/json; charset=UTF-8
< Content-Length: 1844
<
{"took":49781,"timed_out":false,"hits":{"total":4054337,"total_relation":"eq","hits":[{"_id":"9740003","_score":1539,"_source":{"username":"Джованни Галаганов"}},{"_id":"9960072","_score":1539,"_source":{"username":"Овадья Циолковский"}},{"_id":"9610093","_score":1539,"_source":{"username":"Овадья Недобитов"}},{"_id":"9700096","_score":1539,"_source":{"username":"Джованни Вислоухов"}},{"_id":"9900125","_score":1539,"_source":{"username":"Джованни Басунов"}},{"_id":"9880131","_score":1539,"_source":{"username":"Иов Хухряков"}},{"_id":"9980148","_score":1539,"_source":{"username":"Иов Сафьянов"}},{"_id":"9900166","_score":1539,"_source":{"username":"Людовик Калиновский"}},{"_id":"9610185","_score":1539,"_source":{"username":"Яков Сабашников"}},{"_id":"9930195","_score":1539,"_source":{"username":"Овадья Осонов"}},{"_id":"9660208","_score":1539,"_source":{"username":"* Connection #0 to host localhost left intact
Иов Безруков"}},{"_id":"9580212","_score":1539,"_source":{"username":"Джованни Аббакумов"}},{"_id":"9750238","_score":1539,"_source":{"username":"Людовик Пешников"}},{"_id":"9780246","_score":1539,"_source":{"username":"Яков Флегентов"}},{"_id":"9930279","_score":1539,"_source":{"username":"Джованни Филипович"}},{"_id":"9900294","_score":1539,"_source":{"username":"Яков Грабаров"}},{"_id":"9930296","_score":1539,"_source":{"username":"Яков Кованько"}},{"_id":"9750298","_score":1539,"_source":{"username":"Иов Годовалов"}},{"_id":"9600301","_score":1539,"_source":{"username":"Овадья Пестряков"}},{"_id":"9880310","_score":1539,"_source":{"username":"Джованни Анцифиров"}}]}}
snikolaev@dev2:~$ time mysql -P9306 -h0 -e "select * from name where match('ов'); show meta;"
+---------+---------------------------------------+
| id | username |
+---------+---------------------------------------+
| 9740003 | Джованни Галаганов |
| 9960072 | Овадья Циолковский |
| 9610093 | Овадья Недобитов |
| 9700096 | Джованни Вислоухов |
| 9900125 | Джованни Басунов |
| 9880131 | Иов Хухряков |
| 9980148 | Иов Сафьянов |
| 9900166 | Людовик Калиновский |
| 9610185 | Яков Сабашников |
| 9930195 | Овадья Осонов |
| 9660208 | Иов Безруков |
| 9580212 | Джованни Аббакумов |
| 9750238 | Людовик Пешников |
| 9780246 | Яков Флегентов |
| 9930279 | Джованни Филипович |
| 9900294 | Яков Грабаров |
| 9930296 | Яков Кованько |
| 9750298 | Иов Годовалов |
| 9600301 | Овадья Пестряков |
| 9880310 | Джованни Анцифиров |
+---------+---------------------------------------+
+----------------+---------+
| Variable_name | Value |
+----------------+---------+
| total | 20 |
| total_found | 4054337 |
| total_relation | eq |
| time | 0.671 |
| keyword[0] | *ов* |
| docs[0] | 4081690 |
| hits[0] | 4081690 |
| keyword[1] | ов |
| docs[1] | 0 |
| hits[1] | 0 |
+----------------+---------+
real 0m0.689s
user 0m0.000s
sys 0m0.009s
That is dupe of https://sphinxsearch.com/bugs/view.php?id=1253 (appliable to http queries)
I confirm it looks ok now:
snikolaev@dev2:~$ time (for n in `seq 1 100`; do mysql -P9306 -h0 -e "select * from name where match('on'); show meta;"; done > /dev/null)
real 0m8.881s
user 0m0.261s
sys 0m0.289s
snikolaev@dev2:~$ time (for n in `seq 1 100`; do curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0}'; done > /dev/null)
real 0m8.713s
user 0m0.184s
sys 0m0.379s
@PavelShilin89 let's try to cover this with a small CLT test and see how stable it is in the github runners environment. Notes:
github-manticoresearch-1794-curl-vs-mysql-performance
, so we know what issue it relates to and what it is about.# Capture the current time in nanoseconds before starting the curl operations
start=$(date +%s%N)
# Execute a curl POST request 100 times in a loop, discard the output
for n in `seq 1 100`; do curl -sX POST http://localhost:9308/search -d '{"index" : "name", "query" : {"match" : {"_all": "on"} }, "limit": 0}'; done > /dev/null
# Capture the current time in nanoseconds after completing the curl operations
end=$(date +%s%N)
# Calculate the duration of the curl operations in milliseconds
curl=$(( (end - start) / 1000000 ))
# Capture the current time in nanoseconds before starting the MySQL operations
start=$(date +%s%N)
# Execute a MySQL query 100 times in a loop, discard the output
for n in `seq 1 100`; do mysql -P9306 -h0 -e "select * from name where match('on'); show meta;"; done > /dev/null
# Capture the current time in nanoseconds after completing the MySQL operations
end=$(date +%s%N)
# Calculate the duration of the MySQL operations in milliseconds
mysql=$(( (end - start) / 1000000 ))
# Print the durations of MySQL and curl operations in milliseconds
echo "Duration: ${mysql} ${curl} ms"
# Use PHP to calculate the percentage difference between the curl and MySQL durations and check if it's >= 1%
# Exits with 0 if the condition is true, otherwise exits with 1
php -r "exit(round(abs($curl - $mysql) / ($curl + $mysql)/2 * 100, 0) >= 1);"
# Print the exit status of the PHP script (0 if percentage difference >= 1%, 1 otherwise)
echo $?
Result on dev2:
Duration: 7847 7782 ms
0
Please create a separate task for yourself about it and put a link here.
In some case the same query which takes about 80ms via SQL can take 250ms via JSON.
MRE (dev2):
When there's only one disk chunk the difference is even larger (the likely related issue is https://github.com/manticoresoftware/manticoresearch/issues/1767):
Version: