manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
9.11k stars 509 forks source link

6.2.0 gives syntax error, older versions don't #1335

Closed cappadaan closed 10 months ago

cappadaan commented 1 year ago

Just upgraded tot 6.2.0

This query-part:

WHERE MATCH('(@(myfields) always_present productie | \"magazijn medewerker\") | (@(otherfields) ^=productie magazijn medewerker$)')

now gives

P08: syntax error, unexpected ')' near ')'

To me the query seems fine. I ran this query from the first manticore version, it always worked.

Is this correct behavior? What has changed in 6.2.0 ? There is nothing about this syntax in the breaking changes of 6.2.0.

UPDATE

See MRE in https://github.com/manticoresoftware/manticoresearch/issues/1335#issuecomment-1698776596

The performance issues related below are offtopic.

tomatolog commented 1 year ago

for me this query work fine

CREATE TABLE test ( id bigint, d1 multi64, myfields text, otherfields text );
INSERT into test (id, myfields, otherfields) values (10, 'test', 'up'), (11, 'always_present  productie medewerker', 'medewerker magazijn');

mysql> SELECT * FROM test WHERE MATCH('(@(myfields) always_present productie | "magazijn medewerker") | (@(otherfields) ^=productie magazijn medewerker$)');
+------+--------------------------------------+---------------------+------+
| id   | myfields                             | otherfields         | d1   |
+------+--------------------------------------+---------------------+------+
|   11 | always_present  productie medewerker | medewerker magazijn |      |
+------+--------------------------------------+---------------------+------+
1 row in set (0.00 sec)

it could be better to provide complete example that reproduces issue locally.

cappadaan commented 1 year ago

I will come back later on this issue. We downgraded to 6.0.4. because 6.2 was causing a very high CPU load. I cannot test this good enough for now.

tomatolog commented 1 year ago

you could try to disable pseudo_sharding at the config searchd.pseudo_sharding = 0 or check show status output load* counters along with show sessions and show threads to check what query cause the load and is it CPU load become higher or CPU utilize 100% of cores?

cappadaan commented 1 year ago

The new version simply uses a lot more CPU. I tried to decrease the amount of threads: this kept the server CPU under 80% but it was not enough: the search became super slow.

Never had any issues before with CPU.

sanikolaev commented 1 year ago

@cappadaan You could try disabling pseudo_sharding or use OPTION threads to fine-tune its behavior. In version 6.2.0, the pseudo-sharding functionality underwent a few changes, including tighter integration with the secondary indexes functionality and smarter query optimization. The goal was to make the search faster by optimizing CPU utilization. Let me remind you that we adhere to the concept that the CPU shouldn't remain idle; so if Manticore can load it to 100% to expedite the search, it will do so. However, if you are certain that you're experiencing both a higher CPU load and slower search performance, we would appreciate it if you let us know how to reproduce the issue, so we can investigate further.

cappadaan commented 1 year ago

Did another try in upgrading from 6.0.4 to 6.2.12

Server 1:

Screenshot 2023-08-24 at 19 37 02

Server 2:

Screenshot 2023-08-24 at 19 36 47

Server 2 gives an SQL error on exact the same query as server 1.

If I change the query to:

SELECT 1 FROM WHERE MATCH('@(always_present) always_present | "Technisch Medewerker"')

so without ( and ), it works.

I am totally confused.

sanikolaev commented 1 year ago

I'm confused too and can't reproduce it on a synthetic dataset. I'm afraid we can't reproduce and fix it without your data. Feel free to send them to our write-only S3 storage - https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data

cappadaan commented 1 year ago

I found the cause. Server 1 has no wordforms, server 2 has. The wordform breaks the query, but only if they are surrounded by quotes and contain a space.

query = select 1 from index where match('"technisch medewerker"')

For this specific query I have 2 rows in the wordform:

medeweker > medewerker Medewerkster > medewerker

Can you reproduce it now?

sanikolaev commented 1 year ago

Unfortunately, I still can't reproduce it like this:

➜  ~ cat /tmp/wordforms
medeweker > medewerker
Medewerkster > medewerker

➜  ~ mysql -P9306 -h0 -v
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3990
Server version: 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822) git branch manticore-6.2.12...origin/manticore-6.2.12

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Reading history-file /Users/sn/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table if exists index; create table index(always_present text) wordforms='/tmp/wordforms'; insert into index values(0, 'always_present Technisch Medewerker'); SELECT 1 FROM index WHERE MATCH('(@(always_present) always_present | "Technisch Medewerker")'); select 1 from index where match('"technisch medewerker"');
--------------
drop table if exists index
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table index(always_present text) wordforms='/tmp/wordforms'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into index values(0, 'always_present Technisch Medewerker')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
SELECT 1 FROM index WHERE MATCH('(@(always_present) always_present | "Technisch Medewerker")')
--------------

+------+
| 1    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

--------------
select 1 from index where match('"technisch medewerker"')
--------------

+------+
| 1    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

Can you modify this example to a reproducible one?

cappadaan commented 1 year ago

Try this query

select 1 from index where match('("technisch medewerker")')

so with ( and ) extra.

This one fails.

sanikolaev commented 1 year ago

It doesn't fail for me:

➜  ~ cat /tmp/wordforms
medeweker > medewerker
Medewerkster > medewerker
➜  ~ mysql -P9306 -h0 -v
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4183
Server version: 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822) git branch manticore-6.2.12...origin/manticore-6.2.12

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Reading history-file /Users/sn/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table if exists index; create table index(always_present text) wordforms='/tmp/wordforms'; insert into index values(0, 'always_present Technisch Medewerker'); select 1 from index where match('("technisch medewerker")');
--------------
drop table if exists index
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table index(always_present text) wordforms='/tmp/wordforms'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into index values(0, 'always_present Technisch Medewerker')
--------------

Query OK, 1 row affected (0.01 sec)

--------------
select 1 from index where match('("technisch medewerker")')
--------------

+------+
| 1    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
cappadaan commented 1 year ago

Ok, it only occurs on my specific wordform-file. No only the 2 rules as earlier mentioned. I cannot share this wordform file here, so I sent you a mail about it.

sanikolaev commented 1 year ago

Thanks @cappadaan .

The MRE is:

➜  ~ cat /tmp/wf
a b c => d b c

mysql> drop table if exists index; create table index(f text) wordforms='/tmp/wf'; insert into index values(0, 'abc'); select 1 from index where match('("e a")');
--------------
drop table if exists index
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table index(f text) wordforms='/tmp/wf'
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into index values(0, 'abc')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select 1 from index where match('("e a")')
--------------

ERROR 1064 (42000): index index: P08: syntax error, unexpected $end near ''
masarinetwork commented 1 year ago

I will come back later on this issue. We downgraded to 6.0.4. because 6.2 was causing a very high CPU load. I cannot test this good enough for now.

I agree that version 6.2.0 cause high CPU load, nearly Double. Downgrade to 6.0.4 and Half Load.

cappadaan commented 1 year ago

Using threads=1 on my queries fixed the CPU problem for me.

sanikolaev commented 1 year ago

I agree that version 6.2.0 cause high CPU load, nearly Double. Downgrade to 6.0.4 and Half Load.

@masarinetwork What about the response time and throughput? The assumption is that doubling the CPU load would lead to about 2x lower avg response time. Can you provide the latency and throughput stats? That would be very valuable for the team.

masarinetwork commented 1 year ago

Thank You @sanikolaev and @cappadaan

Currently Run On 6.0.4 on 12 CPU (6 Core Hyperthreading 12) and 256 GB RAM, SATA SSD DC / Enterprise.

Distributed Index with 2 Local Index, total data size 321 Gb, 280 tables, per table data range 1 Million -10 Million.

Also running MySQL with 280 Gb of data, low read, as data sources.

Running Nginx and PHP-FPM, average Request per Second 50 - 120, google Analytics Page Views about 100K - 120K (exclude bot).

Using PDO with Socket Connection /var/run/manticore/manticore.sock

Config:

listen = 0.0.0.0:9312 listen = 0.0.0.0:9306:mysql41 listen = /var/run/manticore/manticore.sock:mysql41 log = /var/log/manticore/searchd.log query_log_format = sphinxql query_log_min_msec = 1000 network_timeout = 10 qcache_max_bytes = 167772160 qcache_ttl_sec = 5m client_timeout = 2m sphinxql_timeout = 10m max_threads_per_query = 12 pseudo_sharding = 1 threads = 12 #12 By default it's set to the number of CPU cores on the server. pid_file = /var/run/manticore/searchd.pid seamless_rotate = 1 preopen_tables = 1 unlink_old = 1 access_plain_attrs = mlock access_blob_attrs = mmap_preread binlog_path = /var/lib/manticore

MySQL [(none)]> status;

mysql Ver 15.1 Distrib 10.6.15-MariaDB, for Linux (x86_64) using readline 5.1

Connection id: 7484 Current database: Manticore Current user: Usual SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server: MySQL Server version: 6.0.4 1a3a4ea82@230314 git branch HEAD (no branch) Protocol version: 10 Connection: localhost via TCP/IP Server characterset: Db characterset: Client characterset: utf8 Conn. characterset: utf8 TCP port: 9306 Uptime: 12 hours 53 min 28 sec

Threads: 12 Queue: 56 Clients: 51 Vip clients: 0 Tasks: 52 Queries: 8366566 Wall: 2d CPU: 0us Queue/Th: 4.6 Tasks/Th: 4.3

MySQL [(none)]> show status; +-----------------------+------------------------+ | Counter | Value | +-----------------------+------------------------+ | uptime | 46501 | | connections | 7495 | | maxed_out | 0 | | version | 6.0.4 1a3a4ea82@230314 | | mysql_version | 6.0.4 1a3a4ea82@230314 | | command_search | 8383802 | | command_excerpt | 0 | | command_update | 0 | | command_keywords | 0 | | command_persist | 0 | | command_status | 1 | | command_flushattrs | 0 | | command_sphinxql | 0 | | command_ping | 0 | | command_delete | 0 | | command_set | 0 | | command_insert | 0 | | command_replace | 0 | | command_commit | 0 | | command_suggest | 0 | | command_json | 0 | | command_callpq | 0 | | command_cluster | 0 | | command_getfield | 0 | | agent_connect | 0 | | agent_tfo | 0 | | agent_retry | 0 | | queries | 8383798 | | dist_queries | 0 | | workers_total | 12 | | workers_active | 36 | | workers_clients | 35 | | workers_clients_vip | 0 | | work_queue_length | 40 | | query_wall | 135119.074 | | query_cpu | OFF | | dist_wall | 0.000 | | dist_local | 0.000 | | dist_wait | 0.000 | | query_reads | OFF | | query_readkb | OFF | | query_readtime | OFF | | avg_query_wall | 0.016 | | avg_query_cpu | OFF | | avg_dist_wall | 0.000 | | avg_dist_local | 0.000 | | avg_dist_wait | 0.000 | | avg_query_reads | OFF | | avg_query_readkb | OFF | | avg_query_readtime | OFF | | qcache_max_bytes | 167772160 | | qcache_thresh_msec | 3000 | | qcache_ttl_sec | 300 | | qcache_cached_queries | 20 | | qcache_used_bytes | 6204930 | | qcache_hits | 344 | +-----------------------+------------------------+

Average Load: 8 - 12, uptime command show below

08:25:16 up 86 days, 17:50, 1 user, load average: 7.61, 8.06, 8.99

On 6.0.4, single keyword search data 0.046 seconds, two keyword 0.012 seconds.

When using manticore 6.2, the average load becomes 14 - 18, and the query time is above 1-4 seconds, on the same data and same keyword. And because the CPU is more than 100%, it becomes very slow.

Nginx and PHP-FPM also become slow and often response time out, uptime robot monitoring 24 Hours goes to 80% uptime. Downgrade to 6.0.4 and no problem at all.

sanikolaev commented 1 year ago

and the query time is above 1-4 seconds

Do you mean you saw the value of avg_query_wall became 1-4 seconds?

What's interesting is that:

So neither of them can be the source of the perf. degradation.

Would it be possible to get your data and the searchd and query logs, so we can reproduce the issue on our side? We have a write-only S3 you can upload the data to https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data

It would be also great to remove query_log_min_msec = 1000 temporarily, so we can replay all the queries.

masarinetwork commented 1 year ago

Do you mean you saw the value of avg_query_wall became 1-4 seconds?

Yes, when using 6.2.4. We saw on search result, we display the query time on the front page. This is real data from manticore show meta and we display on search results: On 6.0.4, single keyword search data 0.046 seconds, two keyword 0.012 seconds.

you have pseudo_sharding disabled.

No, configuration is same and not changed.

you are not using the columnar library, i.e. no secondary indexes

We have use Plain Index. Yes we have no use secondary library, because Manticore is used for Full Text Search, on documentation Note that secondary indexes are not effective for full-text queries.

Sorry, we can not switch back to 6.2. We have tried 2 times upgrade to 6.2 and no luck CPU high load, 14 - 18 and sometimes reach more than 25, from 12 is 100% load.

Then switch back to 6.0.4 2 times with same configuration, and it's working fine, load 7 - 11, (100% is 12).

Additional Info, this also happen on CentOS 9 Stream, when using 6.2 average load is 4-5, then switch to 6.0.4 and average load is 1-2. No change in config, pseudo_sharding enabled.

We have seen on searchd.log when using 6.2: [Wed Sep 6 08:00:06.075 2023] [19740] WARNING: send() failed: 32: Broken pipe, sock=5906 [Wed Sep 6 08:00:06.075 2023] [19740] WARNING: conn (local)(65315), sock=5906: send() failed: 32: Broken pipe [Wed Sep 6 08:00:06.087 2023] [19745] WARNING: send() failed: 32: Broken pipe, sock=7563 [Wed Sep 6 08:00:06.087 2023] [19745] WARNING: conn (local)(65348), sock=7563: send() failed: 32: Broken pipe [Wed Sep 6 08:00:06.089 2023] [19740] WARNING: send() failed: 32: Broken pipe, sock=1615 [Wed Sep 6 08:00:06.089 2023] [19740] WARNING: conn (local)(65210), sock=1615: send() failed: 32: Broken pipe [Wed Sep 6 08:00:06.117 2023] [19743] WARNING: send() failed: 32: Broken pipe, sock=7438 [Wed Sep 6 08:00:06.117 2023] [19743] WARNING: conn (local)(65245), sock=7438: send() failed: 32: Broken pipe [Wed Sep 6 08:00:06.121 2023] [19741] WARNING: send() failed: 32: Broken pipe, sock=6773 [Wed Sep 6 08:00:06.121 2023] [19741] WARNING: conn (local)(65267), sock=6773: send() failed: 32: Broken pipe [Wed Sep 6 08:00:06.127 2023] [19734] WARNING: send() failed: 32: Broken pipe, sock=7406 [Wed Sep 6 08:00:06.127 2023] [19734] WARNING: conn (local)(65338), sock=7406: send() failed: 32: Broken pipe [Wed Sep 6 08:00:06.127 2023] [19745] WARNING: send() failed: 32: Broken pipe, sock=7202 [Wed Sep 6 08:00:06.127 2023] [19745] WARNING: conn (local)(65209), sock=7202: send() failed: 32: Broken pipe

masarinetwork commented 1 year ago

Using threads=1 on my queries fixed the CPU problem for me.

But it's not a good choice, because it will slow down the query time. I choose downgrade to 6.0.4 and use thread 12 (number of my CPU), and max_threads_per_query = 12 (number of my CPU).

All of our site content and URL is full text search. There is no option for slow down query result time.

As we know, Web Vitals's good Connect Time is 200 ms and TTFB is 800 ms.

cappadaan commented 1 year ago

You already use only 1 thread per query, because your pseudo_sharding=0.

masarinetwork commented 1 year ago

You already use only 1 thread per query, because your pseudo_sharding=0.

No, my real config is pseudo_sharding=1

Sorry, I think I miss when copying, because I delete remark after pseudo_sharding = 1 # remark of Manticore URL deleted

I have made the changes to correct my config above

cappadaan commented 1 year ago

Try disabling pseudo_sharding, it worked for me. We process a lot of queries per second on a 32 cpu machine and it ended up being faster.

masarinetwork commented 1 year ago

Thank you, but it's big number of CPU 👍. How much query per second and how much data set?

I only have 12 CPU (Hyper threading) and 256 Gb of RAM, and 321 Gb of data, with 50 -150 full text query per second.

Now you use manticore 6.2 or 6.0?

sanikolaev commented 1 year ago

Now you use manticore 6.2 or 6.0?

Rolling back to 6.0 makes sense. However, I'd like to emphasize that it would be helpful if we could reproduce and debug your issue. I'm uncertain if we can address it without your assistance, given that it doesn't appear to be a widespread problem following the upgrade to 6.2.0 or 6.2.12. We carefully tested 6.2.0 in a large production environment for weeks before its release, then we released 6.2.12 with a dozen of fixes reported by the community and since then, we haven't received numerous complaints about performance degradation.

The ball is in your court, @masarinetwork and @cappadaan: if you assist us in reproducing this issue, we can address it in the upcoming release. Otherwise, the bug may remain somewhat known but too specific to fix.

About turning off pseudo_sharding - in version 6.2.x, it should turn off by itself when needed. If you have to turn it off manually, we'd also like to reproduce the case when it's required and fine-tune the behaviour.

masarinetwork commented 1 year ago

Thank You @sanikolaev

I will switch back to 6.2 on CentOS 9 Stream. And will report on bug report.

Because on CentOS 9 Stream the same problem also occurs, where on 6.2 the load becomes 200% compared to when using 6.0.4, with the same configuration and hardware.

Indeed, on CentOS 9 Stream there is no server overload because the data set is smaller and there are more CPUs, 24 CPUs.

d3mon187 commented 1 year ago

I'm also getting these problems on my Ubuntu server.

Searching with quotes gives a syntax error when performing a search like MATCH('@(street)("west ave")'). Both words have wordforms for w, av, avenue, etc. Seems like you already have an example that breaks it though.

Multi thread searches are hanging when max_threads_per_query number is 2 or higher. During that time, those threads are at 100% and don't stop until searchd is stopped. Single thread or disabling Pseudo sharding works. It also only seems to hang when adding an additional where parameter. Even stranger, sometimes queries will go through with max_threads_per_query=2, but it never works with anything higher.

MATCH('@(street)(west ave)') and country='us' 6.2.12 multi = hangs 6.2.12 single = 700 cpums 6.0.4 multi = 500cpums found 7199

MATCH('@(street)(west ave)') 6.2.12 multi = 500 cpums 6.2.12 single = 500 cpums 6.0.4 multi = 500 cpums found 1042736

sanikolaev commented 1 year ago

Multi thread searches are hanging when max_threads_per_query number is 2 or higher. During that time, those threads are at 100% and don't stop until searchd is stopped.

@d3mon187 How do I reproduce this?

d3mon187 commented 1 year ago

Multi thread searches are hanging when max_threads_per_query number is 2 or higher. During that time, those threads are at 100% and don't stop until searchd is stopped.

@d3mon187 How do I reproduce this?

I was seeing it when I did a search like "select * from index where MATCH('@(street)(west ave)') and country='us'".
Where street is a sql_field_string and country might have been a sql_attr_string. Index has wordforms, and index_exact_words/min_word_len/min_prefix_len/ngram_len all set to 1. Let me know if there are any other specific settings that might help. Right now my servers are all back on 6.0, but I can try and run more tests later this week once my test server is finished crunching if you can't reproduce.

Thanks!

sanikolaev commented 1 year ago

I was seeing it when I did a search like "select * from index where MATCH('@(street)(west ave)') and country='us'".

Can be reproduced in Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822) like this:

root@perf3 ~ # mysql -P9306 -h0 < /tmp/dump.sql
root@perf3 ~ # mysql -P9306 -h0 -e "select * from test where MATCH('@street walnut blvd') and country='US';"
... hangs ...

Workaround - disable secondary indexes:

root@perf3 ~ # mysql -P9306 -h0 -e "set global secondary_indexes=0; select * from test where MATCH('@street walnut blvd') and country='US';"
+--------+------------------+---------+
| id     | street           | country |
+--------+------------------+---------+
| 871851 | 838 Walnut Blvd  | US      |
| 735050 | 6810 Walnut Blvd | US      |
| 916478 | 6810 Walnut Blvd | US      |
| 736892 | 838 Walnut Blvd  | US      |
| 976031 | 6810 Walnut Blvd | US      |
| 978104 | 7504 Walnut Blvd | US      |
| 775419 | 7504 Walnut Blvd | US      |
| 761688 | 7504 Walnut Blvd | US      |
| 761921 | 7504 Walnut Blvd | US      |
| 937116 | 6810 Walnut Blvd | US      |
| 849275 | 7504 Walnut Blvd | US      |
| 980283 | 6810 Walnut Blvd | US      |
| 996936 | 6810 Walnut Blvd | US      |
| 754518 | 838 Walnut Blvd  | US      |
| 756364 | 838 Walnut Blvd  | US      |
| 834079 | 7504 Walnut Blvd | US      |
| 758933 | 7504 Walnut Blvd | US      |
| 987945 | 7504 Walnut Blvd | US      |
| 836745 | 6810 Walnut Blvd | US      |
| 988821 | 838 Walnut Blvd  | US      |
+--------+------------------+---------+

The dump is: dump.sql.tgz

How it was generated:

```php root@perf3 ~ # cat load_gh_1335.php #!/usr/bin/php \n"); // This function waits for an idle mysql connection for the $query, runs it and exits function process($query) { global $all_links; global $requests; global $latencies; foreach ($all_links as $k=>$link) { if (@$requests[$k]) continue; mysqli_query($link, $query, MYSQLI_ASYNC); @$requests[$k] = microtime(true); return true; } do { $links = $errors = $reject = array(); foreach ($all_links as $link) { $links[] = $errors[] = $reject[] = $link; } $count = @mysqli_poll($links, $errors, $reject, 0, 1000); if ($count > 0) { foreach ($links as $j=>$link) { $res = @mysqli_reap_async_query($links[$j]); foreach ($all_links as $i=>$link_orig) if ($all_links[$i] === $links[$j]) break; if ($link->error) { echo "ERROR in '".substr($query, 0, 100)."...': {$link->error}\n"; if (!mysqli_ping($link)) { echo "ERROR: mysql connection is down, removing it from the pool\n"; unset($all_links[$i]); // remove the original link from the pool unset($requests[$i]); // and from the $requests too } return false; } if ($res === false and !$link->error) continue; if (is_object($res)) { mysqli_free_result($res); } $latencies[] = microtime(true) - $requests[$i]; $requests[$i] = microtime(true); mysqli_query($link, $query, MYSQLI_ASYNC); // making next query return true; } }; } while (true); return true; } $t = microtime(true); $all_links = []; $requests = []; $latencies = []; $c = 0; for ($i=0;$i<$argv[2];$i++) { $m = @mysqli_connect('127.0.0.1', '', '', '', 9306); if (mysqli_connect_error()) die("Cannot connect to Manticore\n"); $all_links[] = $m; } // init mysqli_query($all_links[0], "drop table if exists test"); mysqli_query($all_links[0], "create table test(street text attribute indexed, country string)"); $batch = []; $query_start = "insert into test(id, street, country) values "; $addresses = [ 0 => '312 Walnut Rd', 1 => '8865 Cherry St', 2 => '163 Elm Dr', 3 => '7011 Elm Blvd', 4 => '7270 Elm Rd', 5 => '7484 Cherry Rd', 6 => '8926 Cherry Ln', 7 => '9224 Cherry St', 8 => '3629 Aspen Dr', 9 => '8991 Elm Blvd', 10 => '4009 Elm Ave', 11 => '4633 Elm Rd', 12 => '3553 Birch Ln', 13 => '4297 Spruce Rd', 14 => '9114 Pine Dr', 15 => '1602 Spruce Dr', 16 => '8748 Oak Rd', 17 => '5829 Aspen Ave', 18 => '3302 Cherry St', 19 => '9555 Cedar Ave', 20 => '2007 Cherry St', 21 => '9149 Walnut Ave', 22 => '2093 Cedar Dr', 23 => '7751 Cedar Dr', 24 => '7819 Pine Ave', 25 => '3938 Walnut Rd', 26 => '200 Birch Ln', 27 => '531 Walnut Dr', 28 => '8287 Maple Ln', 29 => '1784 Maple St', 30 => '4243 Oak Blvd', 31 => '4707 Spruce Rd', 32 => '4823 Aspen Rd', 33 => '7204 Cedar Blvd', 34 => '1846 Pine Rd', 35 => '4349 Birch St', 36 => '2203 Elm Ln', 37 => '6662 Cherry Rd', 38 => '8335 Pine Dr', 39 => '3535 Aspen Rd', 40 => '9510 Spruce Rd', 41 => '8827 Pine Dr', 42 => '7437 Elm Ave', 43 => '4744 Elm Ln', 44 => '8260 Cedar Ave', 45 => '7822 Cedar Dr', 46 => '3127 Elm Dr', 47 => '9972 Elm Ln', 48 => '5298 Oak Rd', 49 => '4435 Maple Dr', 50 => '5410 Oak St', 51 => '6070 Aspen Rd', 52 => '9395 Oak Ln', 53 => '6880 Pine Rd', 54 => '2933 Cherry St', 55 => '6810 Walnut Blvd', 56 => '8694 Aspen Ln', 57 => '4065 Maple Dr', 58 => '2760 Cedar Ln', 59 => '1401 Spruce Ave', 60 => '2880 Cherry Dr', 61 => '3549 Maple Ln', 62 => '8382 Aspen St', 63 => '6983 Maple St', 64 => '58 Maple St', 65 => '9475 Spruce Blvd', 66 => '838 Walnut Blvd', 67 => '261 Aspen Dr', 68 => '6823 Elm St', 69 => '5082 Cherry Ave', 70 => '4613 Cherry Ln', 71 => '1293 Cherry Ave', 72 => '7504 Walnut Blvd', 73 => '2354 Cedar Ave', 74 => '4660 Spruce Dr', 75 => '8854 Pine Blvd', 76 => '9415 Walnut Ln', 77 => '1092 Walnut Ave', 78 => '270 Elm Rd', 79 => '7132 Elm Rd', 80 => '5726 Oak Ave', 81 => '670 Cedar Dr', 82 => '7131 Pine Blvd', 83 => '6232 Cedar Ave', 84 => '854 Walnut Ave', 85 => '9687 Aspen Rd', 86 => '7133 Oak Ln', 87 => '9689 Spruce Ave', 88 => '558 Cedar St', 89 => '8294 Birch Dr', 90 => '238 Oak Rd', 91 => '5756 Maple Rd', 92 => '6474 Aspen Blvd', 93 => '6252 Pine Ln', 94 => '1815 Maple Blvd', 95 => '461 Maple Ave', 96 => '3957 Maple Rd', 97 => '885 Elm Ave', 98 => '2747 Oak Dr', 99 => '2566 Cedar Rd']; $countries = [ 'AF', // Afghanistan 'AX', // Åland Islands 'AL', // Albania 'DZ', // Algeria 'AS', // American Samoa 'AD', // Andorra 'AO', // Angola 'AI', // Anguilla 'AQ', // Antarctica 'AG', // Antigua and Barbuda 'AR', // Argentina 'AM', // Armenia 'AW', // Aruba 'AU', // Australia 'AT', // Austria 'AZ', // Azerbaijan 'BS', // Bahamas 'BH', // Bahrain 'BD', // Bangladesh 'BB', // Barbados 'BY', // Belarus 'BE', // Belgium 'BZ', // Belize 'BJ', // Benin 'BM', // Bermuda 'BT', // Bhutan 'BO', // Bolivia 'BQ', // Bonaire, Sint Eustatius and Saba 'BA', // Bosnia and Herzegovina 'BW', // Botswana 'BV', // Bouvet Island 'BR', // Brazil 'IO', // British Indian Ocean Territory 'BN', // Brunei Darussalam 'BG', // Bulgaria 'BF', // Burkina Faso 'BI', // Burundi 'KH', // Cambodia 'CM', // Cameroon 'CA', // Canada 'CV', // Cape Verde 'KY', // Cayman Islands 'CF', // Central African Republic 'TD', // Chad 'CL', // Chile 'CN', // China 'CX', // Christmas Island 'CC', // Cocos (Keeling) Islands 'CO', // Colombia 'KM', // Comoros 'CG', // Congo 'CD', // Congo, the Democratic Republic of the 'CK', // Cook Islands 'CR', // Costa Rica 'CI', // Côte d'Ivoire 'HR', // Croatia 'CU', // Cuba 'CW', // Curaçao 'CY', // Cyprus 'CZ', // Czech Republic 'DK', // Denmark 'DJ', // Djibouti 'DM', // Dominica 'DO', // Dominican Republic 'EC', // Ecuador 'EG', // Egypt 'SV', // El Salvador 'GQ', // Equatorial Guinea 'ER', // Eritrea 'EE', // Estonia 'ET', // Ethiopia 'FK', // Falkland Islands (Malvinas) 'FO', // Faroe Islands 'FJ', // Fiji 'FI', // Finland 'FR', // France 'GF', // French Guiana 'PF', // French Polynesia 'TF', // French Southern Territories 'GA', // Gabon 'GM', // Gambia 'GE', // Georgia 'DE', // Germany 'GH', // Ghana 'GI', // Gibraltar 'GR', // Greece 'GL', // Greenland 'GD', // Grenada 'GP', // Guadeloupe 'GU', // Guam 'GT', // Guatemala 'GG', // Guernsey 'GN', // Guinea 'GW', // Guinea-Bissau 'GY', // Guyana 'HT', // Haiti 'HM', // Heard Island and McDonald Islands 'VA', // Holy See (Vatican City State) 'HN', // Honduras 'HK', // Hong Kong 'HU', // Hungary 'IS', // Iceland 'IN', // India 'ID', // Indonesia 'IR', // Iran, Islamic Republic of 'IQ', // Iraq 'IE', // Ireland 'IM', // Isle of Man 'IL', // Israel 'IT', // Italy 'JM', // Jamaica 'JP', // Japan 'JE', // Jersey 'JO', // Jordan 'KZ', // Kazakhstan 'KE', // Kenya 'KI', // Kiribati 'KP', // Korea, Democratic People's Republic of 'KR', // Korea, Republic of 'KW', // Kuwait 'KG', // Kyrgyzstan 'LA', // Lao People's Democratic Republic 'LV', // Latvia 'LB', // Lebanon 'LS', // Lesotho 'LR', // Liberia 'LY', // Libya 'LI', // Liechtenstein 'LT', // Lithuania 'LU', // Luxembourg 'MO', // Macao 'MK', // Macedonia, the former Yugoslav Republic of 'MG', // Madagascar 'MW', // Malawi 'MY', // Malaysia 'MV', // Maldives 'ML', // Mali 'MT', // Malta 'MH', // Marshall Islands 'MQ', // Martinique 'MR', // Mauritania 'MU', // Mauritius 'YT', // Mayotte 'MX', // Mexico 'FM', // Micronesia, Federated States of 'MD', // Moldova, Republic of 'MC', // Monaco 'MN', // Mongolia 'ME', // Montenegro 'MS', // Montserrat 'MA', // Morocco 'MZ', // Mozambique 'MM', // Myanmar 'NA', // Namibia 'NR', // Nauru 'NP', // Nepal 'NL', // Netherlands 'NC', // New Caledonia 'NZ', // New Zealand 'NI', // Nicaragua 'NE', // Niger 'NG', // Nigeria 'NU', // Niue 'NF', // Norfolk Island 'MP', // Northern Mariana Islands 'NO', // Norway 'OM', // Oman 'PK', // Pakistan 'PW', // Palau 'PS', // Palestine, State of 'PA', // Panama 'PG', // Papua New Guinea 'PY', // Paraguay 'PE', // Peru 'PH', // Philippines 'PN', // Pitcairn 'PL', // Poland 'PT', // Portugal 'PR', // Puerto Rico 'QA', // Qatar 'RE', // Réunion 'RO', // Romania 'RU', // Russian Federation 'RW', // Rwanda 'BL', // Saint Barthélemy 'SH', // Saint Helena, Ascension and Tristan da Cunha 'KN', // Saint Kitts and Nevis 'LC', // Saint Lucia 'MF', // Saint Martin (French part) 'PM', // Saint Pierre and Miquelon 'VC', // Saint Vincent and the Grenadines 'WS', // Samoa 'SM', // San Marino 'ST', // Sao Tome and Principe 'SA', // Saudi Arabia 'SN', // Senegal 'RS', // Serbia 'SC', // Seychelles 'SL', // Sierra Leone 'SG', // Singapore 'SX', // Sint Maarten (Dutch part) 'SK', // Slovakia 'SI', // Slovenia 'SB', // Solomon Islands 'SO', // Somalia 'ZA', // South Africa 'GS', // South Georgia and the South Sandwich Islands 'SS', // South Sudan 'ES', // Spain 'LK', // Sri Lanka 'SD', // Sudan 'SR', // Suriname 'SJ', // Svalbard and Jan Mayen 'SZ', // Swaziland 'SE', // Sweden 'CH', // Switzerland 'SY', // Syrian Arab Republic 'TW', // Taiwan, Province of China 'TJ', // Tajikistan 'TZ', // Tanzania, United Republic of 'TH', // Thailand 'TL', // Timor-Leste 'TG', // Togo 'TK', // Tokelau 'TO', // Tonga 'TT', // Trinidad and Tobago 'TN', // Tunisia 'TR', // Turkey 'TM', // Turkmenistan 'TC', // Turks and Caicos Islands 'TV', // Tuvalu 'UG', // Uganda 'UA', // Ukraine 'AE', // United Arab Emirates 'GB', // United Kingdom 'US', // United States 'UM', // United States Minor Outlying Islands 'UY', // Uruguay 'UZ', // Uzbekistan 'VU', // Vanuatu 'VE', // Venezuela, Bolivarian Republic of 'VN', // Vietnam 'VG', // Virgin Islands, British 'VI', // Virgin Islands, U.S. 'WF', // Wallis and Futuna 'EH', // Western Sahara 'YE', // Yemen 'ZM', // Zambia 'ZW' // Zimbabwe ]; srand(0); $error = false; while (count($all_links) and $c < $argv[3]) { $batch[] = "(".($c+1).",'".$addresses[rand(0, count($addresses) - 1)]."', '".$countries[rand(0, count($countries) - 1)]."')"; $c++; if (count($batch) == $argv[1]) { if (!process($query_start.implode(',', $batch))) { $error = true; break; } $batch = []; } } // wait until all the workers finish do { $links = $errors = $reject = array(); foreach ($all_links as $link) $links[] = $errors[] = $reject[] = $link; $count = @mysqli_poll($links, $errors, $reject, 0, 100); } while (count($all_links) != count($links) + count($errors) + count($reject)); echo "finished inserting\n"; echo round($argv[3] / (microtime(true) - $t))." docs per sec\n"; root@perf3 ~ # php load_gh_1335.php 10000 30 1000000 finished inserting 1328824 docs per sec ```

The bug is already fixed in the latest dev version Manticore 6.2.13 d016f3dc0@231025 dev (columnar 2.2.5 b8be4eb@230928) (secondary 2.2.5 b8be4eb@230928)

root@perf3 ~ # searchd -v
Manticore 6.2.13 d016f3dc0@231025 dev (columnar 2.2.5 b8be4eb@230928) (secondary 2.2.5 b8be4eb@230928)
...

root@perf3 ~ # mysql -P9306 -h0 < /tmp/dump.sql
root@perf3 ~ # mysql -P9306 -h0 -e "select * from test where MATCH('@street walnut blvd') and country='US';"
+--------+------------------+---------+
| id     | street           | country |
+--------+------------------+---------+
| 871851 | 838 Walnut Blvd  | US      |
| 735050 | 6810 Walnut Blvd | US      |
| 916478 | 6810 Walnut Blvd | US      |
| 736892 | 838 Walnut Blvd  | US      |
| 976031 | 6810 Walnut Blvd | US      |
| 978104 | 7504 Walnut Blvd | US      |
| 775419 | 7504 Walnut Blvd | US      |
| 761688 | 7504 Walnut Blvd | US      |
| 761921 | 7504 Walnut Blvd | US      |
| 937116 | 6810 Walnut Blvd | US      |
| 849275 | 7504 Walnut Blvd | US      |
| 980283 | 6810 Walnut Blvd | US      |
| 996936 | 6810 Walnut Blvd | US      |
| 754518 | 838 Walnut Blvd  | US      |
| 756364 | 838 Walnut Blvd  | US      |
| 834079 | 7504 Walnut Blvd | US      |
| 758933 | 7504 Walnut Blvd | US      |
| 987945 | 7504 Walnut Blvd | US      |
| 836745 | 6810 Walnut Blvd | US      |
| 988821 | 838 Walnut Blvd  | US      |
+--------+------------------+---------+

root@perf3 ~ # php load_gh_1335.php 10000 30 10000000
finished inserting
712309 docs per sec
root@perf3 ~ # mysql -P9306 -h0 -e "select * from test where MATCH('@street walnut blvd') and country='US';"
+---------+------------------+---------+
| id      | street           | country |
+---------+------------------+---------+
| 1360475 | 7504 Walnut Blvd | US      |
| 1366250 | 7504 Walnut Blvd | US      |
| 1366577 | 6810 Walnut Blvd | US      |
|  457504 | 6810 Walnut Blvd | US      |
| 1368926 | 7504 Walnut Blvd | US      |
| 1379943 | 6810 Walnut Blvd | US      |
| 1270264 | 838 Walnut Blvd  | US      |
|  420685 | 6810 Walnut Blvd | US      |
| 1272402 | 7504 Walnut Blvd | US      |
| 1273640 | 6810 Walnut Blvd | US      |
|  424763 | 6810 Walnut Blvd | US      |
| 1281267 | 7504 Walnut Blvd | US      |
|  432156 | 7504 Walnut Blvd | US      |
|  441679 | 7504 Walnut Blvd | US      |
|  668560 | 7504 Walnut Blvd | US      |
|  633020 | 7504 Walnut Blvd | US      |
|  633370 | 7504 Walnut Blvd | US      |
|   26523 | 838 Walnut Blvd  | US      |
|  637788 | 7504 Walnut Blvd | US      |
|  639612 | 838 Walnut Blvd  | US      |
+---------+------------------+---------+
d3mon187 commented 1 year ago

Awesome @sanikolaev ! Glad you were able to reproduce. Looking forward to the next release!

cappadaan commented 1 year ago

I can confirm both the syntax error bug and the performance bug are fixed in 6.2.13.

tomatolog commented 10 months ago

the issue with phrase at the query cause daemon to reply with the parsing error if index has workforms with multiple destination forms just fixed at https://github.com/manticoresoftware/manticoresearch/commit/f8bec55bd6d91f5d8a0a730e5742982924378c51

You need to update daemon from the dev repository after CI will publish packages to get issue fixed.