manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
9.04k stars 507 forks source link

Manticore FATAL: CRASH DUMP #416

Closed glukkkk closed 3 years ago

glukkkk commented 4 years ago

Describe the bug
Manticore crashes on a simple query.

To Reproduce
Just execute these queries and you will see the problem:

REATE TABLE testrt (data_id text, period text) morphology='stem_enru' rt_mem_limit='1024M' index_exact_words='1' expand_keywords='1' min_infix_len='3' charset_table='0..9, english, russian, _, U 0401->U 0435, U 0451->U 0435';`  

NSERT INTO testrt (data_id, period) VALUES ('00000000-0000-0000-0000-000000000000', '5');`


ELECT data_id, COUNT(DISTINCT data_id) AS cnt FROM testrt WHERE MATCH('(@period \"5\")') AND data_id <> '00000000-0000-0000-0000-000000000000' OPTION ranker  = 'wordcount';`  

Crash dump logs: https://pastebin.com/qhnYWpbA  

This also is reproduced here: https://play.manticoresearch.com/replication/  
sanikolaev commented 4 years ago

Thank you for such a clean reproducible case!

glukkkk commented 4 years ago

@sanikolaev Is this our fix?

https://github.com/manticoresoftware/manticoresearch/commit/2a474dc1a26e8b0f8aaaae95669caf2f1d4b7746

sanikolaev commented 4 years ago

@sanikolaev Is this our fix?

No, it's for another crash.

glukkkk commented 4 years ago

Ok, we're waiting :)

githubmanticore commented 4 years ago

➤ Sergey Nikolaev commented:

I'm not sure we'll be able to address it soon as this crash is caused by a wrong use of Manticore - you are not supposed to do:

You are supposed to use attributes for that, e.g. string attribute in your case:

mysql> create table crash (data_id string, period text) morphology='stem_enru' rt_mem_limit='1024M' index_exact_words='1' expand_keywords='1' min_infix_len='3' charset_table='0..9, english, russian, _'; 
Query OK, 0 rows affected (0.00 sec) 

mysql> INSERT INTO crash (data_id, period) VALUES ('00000000-0000-0000-0000-000000000000', '5'); 
Query OK, 1 row affected (0.00 sec) 

mysql> SELECT data_id, COUNT(DISTINCT data_id) AS cnt FROM crash WHERE MATCH('(@period \"5\")') AND data_id <> '00000000-0000-0000-0000-000000000000' OPTION ranker = 'wordcount'; 
+---------+------+ 
| data_id | cnt  | 
+---------+------+ 
| NULL    |    0 | 
+---------+------+ 
1 row in set, 1 warning (0.01 sec) 

FFR: minimized test which crashes:

`````sql drop table crash; create table crash (t text); INSERT INTO crash (t) VALUES ('abc'); SELECT t, COUNT(distinct t) AS cnt FROM crash WHERE t <> 'abc'; ``````` The proper behaviour should probably be like when I remove `t` from the `SELECT`: `````sql mysql> SELECT COUNT(distinct t) AS cnt FROM crash WHERE t <> 'abc'; ERROR 1064 (42000): index crash: group-count-distinct attribute 't' not found ```````
glukkkk commented 4 years ago

Ok, the truth is that we do not need the data_id <> '00000000-0000-0000-0000-000000000000 condition. You can simplify the query to this one:

SELECT data_id, COUNT(DISTINCT data_id) AS cnt FROM testrt WHERE MATCH('(@period \"5\")') OPTION ranker = 'wordcount';

And it will crash too. We need the data_id to be indexed and stored as long as we need to return it in SELECT queries and search by data_id using UUIDs in conditions (MATCH will be faster here rather than using = operator).

BTW, in your minimized example you can use the following query and you will get the same problem:

SELECT t, COUNT(distinct t) AS cnt FROM crash;

So, it is a bug!

sanikolaev commented 4 years ago

@glukkkk You can't do count(<non-attribute>) or count(distinct <non-attribute>). If you need COUNT(DISTINCT data_id) then you should use an attribute, not a stored full-text field (like in your case - data_id text). If you want data_id to be a full-text field and a string attribute at the same time you are supposed to define it as data_id string attribute indexed.

glukkkk commented 4 years ago

@sanikolaev Thank you for the clarification! It's OK for us to modify the config.

However, I suggest that you should throw some SQL exception in this case (something like COUNT(DISTINCT field) supports only attributes), but not crash the searchd service.

You can close the issue. Thank you.

githubmanticore commented 4 years ago

➤ Aleksey N. Vinogradov commented:

However, this very case includes following different defects (leaks of abstractions):

SELECT COUNT(DISTINCT data_id) AS cnt FROM testrt; 
SELECT data_id, COUNT(DISTINCT data_id) AS cnt FROM testrt; 
mysql> SELECT id, data_id FROM testrt WHERE data_id <> '00000000-0000-0000-0000-000000000000'; 
+---------------+---------+ 
| id            | data_id | 
+---------------+---------+ 
| 1677721600001 |         | 
+---------------+---------+ 

No segfault, but also no filtering.

mysql> SELECT id, data_id FROM testrt; 
+---------------+--------------------------------------+ 
| id            | data_id                              | 
+---------------+--------------------------------------+ 
| 1677721600001 | 00000000-0000-0000-0000-000000000000 | 
+---------------+--------------------------------------+ 

reference to previous case - data_id is actually present.

mysql> SELECT * FROM testrt WHERE data_id <> '00000000-0000-0000-0000-000000000000' AND period <> '5'; 
+---------------+---------+--------+ 
| id            | data_id | period | 
+---------------+---------+--------+ 
| 1677721600001 |         |        | 
+---------------+---------+--------+ 

No segfault, but weird output.

mysql> SELECT * FROM testrt WHERE data_id = '00000000-0000-0000-0000-000000000000'; 
Empty set (0.00 sec) 

No segfault, but weird output.