Closed sanikolaev closed 1 year ago
➤ Sergey Nikolaev commented:
Another MRE:
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.02 sec)
--------------
create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'
--------------
Query OK, 0 rows affected, 1 warning (0.01 sec)
--------------
replace into t(f1, f2) values ('peter', 'john')
--------------
Query OK, 1 row affected (0.01 sec)
--------------
SELECT * FROM t where match ('@f2 peter')
--------------
+---------------------+-------+------+
| id | f1 | f2 |
+---------------------+-------+------+
| 1515374047901253834 | peter | john |
+---------------------+-------+------+
1 row in set (0.00 sec)
--------------
show meta
--------------
+----------------+--------+
| Variable_name | Value |
+----------------+--------+
| total | 1 |
| total_found | 1 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | =peter |
| docs[0] | 1 |
| hits[0] | 2 |
| keyword[1] | peter |
| docs[1] | 1 |
| hits[1] | 1 |
+----------------+--------+
10 rows in set (0.00 sec)
➤ Aleksey N. Vinogradov commented:
Can't reproduce at all.
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
Query OK, 0 rows affected (0,00 sec)
Query OK, 0 rows affected (0,00 sec)
Query OK, 1 row affected (0,01 sec)
Empty set (0,00 sec)
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 0 |
| total_found | 0 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | peter |
| docs[0] | 1 |
| hits[0] | 1 |
+----------------+-------+
7 rows in set (0,00 sec)
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
Query OK, 0 rows affected (0,00 sec)
Query OK, 0 rows affected (0,00 sec)
Query OK, 1 row affected (0,01 sec)
Empty set (0,00 sec)
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 0 |
| total_found | 0 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | peter |
| docs[0] | 1 |
| hits[0] | 1 |
+----------------+-------+
7 rows in set (0,00 sec)
Strange. I can't reproduce the peter/john case anymore too.
But this still fails for me in the latest dev version on mac m1:
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('abc', 'def'); select * from t; SELECT * FROM t where match ('@f2 abc'); show meta; call keywords('abc', 't'); call keywords('def', 't');
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'
--------------
Query OK, 0 rows affected, 1 warning (0.01 sec)
--------------
replace into t(f1, f2) values ('abc', 'def')
--------------
Query OK, 1 row affected (0.00 sec)
--------------
select * from t
--------------
+---------------------+------+------+
| id | f1 | f2 |
+---------------------+------+------+
| 1515384550488801301 | abc | def |
+---------------------+------+------+
1 row in set (0.01 sec)
--------------
SELECT * FROM t where match ('@f2 abc')
--------------
+---------------------+------+------+
| id | f1 | f2 |
+---------------------+------+------+
| 1515384550488801301 | abc | def |
+---------------------+------+------+
1 row in set (0.00 sec)
--------------
show meta
--------------
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 1 |
| total_found | 1 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | =abc |
| docs[0] | 1 |
| hits[0] | 2 |
| keyword[1] | abc |
| docs[1] | 1 |
| hits[1] | 1 |
+----------------+-------+
10 rows in set (0.00 sec)
--------------
call keywords('abc', 't')
--------------
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | abc | abc |
| 1 | abc | =abc |
+------+-----------+------------+
2 rows in set (0.00 sec)
--------------
call keywords('def', 't')
--------------
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | def | def |
| 1 | def | =def |
+------+-----------+------------+
2 rows in set (0.00 sec)
Can't reproduce at all.
I see what's wrong: you've removed morphology_skip_fields='f2'
. As said in the issue:
Removing morphology_skip_fields='f2' eliminates the issue
If you don't remove it, it should reproduce the problem:
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'
--------------
Query OK, 0 rows affected, 1 warning (0.01 sec)
--------------
replace into t(f1, f2) values ('peter', 'john')
--------------
Query OK, 1 row affected (0.00 sec)
--------------
SELECT * FROM t where match ('@f2 peter')
--------------
+---------------------+-------+------+
| id | f1 | f2 |
+---------------------+-------+------+
| 1515384593958567938 | peter | john |
+---------------------+-------+------+
1 row in set (0.00 sec)
--------------
show meta
--------------
+----------------+--------+
| Variable_name | Value |
+----------------+--------+
| total | 1 |
| total_found | 1 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | =peter |
| docs[0] | 1 |
| hits[0] | 2 |
| keyword[1] | peter |
| docs[1] | 1 |
| hits[1] | 1 |
+----------------+--------+
10 rows in set (0.00 sec)
➤ Aleksey N. Vinogradov commented:
If you look carefully, I've tried both, with morphology_skip_fields and without it. Both cases can't reproduce the issue. Tried both M1 mac and intel linux.
FIY, you've just cited 'mre' with skip field, but it've found something, strange. Mine still no success:
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
Query OK, 0 rows affected (0,01 sec)
Query OK, 0 rows affected, 1 warning (0,01 sec)
Query OK, 1 row affected (0,00 sec)
Empty set (0,00 sec)
+----------------+--------+
| Variable_name | Value |
+----------------+--------+
| total | 0 |
| total_found | 0 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | =peter |
| docs[0] | 1 |
| hits[0] | 1 |
| keyword[1] | peter |
| docs[1] | 1 |
| hits[1] | 1 |
+----------------+--------+
10 rows in set (0,00 sec)
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
Query OK, 0 rows affected (0,01 sec)
Query OK, 0 rows affected (0,01 sec)
Query OK, 1 row affected (0,00 sec)
Empty set (0,00 sec)
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 0 |
| total_found | 0 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | peter |
| docs[0] | 1 |
| hits[0] | 1 |
+----------------+-------+
7 rows in set (0,00 sec)
morphology_skip_fields affects output of 'show meta', but doesn't cause any fake doc to be found. So, still NOT reproducable.
➤ Aleksey N. Vinogradov commented:
upd. With release build morphology_skip_fields works.
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
Query OK, 0 rows affected (0,01 sec)
Query OK, 0 rows affected, 1 warning (0,00 sec)
Query OK, 1 row affected (0,00 sec)
+---------------+-------+------+
| id | f1 | f2 |
+---------------+-------+------+
| 1677721600007 | peter | john |
+---------------+-------+------+
1 row in set (0,00 sec)
+----------------+--------+
| Variable_name | Value |
+----------------+--------+
| total | 1 |
| total_found | 1 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | =peter |
| docs[0] | 1 |
| hits[0] | 2 |
| keyword[1] | peter |
| docs[1] | 1 |
| hits[1] | 1 |
+----------------+--------+
10 rows in set (0,00 sec)
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all'; replace into t(f1, f2) values ('peter', 'john'); SELECT * FROM t where match ('@f2 peter'); show meta;
Query OK, 0 rows affected (0,01 sec)
Query OK, 0 rows affected (0,00 sec)
Query OK, 1 row affected (0,00 sec)
Empty set (0,00 sec)
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 0 |
| total_found | 0 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | peter |
| docs[0] | 1 |
| hits[0] | 1 |
+----------------+-------+
7 rows in set (0,00 sec)
So, without morphology_skip_fields both builds (debug and release) behave similar and reveal no issue.
However on debug build behavior of morphology_skip_fields differs from release build. That looks like a bug, but not a one reported in the title. I'll try to localize it (usually such difference means that some 'assert' in the code left in wrong place and influent to behaviour depended on type of build.
➤ Aleksey N. Vinogradov commented:
here is guilty code snippet
if ( m_bIndexExactWords && eMorph!=SPH_TOKEN_MORPH_GUESS )
{
auto iBytes = strlen ( (const char*)sWord );
memcpy ( sBuf + 1, sWord, iBytes );
sBuf[0] = MAGIC_WORD_HEAD_NONSTEMMED;
sBuf[iBytes+1] = '%CODE_BLOCK%';
}
if ( m_bIndexExactWords && ( eMorph==SPH_TOKEN_MORPH_ORIGINAL || bMorphDisabled ) )
{
// can not use GetWordID here due to exception vs missed hit, ie
// stemmed sWord hasn't got added to hit stream but might be added as exception to dictionary
// that causes error at hit sorting phase \ dictionary HitblockPatch
if ( !m_pDict->GetSettings().m_bStopwordsUnstemmed )
m_pDict->ApplyStemmers ( sWord );
if ( !m_pDict->IsStopWord ( sWord ) )
m_tHits.Add ( { tRowID, m_pDict->GetWordIDNonStemmed ( sBuf ), m_tState.m_iHitPos } );
m_tState.m_iBuildLastStep = m_pTokenizer->TokenIsBlended() ? 0 : 1;
continue;
}
Notice - if we have SPH_TOKEN_MORPH_GUESS - then sBuf is NOT initialized, and contains something strange (garbage). But if bMorphDisabled - it will be injected into dictionary despite it is garbage. And that is the root of the problem.
Works fine now:
➜ ~ mysql -P9306 -h0 -v
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 6.0.5 da4dc6bdb@230612 dev (columnar 2.0.5 f1a9dd4@230612) (secondary 2.0.5 f1a9dd4@230612) git branch master...origin/master
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Reading history-file /Users/sn/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> drop table if exists t; create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'; replace into t(f1, f2) values ('abc', 'def'); SELECT * FROM t where match ('@f2 abc'); show meta;
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.02 sec)
--------------
create table t (f1 text, f2 text) morphology='lemmatize_de_all' morphology_skip_fields='f2'
--------------
Query OK, 0 rows affected, 1 warning (0.00 sec)
--------------
replace into t(f1, f2) values ('abc', 'def')
--------------
Query OK, 1 row affected (0.00 sec)
--------------
SELECT * FROM t where match ('@f2 abc')
--------------
Empty set (0.01 sec)
--------------
show meta
--------------
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| total | 0 |
| total_found | 0 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | =abc |
| docs[0] | 1 |
| hits[0] | 1 |
| keyword[1] | abc |
| docs[1] | 1 |
| hits[1] | 1 |
+----------------+-------+
10 rows in set (0.00 sec)
Below you can see that the query
@f2 abc
finds the document withf2 = def
which is wrong:Notes:
morphology_skip_fields='f2'
eliminates the issuelemmatize_de_all
tolemmatize_de
eliminates the issue