Open sanikolaev opened 3 years ago
➤ Stan commented:
We already have two variants of the sphLevenshtein function, one for SBCS and one for UTF-8, and Expr_Levenshtein_c uses the SBCS variant.
We could switch to the UTF-8 sphLevenshtein function and convert both arguments to UTF-8 (one of the arguments could be a string attribute, so we cannot select the appropriate SBCS or UTF-8 variant before the expression is called with actual data). The UTF-8 sphLevenshtein will work well with either UTF-8 or SBCS source data.
However, converting the incoming data to UTF-8 could slow down expression evaluation.
On the other hand, sphLevenshtein itself is not fast and has an early-out option, so the additional UTF-8 conversion could be insignificant. Alternatively, we could add a new option to force a specific sphLevenshtein variant, like `SELECT LEVENSHTEIN(title, j.name, {normalize=1, source_data='sbcs'}) AS dist, ...`
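To make the trade-off above concrete, here is a minimal Python sketch of the "convert first, then compare code points" idea with an early-out bound. It is not Manticore's sphLevenshtein: the function names, the `max_dist` parameter, and the use of a Python codec name for `source_data` are assumptions made for illustration, and `latin-1` merely stands in for whatever single-byte code page the data actually uses.

```python
def levenshtein_capped(a: str, b: str, max_dist: int) -> int:
    """Edit distance over code points; returns max_dist + 1 once the bound is exceeded."""
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1  # the length gap alone already exceeds the bound
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        if min(cur) > max_dist:
            return max_dist + 1  # early out: row minima never decrease
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else max_dist + 1


def distance_from_bytes(a: bytes, b: bytes,
                        source_data: str = "utf-8", max_dist: int = 3) -> int:
    # Hypothetical counterpart of the proposed `source_data` option:
    # decode both inputs first, then compare code points rather than bytes.
    return levenshtein_capped(a.decode(source_data), b.decode(source_data), max_dist)


# Single-byte input and UTF-8 input go through the same code path.
print(distance_from_bytes("naïve".encode("latin-1"), "naive".encode("latin-1"),
                          source_data="latin-1"))
print(distance_from_bytes("比较苹果和橙子".encode("utf-8"), "比较苹和橙子".encode("utf-8")))
```

In this sketch the decoding step is linear in the input length while the distance computation is quadratic, which is consistent with the guess above that the conversion cost could be insignificant.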
Your suggest doesn't work with multibyte encodings either
We need a concrete example and a separate issue. It works for me:
mysql> drop table if exists t; create table t(f text) charset_table='cjk,non_cjk' min_infix_len='2'; insert into t values(0,'比较苹果和橙子'); call suggest('比较苹和橙子','t');
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.28 sec)
--------------
create table t(f text) charset_table='cjk,non_cjk' min_infix_len='2'
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
insert into t values(0,'比较苹果和橙子')
--------------
Query OK, 1 row affected (0.00 sec)
--------------
call suggest('比较苹和橙子','t')
--------------
+-----------------------+----------+------+
| suggest | distance | docs |
+-----------------------+----------+------+
| 比较苹果和橙子 | 1 | 1 |
+-----------------------+----------+------+
1 row in set (0.00 sec)
levenshtein() does not seem to be multibyte safe.
This is not uncommon (PHP's levenshtein(), for example, works similarly), but for Manticore, as a database with rich full-text capabilities, it makes sense to make it multibyte safe.
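As an illustration of what "not multibyte safe" means in practice, the following Python sketch computes the same edit distance once over code points and once over raw UTF-8 bytes. It does not call Manticore; the `levenshtein` helper here is a plain textbook implementation written for this example, and the strings are borrowed from the suggest session above.

```python
def levenshtein(a, b) -> int:
    """Plain two-row Wagner-Fischer distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


full = "比较苹果和橙子"
typo = "比较苹和橙子"  # one character ("果") is missing

by_chars = levenshtein(full, typo)                                  # compares code points
by_bytes = levenshtein(full.encode("utf-8"), typo.encode("utf-8"))  # compares raw bytes

print(by_chars, by_bytes)  # prints "1 3": the missing character occupies three UTF-8 bytes
```

A byte-wise comparison therefore reports a larger distance than a user counting characters would expect, which is the discrepancy this issue asks to remove.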
Related forum thread: https://forum.manticoresearch.com/t/levenshtein/878