manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Manticore query cache fails to cache PACKEDFACTORS() #2742

Closed: vytautasvers closed this issue 3 days ago

vytautasvers commented 1 week ago

Bug Description:

We recently tried enabling Manticore's native query cache, but ran into a problem caching PACKEDFACTORS() output, whether or not it is returned as JSON: on a cache miss we receive the values, while on a cache hit we receive an empty column. As @sanikolaev mentioned on Community Slack (https://manticore-community.slack.com/archives/C7NSLK1NE/p1731066852754549), this could be a bug. We were able to reproduce it on several Docker-based instances running versions 6.0.4 and 6.3.6.

mysql> create table testrt (title TEXT);
mysql> show tables;
+--------+------+
| Index  | Type |
+--------+------+
| testrt | rt   |
+--------+------+
mysql> insert into testrt (title) VALUES ('homme');
mysql> insert into testrt (title) VALUES ('femme');
mysql> insert into testrt (title) VALUES ('enfants');
mysql> insert into testrt (title) VALUES ('mens');
mysql> insert into testrt (title) VALUES ('womens');
mysql> insert into testrt (title) VALUES ('kids');
mysql> select * from testrt;
+---------------------+---------+
| id                  | title   |
+---------------------+---------+
| 6920453278807359490 | femme   |
| 6920453278807359491 | enfants |
| 6920453278807359492 | mens    |
| 6920453278807359493 | womens  |
| 6920453278807359494 | kids    |
| 6920453278807359489 | homme   |
+---------------------+---------+
6 rows in set (0,00 sec)
mysql> show status like '%qcache%';
+-----------------------+------------+
| Counter               | Value      |
+-----------------------+------------+
| qcache_max_bytes      | 1073741824 |
| qcache_thresh_msec    | 0          |
| qcache_ttl_sec        | 3600       |
| qcache_cached_queries | 16         |
| qcache_used_bytes     | 4232       |
| qcache_hits           | 32         |
+-----------------------+------------+
6 rows in set (0,00 sec)
mysql> SELECT id, title, PACKEDFACTORS({json=1}) as `packedfactors` FROM testrt WHERE MATCH('femme') OPTION ranker=expr('sum(hit_count)');
+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                  | title | packedfactors                                                                               |
+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 6920453278807359490 | femme | {"bm25":709, "bm25a":0.85414708, "field_mask":1, "doc_word_count":1, "fields":[{"field":0, "lcs":1, "hit_count":1, "word_count":1, "tf_idf":0.46039113, "min_idf":0.46039113, "max_idf":0.46039113, "sum_idf":0.46039113, "min_hit_pos":1, "min_best_span_pos":1, "exact_hit":1, "max_window_hits":1, "min_gaps":0, "exact_order":1, "lccs":1, "wlccs":0.46039113, "atc":0.000000}], "words":[{"tf":1, "idf":0.46039113}]} |
+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)
mysql> SELECT id, title, PACKEDFACTORS({json=1}) as `packedfactors` FROM testrt WHERE MATCH('femme') OPTION ranker=expr('sum(hit_count)');
+---------------------+-------+---------------+
| id                  | title | packedfactors |
+---------------------+-------+---------------+
| 6920453278807359490 | femme |               |
+---------------------+-------+---------------+
1 row in set (0,00 sec)

Manticore Search Version:

6.3.6 593045790@24080214

Operating System Version:

Docker container manticoresearch/manticore:6.3.6

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [x] Implementation completed
- [x] Tests developed
- [x] Documentation updated
- [x] Documentation reviewed
- [x] [Changelog](https://docs.google.com/spreadsheets/d/1mz_3dRWKs86FjRF7EIZUziUDK_2Hvhd97G0pLpxo05s/edit?pli=1&gid=1102439133) updated

tomatolog commented 1 week ago

This seems to be a bug: on a cache hit, the query cache completely skips full-text matching and ranking, and all full-text-related values are taken from the cache.

It should be fixed so that the query cache is bypassed, and full-text search and ranking are performed, whenever the query uses PACKEDFACTORS().
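The mechanism can be illustrated with a toy Python model. This is not Manticore's code: `Hit`, `rank` and `ToyQueryCache` are hypothetical names, the weight mimics `ranker=expr('sum(hit_count)')`, and the packed factors are reduced to a single `hit_count` entry. The model assumes, as the comment above describes, that a cache hit skips matching and ranking and returns only the stored document ids and weights, so the factors column comes back empty.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    weight: int
    packedfactors: str = ""  # stays empty unless the ranker ran for this query


def rank(docs: dict[int, str], term: str) -> list[Hit]:
    """Hypothetical stand-in for full-text matching plus ranking.

    The weight mimics ranker=expr('sum(hit_count)'): how often the term occurs.
    """
    hits = []
    for doc_id, title in docs.items():
        hit_count = title.split().count(term)
        if hit_count:
            factors = f'{{"hit_count":{hit_count}}}'  # simplified packed factors
            hits.append(Hit(doc_id, hit_count, factors))
    return hits


class ToyQueryCache:
    """Caches only (doc_id, weight) pairs, not per-document ranking factors."""

    def __init__(self) -> None:
        self._store: dict[str, list[tuple[int, int]]] = {}

    def search(self, docs: dict[int, str], term: str) -> list[Hit]:
        cached = self._store.get(term)
        if cached is not None:
            # Cache hit: matching and ranking are skipped entirely, so the
            # packed factors are never computed and the column is empty.
            return [Hit(doc_id, weight) for doc_id, weight in cached]
        hits = rank(docs, term)  # cache miss: the ranker runs
        self._store[term] = [(h.doc_id, h.weight) for h in hits]
        return hits


docs = {6920453278807359490: "femme", 6920453278807359489: "homme"}
cache = ToyQueryCache()
first = cache.search(docs, "femme")   # miss
second = cache.search(docs, "femme")  # hit
print(first[0].packedfactors)         # {"hit_count":1}
print(repr(second[0].packedfactors))  # ''
```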

tomatolog commented 3 days ago

I've just fixed this issue in https://github.com/manticoresoftware/manticoresearch/commit/c17259eadb7a026439d12f78793f5bb349f0e5ed. You need to use a package from the dev repository once CI has passed. The daemon now neither stores queries that use PACKEDFACTORS() in the query cache nor looks them up there.
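The fix described above amounts to a cacheability check applied on both paths: before looking a query up and before storing its result. The following is a minimal Python sketch under stated assumptions: `is_cacheable` and `ToyQueryCache` are hypothetical, and detecting PACKEDFACTORS() with a regular expression on the select list is purely illustrative; the daemon presumably inspects its parsed query instead.

```python
import re
from typing import Callable

# Hypothetical detector: does the select list request packed ranking factors?
_PACKEDFACTORS = re.compile(r"\bPACKEDFACTORS\s*\(", re.IGNORECASE)


def is_cacheable(select_list: str) -> bool:
    return _PACKEDFACTORS.search(select_list) is None


class ToyQueryCache:
    def __init__(self) -> None:
        self._store: dict[tuple[str, str], list] = {}
        self.lookups = 0  # how often the cache was consulted
        self.stores = 0   # how often a result was put into the cache

    def run(self, select_list: str, match: str, execute: Callable[[], list]) -> list:
        key = (select_list, match)
        cacheable = is_cacheable(select_list)
        if cacheable:  # never *check* the cache for PACKEDFACTORS() queries
            self.lookups += 1
            if key in self._store:
                return self._store[key]
        result = execute()  # full-text matching and ranking always run here
        if cacheable:  # never *put* PACKEDFACTORS() queries into the cache
            self._store[key] = result
            self.stores += 1
        return result
```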