manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.89k stars 492 forks source link

Faceted Sorting Doesn't Work in Manticore with MYSQL interface #2571

Open mohdmsl opened 2 days ago

mohdmsl commented 2 days ago

Bug Description:

When attempting to run a query with facet search combined with an ORDER BY clause, the query results in a timeout when using mysql interface

SELECT id FROM test 
LIMIT 0 
OPTION max_matches=10000 
FACET org_entity_minsky_ids 
ORDER BY COUNT(*)
LIMIT 10000;

but same works when I use http json as below: curl command:


curl --location 'localhost:9308/search' \
--header 'Content-Type: application/json' \
--data '    {
     "index" : "lisdocument",
     "limit": 0,
     "aggs" :
     {
        "group name" :
         {
            "terms" :
             {
              "field":"org_entity_minsky_ids",
              "size": 10
             },
             "sort": [ {"count(*)": { "order":"desc" }} ]
         }
     }
    }'

output:

{"took":1515,"timed_out":false,"hits":{"total":1277769,"total_relation":"eq","hits":[]},"aggregations":{"group name":{"buckets":[{"key":19852393,"doc_count":34680},{"key":19867209,"doc_count":18500},{"key":217988071,"doc_count":17854},{"key":19847761,"doc_count":17326},{"key":218719469,"doc_count":17125},{"key":19845693,"doc_count":15082},{"key":19859698,"doc_count":13238},{"key":19864084,"doc_count":12979},{"key":19865529,"doc_count":12686},{"key":19876241,"doc_count":12002}]}}}

Expected Behavior: The query should return faceted results sorted as per the ORDER BY COUNT(*) clause for all modes i.e MYSQL, http

Actual Behavior: The query runs forever and timesout

Manticore Search Version:

6.3.6

Operating System Version:

linux

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed - [ ] Tests developed - [ ] Documentation updated - [ ] Documentation reviewed - [ ] [Changelog](https://docs.google.com/spreadsheets/d/1mz_3dRWKs86FjRF7EIZUziUDK_2Hvhd97G0pLpxo05s/edit?pli=1&gid=1102439133) updated
tomatolog commented 1 day ago

it could be better to upload index data that reproduces this case locally. You could upload your data as described in the manual

mohdmsl commented 1 day ago

Adding MRE:

CREATE TABLE test (
    id string,
    type string,
    publish_year string,
    model string
    ) 
   stopwords = 'en' morphology = 'lemmatize_en_all, libstemmer_en' html_strip = '1' min_prefix_len='3' min_prefix_len = '3' stopwords_unstemmed = '1' index_exact_words='1' blend_chars='+,&' rt_mem_limit='2 * 1073741824' engine='columnar';

Data to insert

insert into test values (1, 'phone', '2013', 'M1'),
(2, 'phone', '2014', 'M2'),
(3, 'phone', '2015', 'M3'),
(4, 'tablet', '2014', 'M1'),
 (5, 'pc', '2014', 'M2'),
(6, 'tablet', '2016', 'M3'),
(7, 'pc', '2015', 'M3'),
(8, 'pc', '2017', 'M3')

QUERY:

SELECT * from test limit 0
facet type order by COUNT(*) 

The query here runs forever

sanikolaev commented 1 day ago

Reproduced on dev2 with one of recent commits and this one-liner:

mysql> drop table if exists t; create table t(i int); insert into t values(1,1); SELECT id FROM t FACET i ORDER BY COUNT(*);
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(i int)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1,1)
--------------

Query OK, 1 row affected (0.01 sec)

--------------
SELECT id FROM t FACET i ORDER BY COUNT(*)
--------------

Empty set (0.00 sec)
--- 0 out of 0 results in 0ms ---

# hangs here

The w/a is to specify asc or desc after ORDER BY COUNT(*), e.g.:

SELECT id FROM t FACET i ORDER BY COUNT(*) asc
--------------

+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

+------+----------+
| i    | count(*) |
+------+----------+
|    1 |        1 |
+------+----------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---
mohdmsl commented 1 day ago

Thanks @sanikolaev understood

mohdmsl commented 1 day ago

It would be beneficial if the default faceted results were sorted in descending order based on their count. As in most used cases we want to see top n results