manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Difference between count(*) and number of results from select * from .... #668

Open ivanghisleni opened 2 years ago

ivanghisleni commented 2 years ago

Hi,

I have the following weird situation: if I run this query with count(*), the result is 1372

(screenshot)

but if I run the query with the same parameters, the number of returned elements is 1159

(screenshot)

and if I add sequence_number > 1158, the result is 213 elements

(screenshot)

so 213 + 1159 = 1372, and the sequence_number is ordered from 0 to 1371.

I'm currently running Manticore 3.6.0, and the total records are 5485931

(screenshot)

here is my manticore.conf


index rt1  
{  
        type                                                    = rt  
        min_infix_len                                   = 3  

.  
.  
.  
        html_remove_elements                    = style, script, embed, video, audio, img, canvas, svg, meta  
        html_strip                                              = 1  
        access_plain_attrs = mmap  
        path                                                    = /..../sphinx/rt1/sphinx  
        rt_mem_limit                                    = 512M  
}  

searchd  
{  
        listen                                                  = 127.0.0.1:9312  
        listen                                                  = 127.0.0.1:9306:mysql41  
        log                                                             = /var/log/manticore/searchd.log  
        network_timeout                                 = 60  
        persistent_connections_limit    = 150  
        pid_file                                                        = /var/run/manticore/searchd.pid  
        seamless_rotate                                 = 1  
        preopen_indexes                                 = 1  
        unlink_old                                              = 1  
        workers                                                 = threads  
        binlog_path                                             = /.../sphinx/binlog/  
        binlog_flush                                            = 2  
        max_open_files                                  = 50000  
}  

common  
{  
        plugin_dir                                              = /usr/local/sphinx/lib  
}  

index rtall  
{  
        type                                                    = distributed  
        local                                                   = rt1  
}  

What is the problem?

barryhunter commented 2 years ago

I have to wonder if 10B for max_matches is simply way too big, such that it is being ignored (or worse, integer overflowing/wrapping!).

(Setting max_matches that high could result in searchd trying to allocate 80 GB of memory, assuming a conservative 8 bytes per document just to hold the 64-bit id; in reality it's probably more, as it has to allocate storage for attributes, at least the ORDER BY attribute anyway.)

Try setting a more realistic value like 2000, or even say 10,000 if you want it higher than the default of 1000.
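That estimate can be sanity-checked with simple arithmetic (illustrative sketch; the 8 bytes per match is the conservative lower bound assumed above, and the real per-match cost depends on the attributes being sorted on):

```python
# Back-of-envelope estimate of the match-pool memory a huge max_matches
# could force searchd to reserve (lower bound: 64-bit doc id only).

BYTES_PER_MATCH_MIN = 8  # conservative: just the 64-bit document id

def match_pool_bytes(max_matches: int, bytes_per_match: int = BYTES_PER_MATCH_MIN) -> int:
    """Lower bound on memory needed to hold max_matches candidate matches."""
    return max_matches * bytes_per_match

# max_matches = 10 billion at 8 bytes each is already 80 GB:
print(match_pool_bytes(10_000_000_000) / 10**9)  # → 80.0 (GB)
```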

sanikolaev commented 2 years ago

If the lower max_matches doesn't help, try select id from ... where ... option max_matches=1; show meta;. In theory, it may also be some id duplication issue.

ivanghisleni commented 2 years ago

I updated the query as you suggested, but with no difference

(screenshot)

and when changing the zip_name condition I always get 1159 records

(screenshot)

with no conditions at all I also get 1159 records

(screenshot)

I also ran an optimization of the index with max_chunks = 2 and stopped/started Manticore, but nothing changed.

What could the cause be, and why does it always return 1159 records?

sanikolaev commented 2 years ago

Can you send us the indexes/config/sample queries (as text) to our write-only ftp? See https://mnt.cr/ftp

ivanghisleni commented 2 years ago

@sanikolaev data uploaded, let me know if you need more information.

ivanghisleni commented 2 years ago

@sanikolaev Any updates?

sanikolaev commented 2 years ago

Hi @ivanghisleni

It indeed looks wrong:

1159 docs in show index status:

mysql> show index rt1 status;
+-----------------------------+---------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                   |
+-----------------------------+---------------------------------------------------------------------------------------------------------+
| index_type                  | rt                                                                                                      |
| indexed_documents           | 1159                                                                                                    |
| indexed_bytes               | 62803273811                                                                                             |
...
| killed_documents            | 0                                                                                                       |
...

but 5.4M in select count:

mysql> select count(*) from rt1;
+----------+
| count(*) |
+----------+
|  5485931 |
+----------+
1 row in set (0.12 sec)

No duplicates detected by indextool --check:

snikolaev@dev:/home/manticorebugs/ftproot/github-issue-668/github-issue-668$ sudo indextool -c sphinx.conf --check rt1 --check-id-dups
Manticore 4.2.0 15e927b@211223 release (columnar 1.11.4 327b3d4@211223)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2021, Manticore Software LTD (https://manticoresearch.com)

using config file 'sphinx.conf'...
WARNING: key 'workers' is deprecated in sphinx.conf line 90; use 'default value' instead.
checking index 'rt1'...
WARNING: failed to load RAM chunks, checking only 1 disk chunks
checking schema...
checking disk chunk, extension 129, 0(1)...
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking dead row map...
checking doc-id lookup...
checking docid douplicates ...
check passed, 109.4 sec elapsed
check passed, 109.4 sec elapsed

Looks like the millions of documents have ids and are filterable:

mysql> select count(*) from rt1 where id > 1604609627439935;
+----------+
| count(*) |
+----------+
|  2478042 |
+----------+
1 row in set (0.10 sec)

and the docs really exist:

mysql> select deleted, count(distinct id) from rt1 group by deleted;
+---------+--------------------+
| deleted | count(distinct id) |
+---------+--------------------+
|       1 |            5485931 |
+---------+--------------------+
1 row in set (0.46 sec)

but I can't fetch more than 1159 docs:

mysql> select * from rt1 where deleted = 1 limit 2000 option max_matches=2000;
...
1159 rows in set (0.15 sec)

Searching through full-text only also finds millions of docs:

mysql> select * from rt1 where match('a') limit 0; show meta;
Empty set (0.30 sec)

+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 1        |
| total_found   | 3150661  |

The suspicious thing is:

        index_token_filter                              = archiver_sphinx_integration.so:untokenize_dot_at:

Not sure if it has anything to do with the issue, but I'd try to disable it completely, rebuild the index from scratch and see if it helps.
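For reference, disabling it would mean commenting out (or removing) that line in the index definition and rebuilding, roughly as follows (sketch based on the config excerpt above; unrelated options are elided as in the original):

```
index rt1
{
        type                = rt
        min_infix_len       = 3
        ...
        # index_token_filter = archiver_sphinx_integration.so:untokenize_dot_at:
        ...
}
```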

ivanghisleni commented 2 years ago

@sanikolaev if it can be useful, I can share the code of archiver_sphinx_integration.so:untokenize_dot_at. My intention is to understand why this condition happens and, if possible, avoid it in the future without rebuilding the index.

sanikolaev commented 2 years ago

Let's try. Can you then share the plugin code and examples of the queries you use to populate the index?

ivanghisleni commented 2 years ago

I uploaded these files to the FTP space:

ivanghisleni commented 2 years ago

Maybe this can be useful: the issue appeared after ~5M records were deleted (one by one); every time a record is deleted, the script checks the count by zip_name. I also uploaded a sample of how records are deleted: delete_records_sample.php
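That delete-and-check loop could look roughly like this (hypothetical sketch that only builds the SQL strings; rt1 and zip_name come from the thread, the sample id and zip value are made up, and actually executing the statements, e.g. over the MySQL protocol on 127.0.0.1:9306, is left out):

```python
# Sketch of the workflow described above: delete records one by one, and
# after each delete re-check the remaining count for that record's zip_name.
# SQL strings only; driver wiring is intentionally omitted.

def delete_and_check_sql(record_id: int, zip_name: str):
    """Return the DELETE statement and the follow-up per-zip_name count check."""
    delete = f"DELETE FROM rt1 WHERE id = {record_id}"
    # The thread mentions the script re-checks the count by zip_name:
    check = f"SELECT COUNT(*) FROM rt1 WHERE zip_name = '{zip_name}'"
    return delete, check

d, c = delete_and_check_sql(42, 'archive_001.zip')  # hypothetical values
```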

ivanghisleni commented 2 years ago

@sanikolaev did you find something that could explain the issue?

sanikolaev commented 2 years ago

Hi. Unfortunately not. Didn't have much time this week. Hopefully will be able to look into the issue next week.

ivanghisleni commented 2 years ago

ok, I'll wait for your feedback.

sanikolaev commented 2 years ago

@ivanghisleni from your insert query example I can't understand what the actual queries are. Can you please provide a few examples in SQL format, so I can use them to generate an index with a few million docs?

ivanghisleni commented 2 years ago

@sanikolaev I uploaded an SQL sample of the insert query

sanikolaev commented 2 years ago

I need sample values too, not only the field list. At least just one real query.

ivanghisleni commented 2 years ago

Uploaded a real sample insert_records_sample.sql

sanikolaev commented 2 years ago

I've created an RT index based on the sample record and couldn't reproduce the issue:

mysql> select count(*) from rt1;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (0.07 sec)

mysql> show index rt1 status;
+-----------------------------+---------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                   |
+-----------------------------+---------------------------------------------------------------------------------------------------------+
| index_type                  | rt                                                                                                      |
| indexed_documents           | 5000000                                                                                                 |
| indexed_bytes               | 3335000000                                                                                              |
| ram_bytes                   | 1198000463                                                                                              |
| disk_bytes                  | 2747357127                                                                                              |
| disk_mapped                 | 1110670364                                                                                              |
| disk_mapped_cached          | 688775168                                                                                               |
| disk_mapped_doclists        | 0                                                                                                       |
| disk_mapped_cached_doclists | 0                                                                                                       |
| disk_mapped_hitlists        | 0                                                                                                       |
| disk_mapped_cached_hitlists | 0                                                                                                       |
| killed_documents            | 0                                                                                                       |
| killed_rate                 | 0.00%                                                                                                   |
| ram_chunk                   | 509193751                                                                                               |
| ram_chunk_segments_count    | 31                                                                                                      |
| disk_chunks                 | 7                                                                                                       |
| mem_limit                   | 536870912                                                                                               |
| mem_limit_rate              | 95.00%                                                                                                  |
| ram_bytes_retired           | 0                                                                                                       |
| tid                         | 0                                                                                                       |
| tid_saved                   | 0                                                                                                       |
| query_time_1min             | {"queries":2, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.002, "pct95_sec":0.002, "pct99_sec":0.002}  |
| query_time_5min             | {"queries":2, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.002, "pct95_sec":0.002, "pct99_sec":0.002}  |
| query_time_15min            | {"queries":2, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.002, "pct95_sec":0.002, "pct99_sec":0.002}  |
| query_time_total            | {"queries":29, "avg_sec":0.002, "min_sec":0.000, "max_sec":0.016, "pct95_sec":0.015, "pct99_sec":0.016} |
| found_rows_1min             | {"queries":2, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                          |
| found_rows_5min             | {"queries":2, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                          |
| found_rows_15min            | {"queries":2, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                          |
| found_rows_total            | {"queries":29, "avg":5099, "min":1, "max":147851, "pct95":3697, "pct99":147851}                         |
+-----------------------------+---------------------------------------------------------------------------------------------------------+
29 rows in set (0.01 sec)

sanikolaev commented 2 years ago

@ivanghisleni would it be possible to try to reproduce the problem on your side?

sanikolaev commented 2 years ago

I mean by recreating the index from scratch.

ivanghisleni commented 2 years ago

@sanikolaev, how did you simulate the issue?

I identified the issue after these steps:

  1. created ~10M records
  2. deleted ~5M records, one by one (delete from rt1 where id = ....)
  3. the issue appeared

Did you identify any anomalies in the affected index?

By the way, I'll try to reproduce the issue again on my side.
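Those steps could be scripted roughly like this (hypothetical sketch that only generates the SQL statements; the insert is reduced to a bare id, unlike the full schema in insert_records_sample.sql):

```python
# Sketch of the reproduction workload from the steps above: insert ~10M
# rows, then delete ~5M of them one by one. Emits SQL statements only;
# executing them against searchd is left to the caller.

def repro_statements(total=10_000_000, to_delete=5_000_000):
    for i in range(1, total + 1):
        yield f"INSERT INTO rt1 (id) VALUES ({i})"
    for i in range(1, to_delete + 1):
        yield f"DELETE FROM rt1 WHERE id = {i}"

# Dry run with tiny numbers to show the shape of the workload:
stmts = list(repro_statements(total=3, to_delete=2))
```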

sanikolaev commented 2 years ago

I've tried again:

  1. created 10M (actually 10M + 1000) records
  2. deleted 5M records one by one

and all looks good to me:

mysql> show index rt1 status; select count(*) from rt1;
+-----------------------------+-----------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                     |
+-----------------------------+-----------------------------------------------------------------------------------------------------------+
| index_type                  | rt                                                                                                        |
| indexed_documents           | 5001000                                                                                                   |
| indexed_bytes               | 6670667000                                                                                                |
| ram_bytes                   | 1318392                                                                                                   |
| disk_bytes                  | 6120285122                                                                                                |
| disk_mapped                 | 2474206301                                                                                                |
| disk_mapped_cached          | 1253376                                                                                                   |
| disk_mapped_doclists        | 0                                                                                                         |
| disk_mapped_cached_doclists | 0                                                                                                         |
| disk_mapped_hitlists        | 0                                                                                                         |
| disk_mapped_cached_hitlists | 0                                                                                                         |
| killed_documents            | 4663000                                                                                                   |
| killed_rate                 | 93.24%                                                                                                    |
| ram_chunk                   | 0                                                                                                         |
| ram_chunk_segments_count    | 0                                                                                                         |
| disk_chunks                 | 15                                                                                                        |
| mem_limit                   | 536870912                                                                                                 |
| mem_limit_rate              | 95.00%                                                                                                    |
| ram_bytes_retired           | 0                                                                                                         |
| tid                         | 0                                                                                                         |
| tid_saved                   | 0                                                                                                         |
| query_time_1min             | {"queries":3, "avg_sec":0.086, "min_sec":0.039, "max_sec":0.178, "pct95_sec":0.178, "pct99_sec":0.178}    |
| query_time_5min             | {"queries":3, "avg_sec":0.086, "min_sec":0.039, "max_sec":0.178, "pct95_sec":0.178, "pct99_sec":0.178}    |
| query_time_15min            | {"queries":3, "avg_sec":0.086, "min_sec":0.039, "max_sec":0.178, "pct95_sec":0.178, "pct99_sec":0.178}    |
| query_time_total            | {"queries":149, "avg_sec":0.201, "min_sec":0.000, "max_sec":18.418, "pct95_sec":0.105, "pct99_sec":7.635} |
| found_rows_1min             | {"queries":3, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                            |
| found_rows_5min             | {"queries":3, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                            |
| found_rows_15min            | {"queries":3, "avg":1, "min":1, "max":1, "pct95":1, "pct99":1}                                            |
| found_rows_total            | {"queries":149, "avg":373323, "min":1, "max":10001000, "pct95":3697, "pct99":10001000}                    |
+-----------------------------+-----------------------------------------------------------------------------------------------------------+
29 rows in set (0.04 sec)

+----------+
| count(*) |
+----------+
|  5001000 |
+----------+
1 row in set (0.04 sec)

ivanghisleni commented 2 years ago

Didn't you find any evidence in the index I gave you?

sanikolaev commented 2 years ago

Not yet. I hoped there was a way to reproduce it by recreating the index.

> By the way, I'll try to reproduce again the issue on my side.

Did you manage to do it?

ivanghisleni commented 2 years ago

Yes I did, but no luck.

ivanghisleni commented 2 years ago

Any news?

sanikolaev commented 2 years ago

No. Since neither you nor we can reproduce the issue, chances are the index on which it can be reproduced is corrupted. We are still interested in understanding what's wrong with it and why indextool --check doesn't report it as corrupted, but the priority of this issue is low.

If the issue is mission-critical for you, we offer professional services.