manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.83k stars 489 forks source link

Some weird behavior of yearmonthday() function #1323

Open pavelnemirovsky opened 1 year ago

pavelnemirovsky commented 1 year ago

Describe the bug

it seems thta yearmonthday() function doesn't calculate the right value out of min(publish_date), see below:

mysql> SELECT min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
+------------+------------+-------------------+-------------------+
| min        | max        | yearmonthday(min) | yearmonthday(max) |
+------------+------------+-------------------+-------------------+
| 1666656000 | 1690855066 |          20230207 |          20230731 |
+------------+------------+-------------------+-------------------+
1 row in set (0.09 sec)

1666656000 - October 25, 2022
1690855066 -  August 1, 2023 1:57:46

mysql> select yearmonth(publish_date) as date, count(*) from fgi_prod group by date order by date desc;
+--------+----------+
| date   | count(*) |
+--------+----------+
| 202308 |     2202 |
| 202307 |  2929479 |
| 202306 |  2927954 |
| 202305 |  2987228 |
| 202304 |  2734869 |
| 202303 |  3363770 |
| 202302 |  2940136 |
| 202301 |  2719023 |
| 202212 |  1978496 |
| 202211 |   205374 |
| 202210 |   103491 |
+--------+----------+
11 rows in set (16.68 sec)

Describe the environment:

Manticore 6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
Linux manticore-01.dmetrics.internal 5.4.0-1097-aws #105~18.04.1-Ubuntu SMP Mon Feb 13 17:50:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Messages from log files: Messages from searchd.log and query.log (if applicable).

Additional context

root@manticore-01:~# indextool --check fgi_prod
Manticore 6.0.4 1a3a4ea82@230314 (columnar 2.0.4 5a49bd7@230306) (secondary 2.0.4 5a49bd7@230306)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

using config file '/etc/manticoresearch/manticore.conf'...
WARNING: table fgi_prod: table 'fgi_prod': morphology option changed from config has no effect, ignoring
checking table 'fgi_prod'...
WARNING: failed to load RAM chunks, checking only 32 disk chunks
checking schema...
checking RT segment 0(29)...
checking rows...
checking dead row map...
checking RT segment 1(29)...
checking rows...
checking dead row map...
checking RT segment 2(29)...
checking rows...
checking dead row map...
checking RT segment 3(29)...
checking rows...
checking dead row map...
checking RT segment 4(29)...
checking rows...
checking dead row map...
checking RT segment 5(29)...
checking rows...
checking dead row map...
checking RT segment 6(29)...
checking rows...
checking dead row map...
checking RT segment 7(29)...
checking rows...
checking dead row map...
checking RT segment 8(29)...
checking rows...
checking dead row map...
checking RT segment 9(29)...
checking rows...
checking dead row map...
checking RT segment 10(29)...
checking rows...
checking dead row map...
checking RT segment 11(29)...
checking rows...
checking dead row map...
checking RT segment 12(29)...
checking rows...
checking dead row map...
checking RT segment 13(29)...
checking rows...
checking dead row map...
checking RT segment 14(29)...
checking rows...
checking dead row map...
checking RT segment 15(29)...
checking rows...
checking dead row map...
checking RT segment 16(29)...
checking rows...
checking dead row map...
checking RT segment 17(29)...
checking rows...
checking dead row map...
checking RT segment 18(29)...
checking rows...
checking dead row map...
checking RT segment 19(29)...
checking rows...
checking dead row map...
checking RT segment 20(29)...
checking rows...
checking dead row map...
checking RT segment 21(29)...
checking rows...
checking dead row map...
checking RT segment 22(29)...
checking rows...
checking dead row map...
checking RT segment 23(29)...
checking rows...
checking dead row map...
checking RT segment 24(29)...
checking rows...
checking dead row map...
checking RT segment 25(29)...
checking rows...
checking dead row map...
checking RT segment 26(29)...
checking rows...
checking dead row map...
checking RT segment 27(29)...
checking rows...
checking dead row map...
checking RT segment 28(29)...
checking rows...
checking dead row map...
checking disk chunk, extension 132, 0(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 2 failures reported, 29.5 sec elapsed
checking disk chunk, extension 138, 1(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 4 failures reported, 58.7 sec elapsed
checking disk chunk, extension 143, 2(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 6 failures reported, 94.9 sec elapsed
checking disk chunk, extension 146, 3(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 8 failures reported, 131.7 sec elapsed
checking disk chunk, extension 149, 4(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 10 failures reported, 169.9 sec elapsed
checking disk chunk, extension 152, 5(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 12 failures reported, 209.4 sec elapsed
checking disk chunk, extension 155, 6(32)...
FAILED, unable to open stopwords 'en': No such file or directory
checking schema...
checking dictionary...
checking data...
^C39000/0
l1t1 commented 1 year ago

can't reproduce

MySQL [(none)]> create table t(a bigint);insert into t values(0,1690855066);
ERROR 1064 (42000): table 't': CREATE TABLE failed: table 't' already exists
Query OK, 1 row affected (0.106 sec)

MySQL [(none)]> select min(a) min2,yearmonthday(min2) from t;
+------------+--------------------+
| min2       | yearmonthday(min2) |
+------------+--------------------+
| 1690855066 |           20230801 |
+------------+--------------------+
1 row in set (0.111 sec)

MySQL [(none)]> select min(a) min,yearmonthday(min) from t;
+------------+-------------------+
| min        | yearmonthday(min) |
+------------+-------------------+
| 1690855066 |          20230801 |
+------------+-------------------+
1 row in set (0.105 sec)
tomatolog commented 1 year ago

could you provide the same queries but with the raw publish_date like

SELECT publish_date, min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
select publish_date, yearmonth(publish_date) as date, count(*) from fgi_prod group by date order by date desc;

I also see the index check error that could affect results

checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
pavelnemirovsky commented 1 year ago

@tomatolog

mysql> SELECT publish_date, min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
+--------------+------------+------------+-------------------+-------------------+
| publish_date | min        | max        | yearmonthday(min) | yearmonthday(max) |
+--------------+------------+------------+-------------------+-------------------+
|   1675889820 | 1666656000 | 1690941445 |          20230207 |          20230731 |
+--------------+------------+------------+-------------------+-------------------+
1 row in set (0.09 sec)

mysql> select publish_date, yearmonth(publish_date) as date, count(*) from fgi_prod group by date order by date desc;

+--------------+--------+----------+
| publish_date | date   | count(*) |
+--------------+--------+----------+
|   1690932705 | 202308 |    33908 |
|   1690761600 | 202307 |  2999994 |
|   1685577600 | 202306 |  2927954 |
|   1682926440 | 202305 |  2987228 |
|   1681632000 | 202304 |  2734869 |
|   1679097600 | 202303 |  3363770 |
|   1675889820 | 202302 |  2940136 |
|   1673232360 | 202301 |  2719023 |
|   1672382280 | 202212 |  1978496 |
|   1668988800 | 202211 |   205374 |
|   1666656000 | 202210 |   103491 |
+--------------+--------+----------+
11 rows in set (16.90 sec)
pavelnemirovsky commented 1 year ago

@tomatolog we always and consistently reach the state when the index becomes corrupted regardless of how many times we tried to start from scratch, but we do nothing rather then insert/replace command.... it is something unpredictable ... Is anything wrong with our config:

# Ansible managed
common {
    # https://manual.manticoresearch.com/Server_settings/Common#lemmatizer_base
    lemmatizer_base = /usr/share/manticore/nlp/

    # https://manual.manticoresearch.com/Server_settings/Common#progressive_merge
    # progressive_merge =

    # https://manual.manticoresearch.com/Server_settings/Common#json_autoconv_keynames
    # json_autoconv_keynames =

    # https://manual.manticoresearch.com/Server_settings/Common#json_autoconv_numbers
    # json_autoconv_numbers = 0

    # https://manual.manticoresearch.com/Server_settings/Common#on_json_attr_error
    # on_json_attr_error = ignore_attr

    # plugin_dir =
}

searchd {
  server_id = manticore-01.dmetrics.internal
  listen = 127.0.0.1:9306:mysql
  listen = 127.0.0.1:9308:http
  listen = 10.0.82.16:9312
  listen = 10.0.82.16:9306:mysql
  listen = 10.0.82.16:9308:http
  listen = 10.0.82.16:9360-9370:replication
  pid_file = /var/run/manticore/searchd.pid
  max_packet_size = 128M
  binlog_flush = 1 # ultimate safety, low speed
  query_log_format = sphinxql
  mysql_version_string = 5.0.37
  data_dir = /var/lib/data/manticore
  binlog_path = /var/lib/data/manticore
  log = /var/log/manticore/searchd.log
  query_log = /var/log/manticore/query.log
  attr_flush_period = 60
  # Flushing RT RAM chunks each 5 min
  rt_flush_period = 300
  preopen_tables = 0
  not_terms_only_allowed = 1
}

mysql> show create table fgi_prod\G
*************************** 1. row ***************************
       Table: fgi_prod
Create Table: CREATE TABLE fgi_prod (
id bigint,
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json,
article_body_hash text stored,
content text indexed
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/data/manticore/fgi_prod/en' rt_mem_limit='2147483648'
1 row in set (0.00 sec)
sanikolaev commented 1 year ago

As for:

FAILED, unable to open stopwords 'en': No such file or directory

looks like a known issue https://github.com/manticoresoftware/manticoresearch/issues/1289.

Is this your case?

Minmax presence flag out of bounds: 4

and

checking columnar storage...
FAILED,

might have been already fixed in the latest dev version - https://mnt.cr/nightly

mysql> SELECT publish_date, min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
+--------------+------------+------------+-------------------+-------------------+
| publish_date | min        | max        | yearmonthday(min) | yearmonthday(max) |
+--------------+------------+------------+-------------------+-------------------+
|   1675889820 | 1666656000 | 1690941445 |          20230207 |          20230731 |
+--------------+------------+------------+-------------------+-------------------+
1 row in set (0.09 sec)

looks like a bug to me. Perhaps related with passing the min value through the alias to the function.

glookka commented 9 months ago

Unable to reproduce:

mysql> select published, yearmonth(published) as date, count(*) from t group by date order by date desc;
+------------+--------+----------+
| published  | date   | count(*) |
+------------+--------+----------+
| 1690932705 | 202308 |        1 |
| 1690761600 | 202307 |        1 |
| 1685577600 | 202306 |        1 |
| 1682926440 | 202305 |        1 |
| 1681632000 | 202304 |        1 |
| 1679097600 | 202303 |        1 |
| 1675889820 | 202302 |        1 |
| 1673232360 | 202301 |        1 |
| 1672382280 | 202212 |        1 |
| 1668988800 | 202211 |        1 |
| 1666656000 | 202210 |        1 |
+------------+--------+----------+
11 rows in set (0.00 sec)

mysql> SELECT min(published) as min, MAX(published) as max, yearmonthday(min), yearmonthday(max) FROM t;
+------------+------------+-------------------+-------------------+
| min        | max        | yearmonthday(min) | yearmonthday(max) |
+------------+------------+-------------------+-------------------+
| 1666656000 | 1690932705 |          20221025 |          20230802 |
+------------+------------+-------------------+-------------------+
1 row in set (0.00 sec)

Please provide your index or a minimum reproducible example of this issue.