Open pavelnemirovsky opened 1 year ago
can't reproduce
MySQL [(none)]> create table t(a bigint);insert into t values(0,1690855066);
ERROR 1064 (42000): table 't': CREATE TABLE failed: table 't' already exists
Query OK, 1 row affected (0.106 sec)
MySQL [(none)]> select min(a) min2,yearmonthday(min2) from t;
+------------+--------------------+
| min2 | yearmonthday(min2) |
+------------+--------------------+
| 1690855066 | 20230801 |
+------------+--------------------+
1 row in set (0.111 sec)
MySQL [(none)]> select min(a) min,yearmonthday(min) from t;
+------------+-------------------+
| min | yearmonthday(min) |
+------------+-------------------+
| 1690855066 | 20230801 |
+------------+-------------------+
1 row in set (0.105 sec)
could you provide the same queries but with the raw publish_date
like
SELECT publish_date, min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
select publish_date, yearmonth(publish_date) as date, count(*) from fgi_prod group by date order by date desc;
I also see the index check error that could affect results
checking columnar storage...
FAILED,
Minmax presence flag out of bounds: 4
@tomatolog
mysql> SELECT publish_date, min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
+--------------+------------+------------+-------------------+-------------------+
| publish_date | min | max | yearmonthday(min) | yearmonthday(max) |
+--------------+------------+------------+-------------------+-------------------+
| 1675889820 | 1666656000 | 1690941445 | 20230207 | 20230731 |
+--------------+------------+------------+-------------------+-------------------+
1 row in set (0.09 sec)
mysql> select publish_date, yearmonth(publish_date) as date, count(*) from fgi_prod group by date order by date desc;
+--------------+--------+----------+
| publish_date | date | count(*) |
+--------------+--------+----------+
| 1690932705 | 202308 | 33908 |
| 1690761600 | 202307 | 2999994 |
| 1685577600 | 202306 | 2927954 |
| 1682926440 | 202305 | 2987228 |
| 1681632000 | 202304 | 2734869 |
| 1679097600 | 202303 | 3363770 |
| 1675889820 | 202302 | 2940136 |
| 1673232360 | 202301 | 2719023 |
| 1672382280 | 202212 | 1978496 |
| 1668988800 | 202211 | 205374 |
| 1666656000 | 202210 | 103491 |
+--------------+--------+----------+
11 rows in set (16.90 sec)
@tomatolog we always and consistently reach the state when the index becomes corrupted regardless of how many times we tried to start from scratch, but we do nothing rather then insert/replace command.... it is something unpredictable ... Is anything wrong with our config:
# Ansible managed
common {
# https://manual.manticoresearch.com/Server_settings/Common#lemmatizer_base
lemmatizer_base = /usr/share/manticore/nlp/
# https://manual.manticoresearch.com/Server_settings/Common#progressive_merge
# progressive_merge =
# https://manual.manticoresearch.com/Server_settings/Common#json_autoconv_keynames
# json_autoconv_keynames =
# https://manual.manticoresearch.com/Server_settings/Common#json_autoconv_numbers
# json_autoconv_numbers = 0
# https://manual.manticoresearch.com/Server_settings/Common#on_json_attr_error
# on_json_attr_error = ignore_attr
# plugin_dir =
}
searchd {
server_id = manticore-01.dmetrics.internal
listen = 127.0.0.1:9306:mysql
listen = 127.0.0.1:9308:http
listen = 10.0.82.16:9312
listen = 10.0.82.16:9306:mysql
listen = 10.0.82.16:9308:http
listen = 10.0.82.16:9360-9370:replication
pid_file = /var/run/manticore/searchd.pid
max_packet_size = 128M
binlog_flush = 1 # ultimate safety, low speed
query_log_format = sphinxql
mysql_version_string = 5.0.37
data_dir = /var/lib/data/manticore
binlog_path = /var/lib/data/manticore
log = /var/log/manticore/searchd.log
query_log = /var/log/manticore/query.log
attr_flush_period = 60
# Flushing RT RAM chunks each 5 min
rt_flush_period = 300
preopen_tables = 0
not_terms_only_allowed = 1
}
mysql> show create table fgi_prod\G
*************************** 1. row ***************************
Table: fgi_prod
Create Table: CREATE TABLE fgi_prod (
id bigint,
publish_date timestamp,
internal_id string attribute,
tags_id json,
tags_name json,
entities_id json,
article_body_hash text stored,
content text indexed
) min_prefix_len='3' index_exact_words='1' html_strip='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/data/manticore/fgi_prod/en' rt_mem_limit='2147483648'
1 row in set (0.00 sec)
As for:
FAILED, unable to open stopwords 'en': No such file or directory
looks like a known issue https://github.com/manticoresoftware/manticoresearch/issues/1289.
Is this your case?
Minmax presence flag out of bounds: 4
and
checking columnar storage...
FAILED,
might have been already fixed in the latest dev version - https://mnt.cr/nightly
mysql> SELECT publish_date, min(publish_date) as min, MAX(publish_date) as max, yearmonthday(min), yearmonthday(max) FROM fgi_prod;
+--------------+------------+------------+-------------------+-------------------+
| publish_date | min | max | yearmonthday(min) | yearmonthday(max) |
+--------------+------------+------------+-------------------+-------------------+
| 1675889820 | 1666656000 | 1690941445 | 20230207 | 20230731 |
+--------------+------------+------------+-------------------+-------------------+
1 row in set (0.09 sec)
looks like a bug to me. Perhaps related with passing the min value through the alias to the function.
Unable to reproduce:
mysql> select published, yearmonth(published) as date, count(*) from t group by date order by date desc;
+------------+--------+----------+
| published | date | count(*) |
+------------+--------+----------+
| 1690932705 | 202308 | 1 |
| 1690761600 | 202307 | 1 |
| 1685577600 | 202306 | 1 |
| 1682926440 | 202305 | 1 |
| 1681632000 | 202304 | 1 |
| 1679097600 | 202303 | 1 |
| 1675889820 | 202302 | 1 |
| 1673232360 | 202301 | 1 |
| 1672382280 | 202212 | 1 |
| 1668988800 | 202211 | 1 |
| 1666656000 | 202210 | 1 |
+------------+--------+----------+
11 rows in set (0.00 sec)
mysql> SELECT min(published) as min, MAX(published) as max, yearmonthday(min), yearmonthday(max) FROM t;
+------------+------------+-------------------+-------------------+
| min | max | yearmonthday(min) | yearmonthday(max) |
+------------+------------+-------------------+-------------------+
| 1666656000 | 1690932705 | 20221025 | 20230802 |
+------------+------------+-------------------+-------------------+
1 row in set (0.00 sec)
Please provide your index or a minimum reproducible example of this issue.
Describe the bug
it seems thta yearmonthday() function doesn't calculate the right value out of min(publish_date), see below:
Describe the environment:
Messages from log files: Messages from searchd.log and query.log (if applicable).
Additional context