pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.02k stars 5.82k forks source link

Inconsistent date_sub/date_add with MySQL when the type of first argument is float #56339

Open gengliqi opened 1 week ago

gengliqi commented 1 week ago

Bug Report

For cast_float_as_datetime, TiDB converts the float to string first and then converts the string to datetime. For date_sub/date_add part, TiDB converts the integer part of the float to datetime first and then converts the frac part to fsp. The bug is that TiDB forgets to round the frac part. TiKV does the round for the frac part so the bug exists only in TiDB.(https://github.com/tikv/tikv/blob/c67e7c37786f9d153ede85a1c8daacebe2610688/components/tidb_query_datatype/src/codec/mysql/time/mod.rs#L786)

TiDB

mysql> create table t(a double);
Query OK, 0 rows affected (0.06 sec)
mysql> insert into t values(121212131313.99998), (20000102030405.0078125);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> select a, date_add(a, interval 1 minute) from t;
+--------------------+--------------------------------+
| a                  | date_add(a, interval 1 minute) |
+--------------------+--------------------------------+
| 121212131313.99998 | 2012-12-12 13:14:13.999984     |
| 20000102030405.008 | 2000-01-02 03:05:05.007812     |
+--------------------+--------------------------------+
2 rows in set (0.00 sec)

MySQL

mysql> create table t(a double);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t values(121212131313.99998), (20000102030405.0078125);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select a, date_add(a, interval 1 minute) from t;
+--------------------+--------------------------------+
| a                  | date_add(a, interval 1 minute) |
+--------------------+--------------------------------+
| 121212131313.99998 | 2012-12-12 13:14:13.999985     |
| 20000102030405.008 | 2000-01-02 03:05:05.007813     |
+--------------------+--------------------------------+
2 rows in set (0.01 sec)
gengliqi commented 1 week ago

/severity minor

mjonss commented 4 days ago

@gengliqi I'm not sure this is even a bug, since it is also converting between a base 2 binary 64 bit float, which is about 16 decimal digits max in significance. So extending it above 16 digits and then truncate or round when there is more than 16 digits does not make sense to me, since those numbers are already only interpreted instead of actually given by the user.

I would say that it should see how many significant digits it can use and then set the FSP accordingly, including having just enough digits to convert back deterministically to a float64 again.

For reference: https://en.wikipedia.org/wiki/Double-precision_floating-point_format

An example from MySQL 8.0.32:

mysql> create table t (f64 double, deci decimal(25,8), df datetime, dd datetime, df6 datetime(6), dd6 datetime(6));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t (f64,deci) values
    -> (100102150405.67898765,100102150405.67898765),
    -> (20231002150405.67898765,20231002150405.67898765),
    -> (99991231235958.67898765,99991231235958.67898765);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> update t set df = f64, dd = deci, df6 = f64, dd6 = deci;
Query OK, 3 rows affected (0.01 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> select * from t;
+--------------------+-------------------------+---------------------+---------------------+----------------------------+----------------------------+
| f64                | deci                    | df                  | dd                  | df6                        | dd6                        |
+--------------------+-------------------------+---------------------+---------------------+----------------------------+----------------------------+
| 100102150405.67899 |   100102150405.67898765 | 2010-01-02 15:04:06 | 2010-01-02 15:04:06 | 2010-01-02 15:04:05.678986 | 2010-01-02 15:04:05.678988 |
|  20231002150405.68 | 20231002150405.67898765 | 2023-10-02 15:04:06 | 2023-10-02 15:04:06 | 2023-10-02 15:04:05.679688 | 2023-10-02 15:04:05.678988 |
|  99991231235958.67 | 99991231235958.67898765 | 9999-12-31 23:59:59 | 9999-12-31 23:59:59 | 9999-12-31 23:59:58.671875 | 9999-12-31 23:59:58.678988 |
+--------------------+-------------------------+---------------------+---------------------+----------------------------+----------------------------+
3 rows in set (0.01 sec)

If you think it is still worth changing, I'm also proposing #56442 which also should handle this case :)

gengliqi commented 51 minutes ago

@mjonss I understand your point. It really doesn't seem to make sense. But my point is just to keep TiDB consistent with TiKV and MySQL behavior. The corresponding code in TiKV already has the rounding behavior: https://github.com/tikv/tikv/blob/bf7c2ffc198a06d9910622bf6dbbf2ba8441deed/components/tidb_query_datatype/src/codec/mysql/time/mod.rs#L786 The corresponding code from MySQL 8.4.2: https://github.com/mysql/mysql-server/blob/mysql-cluster-8.4.2/mysys/decimal.cc#L1257