pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.28k stars 5.84k forks source link

Inconsistent date_sub/date_add with MySQL when the type of first argument is float #56339

Open gengliqi opened 1 month ago

gengliqi commented 1 month ago

Bug Report

For cast_float_as_datetime, TiDB converts the float to string first and then converts the string to datetime. For date_sub/date_add part, TiDB converts the integer part of the float to datetime first and then converts the frac part to fsp. The bug is that TiDB forgets to round the frac part. TiKV does the round for the frac part so the bug exists only in TiDB.(https://github.com/tikv/tikv/blob/c67e7c37786f9d153ede85a1c8daacebe2610688/components/tidb_query_datatype/src/codec/mysql/time/mod.rs#L786)

TiDB

mysql> create table t(a double);
Query OK, 0 rows affected (0.06 sec)
mysql> insert into t values(121212131313.99998), (20000102030405.0078125);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> select a, date_add(a, interval 1 minute) from t;
+--------------------+--------------------------------+
| a                  | date_add(a, interval 1 minute) |
+--------------------+--------------------------------+
| 121212131313.99998 | 2012-12-12 13:14:13.999984     |
| 20000102030405.008 | 2000-01-02 03:05:05.007812     |
+--------------------+--------------------------------+
2 rows in set (0.00 sec)

MySQL

mysql> create table t(a double);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t values(121212131313.99998), (20000102030405.0078125);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select a, date_add(a, interval 1 minute) from t;
+--------------------+--------------------------------+
| a                  | date_add(a, interval 1 minute) |
+--------------------+--------------------------------+
| 121212131313.99998 | 2012-12-12 13:14:13.999985     |
| 20000102030405.008 | 2000-01-02 03:05:05.007813     |
+--------------------+--------------------------------+
2 rows in set (0.01 sec)
gengliqi commented 1 month ago

/severity minor

mjonss commented 1 month ago

@gengliqi I'm not sure this is even a bug, since it is also converting between a base 2 binary 64 bit float, which is about 16 decimal digits max in significance. So extending it above 16 digits and then truncate or round when there is more than 16 digits does not make sense to me, since those numbers are already only interpreted instead of actually given by the user.

I would say that it should see how many significant digits it can use and then set the FSP accordingly, including having just enough digits to convert back deterministically to a float64 again.

For reference: https://en.wikipedia.org/wiki/Double-precision_floating-point_format

An example from MySQL 8.0.32:

mysql> create table t (f64 double, deci decimal(25,8), df datetime, dd datetime, df6 datetime(6), dd6 datetime(6));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t (f64,deci) values
    -> (100102150405.67898765,100102150405.67898765),
    -> (20231002150405.67898765,20231002150405.67898765),
    -> (99991231235958.67898765,99991231235958.67898765);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> update t set df = f64, dd = deci, df6 = f64, dd6 = deci;
Query OK, 3 rows affected (0.01 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> select * from t;
+--------------------+-------------------------+---------------------+---------------------+----------------------------+----------------------------+
| f64                | deci                    | df                  | dd                  | df6                        | dd6                        |
+--------------------+-------------------------+---------------------+---------------------+----------------------------+----------------------------+
| 100102150405.67899 |   100102150405.67898765 | 2010-01-02 15:04:06 | 2010-01-02 15:04:06 | 2010-01-02 15:04:05.678986 | 2010-01-02 15:04:05.678988 |
|  20231002150405.68 | 20231002150405.67898765 | 2023-10-02 15:04:06 | 2023-10-02 15:04:06 | 2023-10-02 15:04:05.679688 | 2023-10-02 15:04:05.678988 |
|  99991231235958.67 | 99991231235958.67898765 | 9999-12-31 23:59:59 | 9999-12-31 23:59:59 | 9999-12-31 23:59:58.671875 | 9999-12-31 23:59:58.678988 |
+--------------------+-------------------------+---------------------+---------------------+----------------------------+----------------------------+
3 rows in set (0.01 sec)

If you think it is still worth changing, I'm also proposing #56442 which also should handle this case :)

gengliqi commented 1 month ago

@mjonss I understand your point. It really doesn't seem to make sense. But my point is just to keep TiDB consistent with TiKV and MySQL behavior. The corresponding code in TiKV already has the rounding behavior: https://github.com/tikv/tikv/blob/bf7c2ffc198a06d9910622bf6dbbf2ba8441deed/components/tidb_query_datatype/src/codec/mysql/time/mod.rs#L786 The corresponding code from MySQL 8.4.2: https://github.com/mysql/mysql-server/blob/mysql-cluster-8.4.2/mysys/decimal.cc#L1257

mjonss commented 1 month ago

@gengliqi OK, then I see the issue to be fixed and don't mind fixing it. Maybe verify that TiDB internally handle it the same as TiKV/TiFlash/UniStore engines would be a good idea.