Open windtalker opened 3 years ago
Maybe we can discuss about this in https://github.com/pingcap/tidb/issues/23098
Maybe we can discuss about this in pingcap/tidb#23098
Maybe, but I think it's kind of a feature in TiDB/Mysql, it's hard to persuade them to change the decimal behavior in TiDB
Maybe we can discuss about this in pingcap/tidb#23098
Maybe, but I think it's kind of a feature in TiDB/Mysql, it's hard to persuade them to change the decimal behavior in TiDB
MySQL 5.7's behavior is different from MySQL 8.0 either. I haven't checked the implementation, but it's probably just like what you found.
TiDB is MySQL 5.7 compatible but in this case it's behavior is the same with MySQL 8.0. By the way, https://github.com/pingcap/tidb/issues/23098 has been tagged with severity/major
, I think they would take this serious.
Maybe we can discuss about this in pingcap/tidb#23098
Maybe, but I think it's kind of a feature in TiDB/Mysql, it's hard to persuade them to change the decimal behavior in TiDB
MySQL 5.7's behavior is different from MySQL 8.0 either. I haven't checked the implementation, but it's probably just like what you found.
TiDB is MySQL 5.7 compatible but in this case it's behavior is the same with MySQL 8.0. By the way, pingcap/tidb#23098 has been tagged with
severity/major
, I think they would take this serious.
I think they are planning to be MySQL 8.0 compatible, there is a project for this.
The same sql in some other databases:
Number
type for a/b, which is a floating-scale
type, it can store number of any scale with precision limited to 38.
SQL> create table xxx as select a/b as c from test;
Table created.
SQL> desc xxx; Name Null? Type
C NUMBER
SQL> desc test; Name Null? Type
A NUMBER(3) B NUMBER(10) SQL> select a/b from test where a/b != 0;
A/B
4.1157E-08
2. DB2, the return type of `a/b` in DB2 is `Decimal(31,28)`, which is the largest precision it can support.
db2 => select a/b from test where a/b != 0;
0.0000000411571904207439468071
1 record(s) selected.
db2 => describe select a/b from test;
Column Information
Number of columns: 1
SQL type Type length Column name Name length
485 DECIMAL 31,28 1 1
3. pg: like oracle, pg also returns `numeric` type for a/b
postgres=# \d test Table "public.test" Column | Type | Collation | Nullable | Default --------+---------------+-----------+----------+--------- a | numeric(3,0) | | | b | numeric(10,0) | | |
postgres=# create table xxx as select a/b as c from test where a/b != 0; SELECT 1 postgres=# \d xxx Table "public.xxx" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c | numeric | | |
0.000000041157190420743947 (1 row)
4. spark: spark return `Decimal(14,11)` for `a/b`
0: jdbc:hive2://127.0.0.1:10508/test> desc test; +-----------+----------------+-----------+----------+--+ | col_name | data_type | nullable | comment | +-----------+----------------+-----------+----------+--+ | a | decimal(3,0) | true | NULL | | b | decimal(10,0) | true | NULL | +-----------+----------------+-----------+----------+--+ 2 rows selected (0.061 seconds) 0: jdbc:hive2://127.0.0.1:10508/test> create table xxx as select a/b as c from test; +---------+--+ | Result | +---------+--+ +---------+--+ No rows selected (0.516 seconds) 0: jdbc:hive2://127.0.0.1:10508/test> desc xxx; +-----------+-----------------+-----------+----------+--+ | col_name | data_type | nullable | comment | +-----------+-----------------+-----------+----------+--+ | c | decimal(14,11) | true | NULL | +-----------+-----------------+-----------+----------+--+ 1 row selected (0.053 seconds) 0: jdbc:hive2://127.0.0.1:10508/test> select a/b from test where a/b != 0; +--------------------------------------------------------+--+ | (CAST(a AS DECIMAL(10,0)) / CAST(b AS DECIMAL(10,0))) | +--------------------------------------------------------+--+ | 4.116E-8 | +--------------------------------------------------------+--+ 1 row selected (0.141 seconds)
For the above 4 systems, all of them will return 1 for query `select count(*) from t where a/b !=0`, and I think in oralce/db2/gp/spark, no matter where `a/b` is, it will return the same result with the same type(which is the type inferred in compile time).
For mysql, the behavior is different between mysql 5.7 and mysql 8.0.
Mysql 5.7:
mysql> desc t; +-------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------------+------+-----+---------+-------+ | a | decimal(3,0) | YES | | NULL | | | b | decimal(10,0) | YES | | NULL | | +-------+---------------+------+-----+---------+-------+ 2 rows in set (0.00 sec)
mysql> create table xxx as select a/b from t; Query OK, 1 row affected, 1 warning (0.01 sec) Records: 1 Duplicates: 0 Warnings: 1
mysql> desc xxx; +-------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+-------+ | a/b | decimal(7,4) | YES | | NULL | | +-------+--------------+------+-----+---------+-------+ 1 row in set (0.00 sec)
mysql> select a/b, a/b + 0.0000000000001 from t where a/b; Empty set (0.00 sec)
mysql> select a/b, a/b + 0.0000000000001 from t; +--------+-----------------------+ | a/b | a/b + 0.0000000000001 | +--------+-----------------------+ | 0.0000 | 0.0000000410001 | +--------+-----------------------+ 1 row in set (0.00 sec)
Mysql 8.0:
mysql> desc test; +-------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------------+------+-----+---------+-------+ | a | decimal(3,0) | YES | | NULL | | | b | decimal(10,0) | YES | | NULL | | +-------+---------------+------+-----+---------+-------+ 2 rows in set (0.00 sec)
mysql> create table xxx as select a/b from test; Query OK, 1 row affected, 1 warning (0.03 sec) Records: 1 Duplicates: 0 Warnings: 1
mysql> desc xxx; +-------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+-------+ | a/b | decimal(7,4) | YES | | NULL | | +-------+--------------+------+-----+---------+-------+ 1 row in set (0.00 sec)
mysql> select a/b, a/b + 0.0000000000001 from test where a/b; +--------+-----------------------+ | a/b | a/b + 0.0000000000001 | +--------+-----------------------+ | 0.0000 | 0.0000000410001 | +--------+-----------------------+ 1 row in set (0.01 sec)
so `select count(*) from test where a/b` will return 0 in Mysql 5.7 and return 1 in Mysql 8.0, and the strangest thing in Mysql is that the result of a/b is related to its location:
* for Mysql 5.7, looks like if `a/b` is a top level expression, it returns 0.0000, if `a/b` is not a top level expression, it returns 0.000000041
* for Mysql 8.0, looks like if `a/b` returns 0 only if it is a top level expression in the final project
I think the root cause of this strange behavior is the inconsistence between the compile time type and runtime type of Decimal in Mysql.
**The question is although we want to be as compatible with mysql as possible, do we really need to inherit the strange behavior from mysql**?
https://github.com/pingcap/docs-cn/pull/5891: docs for this incompatible behavior
Some background:
So if a function returns Decimal type, the return Decimal type can be inferred based on the input types.
Consider the following example:
If we want to calculate
a/b
, since the input type isDecimal(3,0)
andDecimal(10,0)
, we can easily get the return type ofa/b
based on the type infer rules. In fact, both TiDB and TiFlash useDecimal(7,4)
as its result type.Everything works fine until we found in TiDB/Mysql runtime, looks like the compile time inferred result type is only guaranteed for the top level project expressions:
From the query result of TiDB + TiKV, we can find:
a/b
in the filter condition is not aDecimal(7,4)
a/b
ina/b + 0.0000000000001
is notDecimal(7,4)
, from the result0.0000000410001
, we can infer thata/b
's result is0.000000041
, so the result's scale is at least 9.However, in TiFlash runtime, it guarantees that every function's return value is exactly the same type as the compile time inferred type, that is why query on TiDB + TiFlash returns empty result.
Luckily, it is kind of corner case because this bug will not be triggered in most scenarios except Decimal divide, but we still need to find a way to fix it in the future.