pingcap / parser

A MySQL Compatible SQL Parser
Apache License 2.0
1.4k stars 490 forks source link

parser: fix aggregate functions compatibility to MySQL8.0 #1116

Open hidehalo opened 3 years ago

hidehalo commented 3 years ago

Bug Report

On MySQL 8, aggregate functions below do not support distinct options:

(ref https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html)

E.g:

Executed result of SQL select std(distinct c) from t in MySQL 8 would get error feedback like:

"ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'distinct c) from t' at line 1"

but, it would execute successfully in TiDB.

hidehalo commented 3 years ago

@kennytm Shall we label it type/bug?

kennytm commented 3 years ago

I don't see any point disabling support of DISTINCT. We provide the correct result with DISTINCT, compatible with PostgreSQL.

create table a (a int);
insert into a values (1), (2), (2), (3), (3), (3);
select var_pop(all a), var_pop(distinct a), var_pop(a) from a;
-- both PostgreSQL and TiDB return 0.5555…, 0.6666…, 0.5555…

The additional DISTINCT keyword, although prohibited by ISO 9075*, is well supported in PostgreSQL (and CockroachDB), BigQuery, Snowflake and Microsoft Transact-SQL and IBM DB2. Meanwhile, MySQL (and MariaDB), Oracle, SingleStore (formerly MemSQL) followed ISO 9075 and disallowed DISTINCT. SQLite3 does not have the standard deviation/variance aggregate functions.

ISO/IEC 9075-2:201? §4.16.4: Neither DISTINCT nor ALL are allowed to be specified for VAR_POP, VAR_SAMP, STDDEV_POP, or STDDEV_SAMP; redundant duplicates are not removed when computing these functions.

Edit: Oracle's non-standard VARIANCE and STDDEV functions do allow DISTINCT, but the standard ones with _POP and _SAMP suffixes don't :shrug:.

hidehalo commented 3 years ago

I don't see any point disabling support of DISTINCT. We provide the correct result with DISTINCT, compatible with PostgreSQL.

create table a (a int);
insert into a values (1), (2), (2), (3), (3), (3);
select var_pop(all a), var_pop(distinct a), var_pop(a) from a;
-- both PostgreSQL and TiDB return 0.5555…, 0.6666…, 0.5555…

The additional DISTINCT keyword, although prohibited by ISO 9075*, is well supported in PostgreSQL (and CockroachDB), BigQuery, Snowflake and Microsoft Transact-SQL and IBM DB2. Meanwhile, MySQL (and MariaDB), ~Oracle~, SingleStore (formerly MemSQL) followed ISO 9075 and disallowed DISTINCT. SQLite3 does not have the standard deviation/variance aggregate functions.

ISO/IEC 9075-2:201? §4.16.4: Neither DISTINCT nor ALL are allowed to be specified for VAR_POP, VAR_SAMP, STDDEV_POP, or STDDEV_SAMP; redundant duplicates are not removed when computing these functions.

Edit: Oracle's non-standard VARIANCE and STDDEV functions do allow DISTINCT, but the standard ones with _POP and _SAMP suffixes don't 🤷.

@kennytm Yes, I have notice that TiDB provide the correct result with DISTINCT. I am not familiar with those DB&standards, this issue come from https://github.com/pingcap/tidb/issues/19499, I think maybe it is not necessary to TiDB. This change is about function support, I am not suitable for making decisions, maybe we should close those relative issues?