Is your feature request related to a problem?
Aggregation statement, query with aggregate function, group-by clause that we've supported, grouping set, cube that we haven't supported, has many restrictions based on SQL standard. OpenSearch SQL adapts MySQL grammar as it initial grammar, but the query compilation check should be done in query resolution/analyzing. Current query compilation check is weak, especially for aggregation statement.
Here is the result comparison of query compilation check between OpenSearch SQL, Spark, Postgres and Oracle.
SELECT query_id FROM test_bulk1 GROUP BY max(query_id)
4
SELECT max(query_id) FROM test_bulk1 GROUP BY 1
5
SELECT max(query_id) FILTER(WHERE query_id) FROM test_bulk1
6
SELECT max(query_id) FILTER(WHERE max(query_id)>10) FROM test_bulk1
7
SELECT * FROM test_bulk1 WHERE ROW_NUMBER() OVER(ORDER BY name) > 0
8
SELECT name, rank() OVER (PARTITION BY a) FROM test_bulk1
Result of query compilation:
ID
Description
OS-SQL
Spark
Postgres
Oracle
1
column not in group-by clause
return null value
AnalysisException: The non-aggregating expression "query_id" is based on columns which are not participating in the GROUP BY clause.
ERROR: column "query_id" must appear in the GROUP BY clause or be used in an aggregate function
ORA-00979: "QUERY_ID": must appear in the GROUP BY clause or be used in an aggregate function
2
group-by is missing
Explicit GROUP BY clause is required because expression [query_id] contains non-aggregated column
The query does not include a GROUP BY clause. Add GROUP BY or turn it into the window functions using OVER clauses
ditto^
ORA-00937: not a single-group group function
3
aggregate functions in group-by
Server side error during query execution
Aggregate functions are not allowed in GROUP BY, but found max(query_id)
ERROR: aggregate functions are not allowed in GROUP BY
ORA-00934: group function is not allowed here
4
group-by position refers to aggregate function
Server side error during query execution
GROUP BY 1 refers to an expression that contains an aggregate function. Aggregate functions are not allowed in GROUP BY
ditto^
return 3
5
non boolean filter in aggregate
Client side error during query execution: [filter] must not be null
FILTER expression is not of type boolean. It cannot be used in an aggregate function
ERROR: argument of FILTER must be type boolean, not type integer
ORA-00923: FROM keyword not found where expected
6
aggregate in aggregate-filter error
Server side error during query execution
FILTER expression contains aggregate. It cannot be used in an aggregate function
ERROR: aggregate functions are not allowed in FILTER
ditto^
7
window function in where-clause
Server side error during query execution
It is not allowed to use window functions inside WHERE clause
ERROR: window functions are not allowed in WHERE
ORA-30483: window functions are not allowed here
8
ranking window function misses order clause
return Ava,1; Clark,1; Dave,1
Window function rank() requires window to be ordered, please add ORDER BY clause
return Ava,1; Clark,1; Dave,1
missing ORDER BY expression in the window specification
We should fix above query compilation error to make aggregation more robust.
What solution would you like?
This RFC is to address the weakness of query compilation for aggregation statement, including query with aggregate function, window function, group-by clause etc.
We will align with semantics of Spark and PostgresSQL as possible as we can. If the behaviours between Spark and PostgresSQL are different, we will choose the option which one could mitigate the impact or reduces breaking change.
What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.
Do you have any additional context?
Add any other context or screenshots about the feature request here.
Is your feature request related to a problem? Aggregation statement, query with aggregate function, group-by clause that we've supported, grouping set, cube that we haven't supported, has many restrictions based on SQL standard. OpenSearch SQL adapts MySQL grammar as it initial grammar, but the query compilation check should be done in query resolution/analyzing. Current query compilation check is weak, especially for aggregation statement.
Here is the result comparison of query compilation check between OpenSearch SQL, Spark, Postgres and Oracle.
3
Ava,1; Clark,1; Dave,1
Ava,1; Clark,1; Dave,1
We should fix above query compilation error to make aggregation more robust.
What solution would you like? This RFC is to address the weakness of query compilation for aggregation statement, including query with aggregate function, window function, group-by clause etc. We will align with semantics of Spark and PostgresSQL as possible as we can. If the behaviours between Spark and PostgresSQL are different, we will choose the option which one could mitigate the impact or reduces breaking change.
What alternatives have you considered? A clear and concise description of any alternative solutions or features you've considered.
Do you have any additional context? Add any other context or screenshots about the feature request here.