Closed kokosing closed 3 years ago
Another example
trino:tpch> select count(*) from x group by c;
_col0
-------
2
2
(2 rows)
trino:tpch> select count(*) from x where rand() != 52 group by c;
_col0
-------
1
1
1
1
(4 rows)
@kokosing @wendigo This is a bit challenging to do correctly.
The easiest case is if the grouping sets contain a text type then the connector can decide to prevent pushdown. This takes care of things like group by or distinct.
For cases with a global aggregation (e.g. max(c)
or count(c)
) we might need to add something to AggregateFunction
to identify if the function is case-sensitive or not. e.g. max is case-sensitive but count is not.
I've implemented a fix for the grouping sets case but for global aggregation I'm not sure about how we want to tackle this. It feels a bit "dirty" to add the concept of case-sensitivity vs not to engine representation of functions.
Note that I can blanket disallow aggregation pushdown if the input to an aggregation function is a text type but that looks more heavy handed than necessary? Or should we tackle it as a follow-up?
Another alternative is to add this logic to the aggregation function rewrite rules individually (e.g. ImplementMinMax)
if the function is case-sensitive or not.
I was under impression it is about argument type. Can make a condition on information is type is case sensitive or not? Is io.trino.plugin.jdbc.JdbcTypeHandle#caseSensitivity
hold an information here?
Argument type alone doesn't mean pushdown will result in incorrect results.
e.g. SELECT count(varchar) is safe to pushdown while SELECT max(varchar) is not.
In MySql:
Possibly affects SQL Server and MemSql too.