vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.64k stars 2.1k forks source link

Mimic MySQL Identifier Resolution #11702

Open systay opened 1 year ago

systay commented 1 year ago

Description

After working on https://github.com/vitessio/vitess/issues/11641, a comment by @nussjustin got me thinking and researching things a bit more, and my conclusion is that the current behaviour is not correct in some corner cases.

This are my findings:

WHERE ORDER BY GROUP BY HAVING Inside Aggregation
Table Column Matches Column Column Column Column Column
Alias Matches Fail Alias Alias Alias Alias
Both Match Column Alias Column Alias Column

To read this table, let's start with the first column. It differentiates between three cases:

  1. Table Column Matches - One of the columns in one of the tables listed in the FROM clause has a matching column.
  2. Alias Matches - One of the aliases introduced in the SELECT expressions matches.
  3. Ambiguous Match - Both one of the table columns and one of the aliases introduced in SELECT match the identifier.

The other columns are self explanatory, I believe, except the last one. It represents identifier resolution inside aggregation functions, which is not valid in WHERE. Here the resolution changes and it behaves just like GROUP BY does, no matter where the aggregation function was used.

This is not the current behaviour with the new planner, and this needs to be fixed.

systay commented 1 year ago

FYI - I used the explain select...;show warnings; trick to see what mysql rewrites queries to.

systay commented 1 year ago

Here is a list of the queries used to show the above table:

# WHERE when the table column matches
select val1 as foo from aggr_test where val1 = 'a'
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test` 
where (`test`.`aggr_test`.`val1` = 'a')

# WHERE when both the alias and the table column matches
select val1+1 as val1 from aggr_test where val1 = 'a'
# Is rewritten to:
select (`test`.`aggr_test`.`val1` + 1) AS `val1` 
from `test`.`aggr_test` 
where (`test`.`aggr_test`.`val1` = 'a')

# WHERE when only the alias matches, and using aggregation both fail
select val1 as foo from aggr_test where foo = 'a' #this query fails
select val1 as foo from aggr_test where min(val1) = 'a' #this query fails as well

# ORDER BY when the column name matches
select val1 as foo from aggr_test order by val1
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test` 
order by `test`.`aggr_test`.`val1`

# ORDER BY when the alias matches
select val1 as foo from aggr_test order by foo
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test` 
order by `foo`

# ORDER BY when both match
select val2 as val1 from aggr_test order by val1
# Is rewritten to:
select `test`.`aggr_test`.`val2` AS `val1` from `test`.`aggr_test` order by `val1`

select min(val2) as val1 from aggr_test group by aggr_test.val1 order by val1
# Is rewritten to:
select min(`test`.`aggr_test`.`val2`) AS `val1` 
from `test`.`aggr_test` 
group by `test`.`aggr_test`.`val1`
order by `val1`

select val1 as foo from aggr_test group by val1
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test` 
group by `test`.`aggr_test`.`val1`

select val1 as foo from aggr_test group by foo
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test`
group by `foo`

select val2 as val1 from aggr_test group by val1
Warning `Column 'val1' in group statement is ambiguous`
# Is rewritten to:
select `test`.`aggr_test`.`val2` AS `val1` 
from `test`.`aggr_test` 
group by `test`.`aggr_test`.`val1`

select val1 as foo from aggr_test having val1 = 'd'
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test` 
having (`test`.`aggr_test`.`val1` = 'd')

select val1 as foo from aggr_test having foo = 'd'
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `foo` 
from `test`.`aggr_test` 
having (`foo` = 'd')

select val2 as val1 from aggr_test having val1 = 'd'
# Is rewritten to:
select `test`.`aggr_test`.`val2` AS `val1` 
from `test`.`aggr_test` 
having (`val1` = 0)

select val1, val2 from aggr_test group by val2 having min(val1) = 'd'
# Is rewritten to:
select `test`.`aggr_test`.`val1` AS `val1`,`test`.`aggr_test`.`val2` AS `val2` 
from `test`.`aggr_test` 
group by `test`.`aggr_test`.`val2` 
having (min(`test`.`aggr_test`.`val1`) = 'd')

The schema for this was very simple:

create table aggr_test
(
    id   bigint,
    val1 varchar(16),
    val2 bigint,
    primary key (id)
) Engine = InnoDB;