sqlfluff / sqlfluff

A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
https://www.sqlfluff.com
MIT License
7.76k stars 708 forks source link

AL09 - Case sensitivity with unquoted identifiers: TSQL #5636

Open magosb opened 7 months ago

magosb commented 7 months ago

Search before asking

What Happened

When using sql fluff fix it rewrites our SQL queries and removes "self aliasing" even though the casing is different and even though this is nescessary.

Expected Behaviour

I do not want "ID" to be interpreted as equal to "id" or "Id".

Observed Behaviour

The following is changed by sqlfluff fix:

SELECT
    id AS  ID,
    date AS Date,
    description AS Description,
FROM
[...]

into:

SELECT
    id,
    date,
    description,
FROM
[...]

How to reproduce

Use the above SQL query and run sqlfluff lint or fix on it

Dialect

tsql

Version

Configuration

[tool.sqlfluff.core] large_file_skip_byte_limit = 100000 dialect = "tsql" ignore = "templating"

[tool.sqlfluff.indentation] tab_space_size = "4" indent_unit = "tab"

Are you willing to work on and submit a PR to address the issue?

Code of Conduct

WittierDinosaur commented 7 months ago

So are different cases actually different column identifiers in tsql?

dkushnir-picket commented 6 months ago

In Snowflake quoted aliases are different column identifiers, however lint throw AL09, for ex. "ORDER" as "order"

PieterBos94 commented 3 months ago

I also have this issue with BigQuery/DBT. The documentation mentions that it should not trigger on case sensitivity changes.

Query example: Employee_ID AS employee_id turns into Employee_ID

Before comparing it uppers both the original identifier and the alias. column_identifier.raw_upper == alias_identifier.raw_upper Text mentioning that this should not be the case: Note that this rule does allow self-alias to change case sensitivity.

The fix could be as simple as removing the upper statements, but I have no experience contributing here so not sure if I should open an PR.

aayushr7 commented 2 months ago

Hi @PieterBos94 , for your example:

Query example: Employee_ID AS employee_id turns into Employee_ID

it is the expected behaviour to convert it to Employee_ID. In BigQuery you can query the column Employee_ID both as Employee_ID and employee_id in your select statements.

Query 1:

SELECT Employee_ID FROM table

Query 2:

SELECT employee_id FROM table

If your BigQuery table column is Employee_ID, then both Query 1 and Query 2 work to read the column.

If you would like to re-case your column, it can be done using the following:

SELECT Employee_ID AS `employee_id`
FROM table
aayushr7 commented 2 months ago

Hi @dkushnir-picket , could you please share an example where this fails in Snowflake?

The rule is expected to take care of renaming the column name from Case-Sensitive to Case-Insensitive and vice versa.

The rule passes the following query (test_alias_query.sql) :

SELECT "ORDER" AS "order"
FROM
    Table1

sqlfluff lint test_alias_query.sql --dialect snowflake

dkushnir-picket commented 2 months ago

I ran the same and get L: 1 | P: 8 | AL09 | Column should not be self-aliased.

Here is my sqlflaff config

[sqlfluff]
dialect = snowflake
templater = dbt
max_line_length = 120
large_file_skip_byte_limit = 0
exclude_rules = ST06, RF04

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = lower
aayushr7 commented 2 months ago

Hi @magosb, the Observed Behaviour of

SELECT
    id AS  ID,
    date AS Date
FROM [...]

being converted to

SELECT
    id,
    date
FROM [...]

is expected since id/ID and date / Date, etc. refer to the same column in the table while querying. Therefore, as per the rule, your query is aliasing the column to itself.

In order to have your desired output, you can directly write the query as:

SELECT 
    ID,
    Date,
    Description
FROM [...]

or in the following way using quoted identifiers:

SELECT
   id AS "ID",
   date AS "Date",
   description AS "Description"
FROM [...]

Please share if it is different in TSQL

aayushr7 commented 2 months ago

Thanks @dkushnir-picket It seems I was checking with some old dbt templater.

This is indeed an issue.

SELECT "ORDER" AS "order"
FROM
    Table1

The rule doesn't handle the case when both the column and alias are quoted.

alanmcruickshank commented 3 weeks ago

@magosb (and any other TSQL users here), would you be able to confirm that in your case your SQL Sever instance has a case-sensitive collation applied for identifiers? It appears that in the docs (https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-identifiers?view=sql-server-ver16) this is something that is configurable for TSQL, and I'm not sure what the default is.

If your default identifier collation is case sensitive, I can totally understand that your example query is fine as-is:

SELECT
    id AS  ID,
    date AS Date,
    description AS Description,
FROM my_table

What I'm particularly curious about is if we have any TSQL users here that have a case-insensitive collation configured, whether the same query is still useful to change the case of the resulting column in the result set (i.e. ID and id refer to the same column, but that id as ID is still useful so that the column appears as ID in the result, rather than id).

Thanks in advance, and sorry this is taking a while. I'm discovering that the treatment of identifier case in different SQL dialects can be vastly different.