sqlfluff / sqlfluff

A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
https://www.sqlfluff.com
MIT License

dbt templater producing incorrect sql (adding extra line breaks, duplicating dbt_utils macro line, and giving strange indents) #6098

Open brandonpeebles opened 1 month ago

brandonpeebles commented 1 month ago

What Happened

I'm trying to set up sqlfluff in my dbt project and used dbt's recommended .sqlfluff file. I ran sqlfluff fix on one of my files to test, and while the command completed successfully, it produced a very strange result that will not run in Snowflake. When I use the jinja templater instead, no changes are made.

Expected Behaviour

Below is a simplified version of my SQL file.

{{ config(
    materialized = "incremental",
    incremental_strategy="merge",
    unique_key = "surrogate_key",
    alias="entity_performance_by_day",
    enabled=true,
    tags=["daily"]
) }}

with

eligible_entities as (
    select * from {{ ref('stg_source__eligible_entity') }}
    where date_local < convert_timezone('UTC', 'America/New_York', current_timestamp)::date
    {% if is_incremental() %}
        and date_local > coalesce(
            (select max(date_local) from {{ this }}),
            dateadd('day', -1, {{ var('min_date') }})::date
        )
    {% endif %}
),

eligible_entities_rollup as (
    select
        date_local,
        entity_id,
        location_id,
        has_attribute,
        {{ dbt_utils.generate_surrogate_key(['date_local', 'entity_id', 'location_id', 'has_attribute']) }} as surrogate_key,
        count(distinct request_id) as requests,
        avg(score_metric_1) as avg_score_metric_1,
        avg(distance_metric) as avg_distance_metric,
        avg(rank_metric_1) as avg_rank_metric_1,
        avg(rank_metric_2) as avg_rank_metric_2,
        avg(rank_metric_3) as avg_rank_metric_3,
        {{ dbt.current_timestamp() }} as dbt_processed_timestamp_utc
    from eligible_entities
    group by all
)

select * from eligible_entities_rollup

When I run sqlfluff fix --dialect snowflake, I'd expect it to make some minor changes to whitespace and line wrapping. These are the violations it picked up:

==== finding fixable violations ====
=== [dbt templater] Sorting Nodes...
=== [dbt templater] Compiling dbt project...
=== [dbt templater] Project Compiled.
== [/path/to/my/file.sql] FAIL  
L:  14 | P:  10 | LT02 | Expected line break and indent of 8 spaces before
                       | 'date_local'. [layout.indent]
L:  15 | P:   1 | LT02 | Expected indent of 8 spaces. [layout.indent]
L:  16 | P:   1 | LT02 | Expected indent of 12 spaces. [layout.indent]
L:  17 | P:   1 | LT02 | Expected indent of 16 spaces. [layout.indent]
L:  18 | P:   1 | LT02 | Expected indent of 16 spaces. [layout.indent]
L:  19 | P:   1 | LT02 | Expected indent of 12 spaces. [layout.indent]
L:  29 | P:   9 | LT05 | Line is too long (129 > 100).
                       | [layout.long_lines]
== [/path/to/my/file.sql] FIXED
7 fixable linting violations found
  [1 unfixable linting violations found]

Observed Behaviour

However, after it finishes, my file looks like this. If I try to execute this code, it fails with syntax errors. Note how it collapses each CTE onto a single long line, duplicates the dbt_utils.generate_surrogate_key macro call, and inserts stray blank lines and indentation around the duplicate:

with

eligible_entities as ( select * from {{ ref('stg_source__eligible_entity') }} where date_local < convert_timezone('UTC', 'America/New_York', current_timestamp)::date {% if is_incremental() %} and date_local > coalesce( (select max(date_local) from {{ this }}), dateadd('day', -1, {{ var('min_date') }})::date ) {% endif %} ),

eligible_entities_rollup as ( select date_local, entity_id, location_id, has_attribute,

        {{ dbt_utils.generate_surrogate_key(['date_local', 'entity_id', 'location_id', 'has_attribute']) }}

{{ dbt_utils.generate_surrogate_key(['date_local', 'entity_id', 'location_id', 'has_attribute']) }} as surrogate_key, count(distinct request_id) as requests, avg(score_metric_1) as avg_score_metric_1, avg(distance_metric) as avg_distance_metric, avg(rank_metric_1) as avg_rank_metric_1, avg(rank_metric_2) as avg_rank_metric_2, avg(rank_metric_3) as avg_rank_metric_3, {{ dbt.current_timestamp() }} as dbt_processed_timestamp_utc from eligible_entities group by all )

select * from eligible_entities_rollup


How to reproduce

The command I ran was:

```shell
sqlfluff fix --dialect snowflake direct/path/to/my/file.sql
```

Dialect

Snowflake
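
For comparison (a diagnostic sketch, not part of the original report): since the jinja templater makes no changes while the dbt templater mangles the file, re-running the same file with the templater overridden on the command line, and inspecting how sqlfluff parses it, can help confirm whether the dbt templater's source mapping is at fault. The `--templater` option and the `parse` command are standard sqlfluff CLI features; the path is the report's own placeholder.

```shell
# Lint the same file with the jinja templater instead of dbt; since
# apply_dbt_builtins = True, ref()/config()/var() are stubbed out.
sqlfluff lint --dialect snowflake --templater jinja direct/path/to/my/file.sql

# Show the parsed tree for the templated file; mis-mapped template
# slices here would explain fixes landing in the wrong places.
sqlfluff parse --dialect snowflake direct/path/to/my/file.sql
```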

Version

I'm on Python 3.12.5 and have these relevant packages installed:

sqlfluff==3.1.1
sqlfluff-templater-dbt==3.1.1
dbt-adapters==1.4.1
dbt-common==1.7.0
dbt-core==1.8.5
dbt-snowflake==1.8.3

Configuration

To reproduce, this is the .sqlfluff file I'm using. It's essentially dbt's recommended configuration with one extra rule exclusion.

[sqlfluff]
dialect = snowflake
# This change (from jinja to dbt templater) will make linting slower
# because linting will first compile dbt code into data warehouse code.
templater = dbt
runaway_limit = 10
max_line_length = 100
indent_unit = space
exclude_rules = CV03

[sqlfluff:templater:dbt]
project_dir = ./
profiles_dir = ../config

[sqlfluff:templater:jinja]
apply_dbt_builtins = True

[sqlfluff:indentation]
tab_space_size = 4

[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing

[sqlfluff:rules:capitalisation.keywords] 
capitalisation_policy = lower

[sqlfluff:rules:aliasing.table]
aliasing = explicit

[sqlfluff:rules:aliasing.column]
aliasing = explicit

[sqlfluff:rules:aliasing.expression]
allow_scalar = False

[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower

[sqlfluff:rules:ambiguous.column_references]  # Number in group by
group_by_and_order_by_style = implicit

I also have this .sqlfluffignore file:

target/
# dbt <1.0.0
dbt_modules/
# dbt >=1.0.0
dbt_packages/
macros/
tests/fixtures/


awoehrl commented 3 weeks ago

This is what we are seeing in our code as well. For now we are adding -- noqa comments wherever this happens, but it is becoming tedious.
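
As a slightly less blunt stopgap (a sketch; exact placement depends on your model), sqlfluff's inline directives can be scoped to just the rules the dbt templater mis-maps here (LT02 indentation, LT05 line length) rather than suppressing all linting on a line:

```sql
-- Per-line: silence only the named rules on this line
{{ dbt_utils.generate_surrogate_key(
    ['date_local', 'entity_id', 'location_id', 'has_attribute']
) }} as surrogate_key,  -- noqa: LT02,LT05

-- Per-region: disable/enable pairs fence off a block of lines
-- noqa: disable=LT02,LT05
{{ dbt.current_timestamp() }} as dbt_processed_timestamp_utc
-- noqa: enable=all
```

This keeps the rest of the file linted normally while the templater bug is open.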