z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0
422 stars 45 forks source link

Lint returns different results than sqlfluff cli #44

Closed noel closed 1 year ago

noel commented 1 year ago

Given this file

with population_rank as (
    select
        country_code,
        country_name,
        value,
        s.year,
        rank() over (
            partition by country_code, country_name order by year desc
        ) as rank_years
    from {{ ref('_airbyte_raw_country_populations') }} as s
    group by 1, 2, 3
)

select
    country_code,
    country_name,
    value,
    year
from population_rank
where
    rank_years = 1
    and year > 2017

using sqlfluff 2.0.0a4 with this .sqlfluff config

[sqlfluff]
verbose = 0
nocolor = False

# Comma separated list of file extensions to lint.
# NB: This config will only apply in the root folder.
sql_file_exts = .sql,.sql.j2,.dml,.ddl

### OPINIONATED DEFAULTS ADJUST AS NECESSARY

dialect = snowflake
templater = dbt

recurse = 0
output_line_length = 80
max_line_length = 14

runaway_limit = 10
ignore = parsing,templating
ignore_templated_areas = True

# Suggested core rules as defined by the SQLFluff team
# L001 - L006
# L008 - L010
# L012 - L026
# L028
# L030
# L033
# L038 - L041
# L045 - L050
# L052
# L054

# Uncomment and adjust as needed
# rules = L001, L002, L003, L004, L005, L006, L007, L008, L009, L010, L011, L012, L013, L014, L015, L016, L017, L018, L019, L020, L021, L022, L023, L024, L025, L026, L027, L028, L029, L030, L032, L033, L036, L037, L038, L039, L040, L041, L042, L043, L044, L046, L047, L048, L049, L050, L051, L053, L054, L055, L058, L060, L061, L063, L064, L065

# Excluded rules
exclude_rules = L031, L034, L035, L051, L052, L054, L056, L057, L059, L062, L045

[sqlfluff:indentation]
tab_space_size = 4
indent_unit = space
indented_joins = False
indented_ctes = true
template_blocks_indent = True

[sqlfluff:templater]
unwrap_wrapped_queries = True

[sqlfluff:templater:dbt]
project_dir = ./

# Some rules can be configured directly from the config common to other rules.
[sqlfluff:rules]
layout:type:comma:line_position = trailing
allow_scalar = True
single_table_references = unqualified
unquoted_identifiers_policy = all

# Some rules have their own specific config.

[sqlfluff:rules:L007]  # Operator before/after new line
layout:type:binary_operator:line_position = leading

[sqlfluff:rules:L010]  # Keywords
capitalisation_policy = lower

[sqlfluff:rules:L011]
aliasing = explicit

[sqlfluff:rules:L012]
aliasing = explicit

[sqlfluff:rules:L013]
allow_scalar = True

[sqlfluff:rules:L014]
extended_capitalisation_policy = lower

[sqlfluff:rules:L016]
ignore_comment_clauses = True
ignore_comment_lines = True
indent_unit = space

[sqlfluff:rules:L019]
ayout:type:comma:line_position = trailing

[sqlfluff:rules:L026]
force_enable = True

[sqlfluff:rules:L028]
force_enable = True
single_table_references = unqualified

[sqlfluff:rules:L029]  # Keyword identifiers
unquoted_identifiers_policy = none

[sqlfluff:rules:L030]  # Function names
capitalisation_policy = lower

[sqlfluff:rules:L038]
select_clause_trailing_comma = forbid

[sqlfluff:rules:L040]  # Null & Boolean Literals
capitalisation_policy = lower

[sqlfluff:rules:L042]
# By default, allow subqueries in from clauses, but not join clauses.

forbid_subquery_in = both

[sqlfluff:rules:L047]  # Consistent syntax to count all rows
prefer_count_0 = False
prefer_count_1 = False

[sqlfluff:rules:L051]
fully_qualify_join_types = both

[sqlfluff:rules:L054]
group_by_and_order_by_style = explicit

[sqlfluff:rules:L063]
extended_capitalisation_policy = lower

[sqlfluff:rules:L064]
preferred_quoted_literal_style = single_quotes

[sqlfluff:rules:L066]
min_alias_length = 7

running this on the command line /config/.local/bin/sqlfluff lint --format json --config /config/workspace/transform/.sqlfluff transform/models/L2_bays/country_demographics/current_population.sql

outputs

22:23:30  Unable to do partial parsing because config vars, config profile, or config target have changed
22:23:34  [WARNING]: Did not find matching node for patch with name '_AIRBYTE_RAW_ZIP_COORDINATES' in the 'models' section of file 'models/L1_inlets/loans/_airbyte_raw_zip_coordinates.yml'
[{"filepath": "transform/models/L2_bays/country_demographics/current_population.sql", "violations": [{"line_no": 1, "line_pos": 1, "code": "L016", "description": "Line is too long (25 > 14)."}, {"line_no": 3, "line_pos": 9, "code": "L016", "description": "Line is too long (21 > 14)."}, {"line_no": 4, "line_pos": 9, "code": "L016", "description": "Line is too long (21 > 14)."}, {"line_no": 6, "line_pos": 9, "code": "L016", "description": "Line is too long (15 > 14)."}, {"line_no": 6, "line_pos": 9, "code": "L028", "description": "Qualified reference 's.year' found in single table select."}, {"line_no": 7, "line_pos": 9, "code": "L016", "description": "Line is too long (21 > 14)."}, {"line_no": 8, "line_pos": 13, "code": "L016", "description": "Line is too long (70 > 14)."}, {"line_no": 9, "line_pos": 9, "code": "L016", "description": "Line is too long (23 > 14)."}, {"line_no": 10, "line_pos": 5, "code": "L016", "description": "Line is too long (59 > 14)."}, {"line_no": 10, "line_pos": 59, "code": "L066", "description": "Aliases should be at least 7 character(s) long."}, {"line_no": 11, "line_pos": 5, "code": "L016", "description": "Line is too long (20 > 14)."}, {"line_no": 15, "line_pos": 5, "code": "L016", "description": "Line is too long (17 > 14)."}, {"line_no": 16, "line_pos": 5, "code": "L016", "description": "Line is too long (17 > 14)."}, {"line_no": 19, "line_pos": 1, "code": "L016", "description": "Line is too long (20 > 14)."}, {"line_no": 21, "line_pos": 5, "code": "L016", "description": "Line is too long (18 > 14)."}, {"line_no": 22, "line_pos": 5, "code": "L016", "description": "Line is too long (19 > 14)."}]}]

whereas, running this curl -X POST localhost:8581/lint?sql_path=/config/workspace/transform/models/L2_bays/country_demographics/current_population.sql&extra_config_path=/config/workspace/transform/.sqlfluff

returns this

{"result":[{"code":"L028","description":"Qualified reference 's.year' found in single table select.","line_no":6,"line_pos":9},{"code":"L066","description":"Aliases should be at least 7 character(s) long.","line_no":10,"line_pos":59}]}
barrywhart commented 1 year ago

It works correctly for me:

(sqlfluff-3.10.8) ➜  my-dbt git:(2.0.0a4) ✗ curl -X POST 'localhost:8581/lint?sql_path=/Users/barry/dev/sqlfluff/my-dbt/models/osmosis_issue_44.sql&extra_config_path=/Users/barry/dev/sqlfluff/my-dbt/.sqlfluff'

{"result":[{"code":"L016","description":"Line is too long (25 > 14).","line_no":1,"line_pos":1},{"code":"L016","description":"Line is too long (21 > 14).","line_no":3,"line_pos":9},{"code":"L016","description":"Line is too long (21 > 14).","line_no":4,"line_pos":9},{"code":"L016","description":"Line is too long (15 > 14).","line_no":6,"line_pos":9},{"code":"L028","description":"Qualified reference 's.year' found in single table select.","line_no":6,"line_pos":9},{"code":"L016","description":"Line is too long (21 > 14).","line_no":7,"line_pos":9},{"code":"L016","description":"Line is too long (70 > 14).","line_no":8,"line_pos":13},{"code":"L016","description":"Line is too long (23 > 14).","line_no":9,"line_pos":9},{"code":"L016","description":"Line is too long (54 > 14).","line_no":10,"line_pos":5},{"code":"L066","description":"Aliases should be at least 7 character(s) long.","line_no":10,"line_pos":54},{"code":"L016","description":"Line is too long (20 > 14).","line_no":11,"line_pos":5},{"code":"L016","description":"Line is too long (17 > 14).","line_no":15,"line_pos":5},{"code":"L016","description":"Line is too long (17 > 14).","line_no":16,"line_pos":5},{"code":"L016","description":"Line is too long (20 > 14).","line_no":19,"line_pos":1},{"code":"L016","description":"Line is too long (18 > 14).","line_no":21,"line_pos":5},{"code":"L016","description":"Line is too long (19 > 14).","line_no":22,"line_pos":5},{"code":"L009","description":"Files must end with a single trailing newline.","line_no":23,"line_pos":1}]}%                 

I suspect the issue lies somewhere in the specifics of the other .sqlfluff file(s) in the environment and how they are interacting. In order to debug this, I need a more detailed scenario to reproduce it, primarily a zipped-up directory (or directories) including:

SQLFluff is very flexible and powerful in allowing multiple .sqlfluff files, e.g. one per directory, and the files overlay in particular ways. The downside of this flexibility and power is that it's sometimes unclear which .sqlfluff files are involved when linting, and what the final combined configuration is from those files.

Also, note that where you run the sqlfluff lint command from (i.e. the current directory) affects which .sqlfluff files are used. On the other hand, the osmosis server works differently. IIRC, it lints or fixes as if the command were run from the root directory of the dbt project. Additional, temporary logging in the SQLFluff code which loads the .sqlfluff files could help understand what's happening.

Another possible approach -- delete all the .sqlfluff files on the system except the one specified as extra_config_path. With only one .sqlfluff file, there should be no possibility of confusing configuration overlay.

noel commented 1 year ago

this is due to us using sqlfluff 2.0.0a4 and osmosis being locked to < version 2.