reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.33k stars 240 forks source link

problem on parsing tablename with tsql NOLOCK #448

Open EcoleKeine opened 1 year ago

EcoleKeine commented 1 year ago

sqllineage==1.4.7

select * from dbname.dto.tablename(NOLOCK)
print(LineageRunner(sql=sql,dialect="tsql") )

output:

Statements(#): 1
Source Tables:

Target Tables:

with(NOLOCK) is ok

select * from dbname.dto.tablename with(NOLOCK)
reata commented 1 year ago

Can you share the language manual for the first SQL statement?

I can only find corresponding docs for second statement: https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-ver16 , where Table hints syntax is defined as

WITH  ( <table_hint> [ [ , ] ...n ] )

<table_hint> ::=
{ NOEXPAND [ , INDEX ( <index_value> [ , ...n ] ) | INDEX = ( <index_value> ) ]
  | INDEX ( <index_value> [ , ...n ] ) | INDEX = ( <index_value> )
  | FORCESEEK [ ( <index_value> ( <index_column_name> [ , ... ] ) ) ]
  | FORCESCAN
  | HOLDLOCK
  | NOLOCK
  | NOWAIT
  | PAGLOCK
  | READCOMMITTED
  | READCOMMITTEDLOCK
  | READPAST
  | READUNCOMMITTED
  | REPEATABLEREAD
  | ROWLOCK
  | SERIALIZABLE
  | SNAPSHOT
  | SPATIAL_WINDOW_MAX_CELLS = <integer_value>
  | TABLOCK
  | TABLOCKX
  | UPDLOCK
  | XLOCK
}

<table_hint_limited> ::=
{
    KEEPIDENTITY
  | KEEPDEFAULTS
  | HOLDLOCK
  | IGNORE_CONSTRAINTS
  | IGNORE_TRIGGERS
  | NOLOCK
  | NOWAIT
  | PAGLOCK
  | READCOMMITTED
  | READCOMMITTEDLOCK
  | READPAST
  | REPEATABLEREAD
  | ROWLOCK
  | SERIALIZABLE
  | SNAPSHOT
  | TABLOCK
  | TABLOCKX
  | UPDLOCK
  | XLOCK
}

Looks like WITH is mandatory?

EcoleKeine commented 1 year ago

@reata Just in Arguments section of docs:

The following table hints are allowed with and without the WITH keyword: NOLOCK, READUNCOMMITTED, UPDLOCK, REPEATABLEREAD, SERIALIZABLE, READCOMMITTED, TABLOCK, TABLOCKX, PAGLOCK, ROWLOCK, NOWAIT, READPAST, XLOCK, SNAPSHOT, and NOEXPAND. When these table hints are specified without the WITH keyword, the hints should be specified alone. For example:

FROM t (TABLOCK)
reata commented 1 year ago

@EcoleKeine thanks for the info.

sqlfluff generates different AST for these two statements:

For select * from dbname.dto.tablename(NOLOCK):

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    batch:
[L:  1, P:  1]      |        statement:
[L:  1, P:  1]      |            select_statement:
[L:  1, P:  1]      |                select_clause:
[L:  1, P:  1]      |                    keyword:                                  'select'
[L:  1, P:  7]      |                    [META] indent:
[L:  1, P:  7]      |                    whitespace:                               ' '
[L:  1, P:  8]      |                    select_clause_element:
[L:  1, P:  8]      |                        wildcard_expression:
[L:  1, P:  8]      |                            wildcard_identifier:
[L:  1, P:  8]      |                                star:                         '*'
[L:  1, P:  9]      |                    whitespace:                               ' '
[L:  1, P: 10]      |                [META] dedent:
[L:  1, P: 10]      |                from_clause:
[L:  1, P: 10]      |                    keyword:                                  'from'
[L:  1, P: 14]      |                    whitespace:                               ' '
[L:  1, P: 15]      |                    from_expression:
[L:  1, P: 15]      |                        [META] indent:
[L:  1, P: 15]      |                        from_expression_element:
[L:  1, P: 15]      |                            table_expression:
[L:  1, P: 15]      |                                function:
[L:  1, P: 15]      |                                    function_name:
[L:  1, P: 15]      |                                        naked_identifier:     'dbname'
[L:  1, P: 21]      |                                        dot:                  '.'
[L:  1, P: 22]      |                                        naked_identifier:     'dto'
[L:  1, P: 25]      |                                        dot:                  '.'
[L:  1, P: 26]      |                                        function_name_identifier:  'tablename'
[L:  1, P: 35]      |                                    bracketed:
[L:  1, P: 35]      |                                        start_bracket:        '('
[L:  1, P: 36]      |                                        [META] indent:
[L:  1, P: 36]      |                                        expression:
[L:  1, P: 36]      |                                            column_reference:
[L:  1, P: 36]      |                                                naked_identifier:  'NOLOCK'
[L:  1, P: 42]      |                                        [META] dedent:
[L:  1, P: 42]      |                                        end_bracket:          ')'
[L:  1, P: 43]      |                        [META] dedent:
[L:  1, P: 43]      |    newline:                                                  '\n'
[L:  2, P:  1]      |    [META] end_of_file:

For select * from dbname.dto.tablename with(NOLOCK)

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    batch:
[L:  1, P:  1]      |        statement:
[L:  1, P:  1]      |            select_statement:
[L:  1, P:  1]      |                select_clause:
[L:  1, P:  1]      |                    keyword:                                  'select'
[L:  1, P:  7]      |                    [META] indent:
[L:  1, P:  7]      |                    whitespace:                               ' '
[L:  1, P:  8]      |                    select_clause_element:
[L:  1, P:  8]      |                        wildcard_expression:
[L:  1, P:  8]      |                            wildcard_identifier:
[L:  1, P:  8]      |                                star:                         '*'
[L:  1, P:  9]      |                    whitespace:                               ' '
[L:  1, P: 10]      |                [META] dedent:
[L:  1, P: 10]      |                from_clause:
[L:  1, P: 10]      |                    keyword:                                  'from'
[L:  1, P: 14]      |                    whitespace:                               ' '
[L:  1, P: 15]      |                    from_expression:
[L:  1, P: 15]      |                        [META] indent:
[L:  1, P: 15]      |                        from_expression_element:
[L:  1, P: 15]      |                            table_expression:
[L:  1, P: 15]      |                                table_reference:
[L:  1, P: 15]      |                                    naked_identifier:         'dbname'
[L:  1, P: 21]      |                                    dot:                      '.'
[L:  1, P: 22]      |                                    naked_identifier:         'dto'
[L:  1, P: 25]      |                                    dot:                      '.'
[L:  1, P: 26]      |                                    naked_identifier:         'tablename'
[L:  1, P: 35]      |                            whitespace:                       ' '
[L:  1, P: 36]      |                            post_table_expression:
[L:  1, P: 36]      |                                keyword:                      'with'
[L:  1, P: 40]      |                                bracketed:
[L:  1, P: 40]      |                                    start_bracket:            '('
[L:  1, P: 41]      |                                    [META] indent:
[L:  1, P: 41]      |                                    query_hint_segment:
[L:  1, P: 41]      |                                        keyword:              'NOLOCK'
[L:  1, P: 47]      |                                    [META] dedent:
[L:  1, P: 47]      |                                    end_bracket:              ')'
[L:  1, P: 48]      |                        [META] dedent:
[L:  1, P: 48]      |    newline:                                                  '\n'
[L:  2, P:  1]      |    [META] end_of_file:

We need sqlfluff to get this fixed.