reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.19k stars 215 forks source link

Error parsing tsql without semicolons #573

Open Guojcc opened 5 months ago

Guojcc commented 5 months ago

Describe the bug Error parsing tsql containing 'group by' statement without semicolon. add TSQL_NO_SEMICOLON config.

SQL

select id from foo group by id 
select * from bar
Statements(#): 1
Source Tables:
    <default>.foo
Target Tables:

Python version (available via python --version)

SQLLineage version (available via sqllineage --version):

Additional context This type of tsql without semicolons can be parsed

select * from foo
select * from bar

image

reata commented 4 months ago

This is a parser issue with sqlfluff that we can verify with:

sqlfluff parse test.sql --dialect=tsql

There's only one statement parsed when GROUP BY clause is added:

select id from foo group by id 
select * from bar
[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    batch:
[L:  1, P:  1]      |        statement:
[L:  1, P:  1]      |            select_statement:
[L:  1, P:  1]      |                select_clause:
[L:  1, P:  1]      |                    keyword:                                  'select'
[L:  1, P:  7]      |                    [META] indent:
[L:  1, P:  7]      |                    whitespace:                               ' '
[L:  1, P:  8]      |                    select_clause_element:
[L:  1, P:  8]      |                        column_reference:
[L:  1, P:  8]      |                            naked_identifier:                 'id'
[L:  1, P: 10]      |                    [META] dedent:
[L:  1, P: 10]      |                whitespace:                                   ' '
[L:  1, P: 11]      |                from_clause:
[L:  1, P: 11]      |                    keyword:                                  'from'
[L:  1, P: 15]      |                    whitespace:                               ' '
[L:  1, P: 16]      |                    from_expression:
[L:  1, P: 16]      |                        [META] indent:
[L:  1, P: 16]      |                        from_expression_element:
[L:  1, P: 16]      |                            table_expression:
[L:  1, P: 16]      |                                table_reference:
[L:  1, P: 16]      |                                    naked_identifier:         'foo'
[L:  1, P: 19]      |                        [META] dedent:
[L:  1, P: 19]      |                whitespace:                                   ' '
[L:  1, P: 20]      |                groupby_clause:
[L:  1, P: 20]      |                    keyword:                                  'group'
[L:  1, P: 25]      |                    whitespace:                               ' '
[L:  1, P: 26]      |                    keyword:                                  'by'
[L:  1, P: 28]      |                    [META] indent:
[L:  1, P: 28]      |                    whitespace:                               ' '
[L:  1, P: 29]      |                    column_reference:
[L:  1, P: 29]      |                        naked_identifier:                     'id'
[L:  1, P: 31]      |                    whitespace:                               ' '
[L:  1, P: 32]      |                    newline:                                  '\n'
[L:  2, P:  1]      |                    expression:
[L:  2, P:  1]      |                        select_statement:
[L:  2, P:  1]      |                            select_clause:
[L:  2, P:  1]      |                                keyword:                      'select'
[L:  2, P:  7]      |                                [META] indent:
[L:  2, P:  7]      |                                whitespace:                   ' '
[L:  2, P:  8]      |                                select_clause_element:
[L:  2, P:  8]      |                                    wildcard_expression:
[L:  2, P:  8]      |                                        wildcard_identifier:
[L:  2, P:  8]      |                                            star:             '*'
[L:  2, P:  9]      |                                [META] dedent:
[L:  2, P:  9]      |                            whitespace:                       ' '
[L:  2, P: 10]      |                            from_clause:
[L:  2, P: 10]      |                                keyword:                      'from'
[L:  2, P: 14]      |                                whitespace:                   ' '
[L:  2, P: 15]      |                                from_expression:
[L:  2, P: 15]      |                                    [META] indent:
[L:  2, P: 15]      |                                    from_expression_element:
[L:  2, P: 15]      |                                        table_expression:
[L:  2, P: 15]      |                                            table_reference:
[L:  2, P: 15]      |                                                naked_identifier:  'bar'
[L:  2, P: 18]      |                                    [META] dedent:
[L:  2, P: 18]      |                    [META] dedent:
[L:  2, P: 18]      |    newline:                                                  '\n'
[L:  3, P:  1]      |    [META] end_of_file:

Whereas removing GROUP BY results in the correct AST with two statements:

select id from foo
select * from bar
[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    batch:
[L:  1, P:  1]      |        statement:
[L:  1, P:  1]      |            select_statement:
[L:  1, P:  1]      |                select_clause:
[L:  1, P:  1]      |                    keyword:                                  'select'
[L:  1, P:  7]      |                    [META] indent:
[L:  1, P:  7]      |                    whitespace:                               ' '
[L:  1, P:  8]      |                    select_clause_element:
[L:  1, P:  8]      |                        column_reference:
[L:  1, P:  8]      |                            naked_identifier:                 'id'
[L:  1, P: 10]      |                    [META] dedent:
[L:  1, P: 10]      |                whitespace:                                   ' '
[L:  1, P: 11]      |                from_clause:
[L:  1, P: 11]      |                    keyword:                                  'from'
[L:  1, P: 15]      |                    whitespace:                               ' '
[L:  1, P: 16]      |                    from_expression:
[L:  1, P: 16]      |                        [META] indent:
[L:  1, P: 16]      |                        from_expression_element:
[L:  1, P: 16]      |                            table_expression:
[L:  1, P: 16]      |                                table_reference:
[L:  1, P: 16]      |                                    naked_identifier:         'foo'
[L:  1, P: 19]      |                        [META] dedent:
[L:  1, P: 19]      |        whitespace:                                           ' '
[L:  1, P: 20]      |        newline:                                              '\n'
[L:  2, P:  1]      |        statement:
[L:  2, P:  1]      |            select_statement:
[L:  2, P:  1]      |                select_clause:
[L:  2, P:  1]      |                    keyword:                                  'select'
[L:  2, P:  7]      |                    [META] indent:
[L:  2, P:  7]      |                    whitespace:                               ' '
[L:  2, P:  8]      |                    select_clause_element:
[L:  2, P:  8]      |                        wildcard_expression:
[L:  2, P:  8]      |                            wildcard_identifier:
[L:  2, P:  8]      |                                star:                         '*'
[L:  2, P:  9]      |                    [META] dedent:
[L:  2, P:  9]      |                whitespace:                                   ' '
[L:  2, P: 10]      |                from_clause:
[L:  2, P: 10]      |                    keyword:                                  'from'
[L:  2, P: 14]      |                    whitespace:                               ' '
[L:  2, P: 15]      |                    from_expression:
[L:  2, P: 15]      |                        [META] indent:
[L:  2, P: 15]      |                        from_expression_element:
[L:  2, P: 15]      |                            table_expression:
[L:  2, P: 15]      |                                table_reference:
[L:  2, P: 15]      |                                    naked_identifier:         'bar'
[L:  2, P: 18]      |                        [META] dedent:
[L:  2, P: 18]      |    newline:                                                  '\n'
[L:  3, P:  1]      |    [META] end_of_file:

We need to fix it on sqlfluff side.

Guojcc commented 4 months ago

I hava already submitted a bug to sqlfluff, Please help to supplement other content

[#5601]