reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.19k stars 215 forks source link

SELECT * from CTE fails to resolve column lineage #630

Open skada-coder opened 1 week ago

skada-coder commented 1 week ago

Describe the bug Column lineage fails to resolve when referencing a SELECT * from a CTE.

Not sure if this is the same issue as https://github.com/reata/sqllineage/issues/303 as it looks similar just 1 layer deeper with a secondary select * CTE.

Below is a simplified example, our production scenarios are more complicated but it boils down to this.

SQL

CREATE TABLE MAIN.FOOBAR AS (
    WITH FOO AS (
        SELECT COL1, COL2 FROM FROM MAIN.BAR
    )
    SELECT * FROM FOO
)

Column Lineage fails to resolve columns defined in FOO

To Reproduce

from sqllineage.runner import LineageRunner
sql1 = """
CREATE TABLE MAIN.FOOBAR AS (
    WITH FOO AS (
        SELECT COL1, COL2 FROM FROM MAIN.BAR
    )
    SELECT * FROM FOO
)
"""
LineageRunner(sql1).print_column_lineage()

actual output is

main.foobar.* <- foo.*

I'd expect it to return this instead

main.foobar.col1 <- main.bar.col1
main.foobar.col2 <- main.bar.col2

Python version (available via python --version)

SQLLineage version (available via sqllineage --version):

We noticed the issue using the snowflake dialect for our own production use cases