reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.3k stars, 235 forks

Make Lateral Column Alias Reference Configurable #539

Closed maoxingda closed 8 months ago

maoxingda commented 8 months ago

SQL

insert into public.tgt_tbl1
(
    name,
    email
)
select
    st1.name,
    st1.name || st1.email || '@gmail.com' as email
from
    public.src_tbl1 as st1

To Reproduce

Note: the SQL above is assumed to be saved in a file named test.sql.

from sqllineage.runner import LineageRunner

with open("test.sql") as f:
    sql = f.read()

lr = LineageRunner(sql, dialect="redshift")

lr.print_column_lineage()

Actual behavior

public.tgt_tbl1.email <- public.src_tbl1.email

Expected behavior

public.tgt_tbl1.name <- public.src_tbl1.name
public.tgt_tbl1.email <- public.src_tbl1.name
public.tgt_tbl1.email <- public.src_tbl1.email

Python version (available via python --version)

SQLLineage version (available via sqllineage --version):

Additional context

reata commented 8 months ago

This is the side effect of #507.

It looks like "Lateral Column Alias Reference" depends heavily on metadata. I'm thinking we should disable this logic by default and enable it only when a real metadata_provider is passed in.
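
The proposal can be pictured with a small, self-contained sketch. This is not sqllineage's implementation: `resolve_column`, `earlier_aliases`, and `source_columns` are hypothetical names invented for illustration. It assumes that a name matching a real source column takes priority over an earlier alias, and that with no metadata available the lateral alias lookup is skipped entirely.

```python
from typing import Dict, Optional, Set


def resolve_column(
    name: str,
    earlier_aliases: Dict[str, Set[str]],
    source_columns: Optional[Set[str]] = None,
) -> Set[str]:
    """Return the source columns that an unqualified `name` refers to.

    Hypothetical helper: `earlier_aliases` maps aliases defined earlier in the
    same SELECT list to the source columns they were built from, and
    `source_columns` stands in for what a metadata provider would report.
    """
    # No metadata available: leave lateral alias resolution off and treat
    # the name as an ordinary source column.
    if not source_columns:
        return {name}
    # Metadata says this is a real column, so it wins over any alias
    # (an assumption made for this sketch).
    if name in source_columns:
        return {name}
    # Otherwise fall back to an earlier alias if one exists.
    return set(earlier_aliases.get(name, {name}))


# select first_name || last_name as full_name, full_name || '!' as greeting
aliases = {"full_name": {"first_name", "last_name"}}

# Without metadata the alias is not expanded.
print(sorted(resolve_column("full_name", aliases)))

# With metadata the alias resolves to the columns it was built from.
print(sorted(resolve_column("full_name", aliases, {"first_name", "last_name"})))
```

Gating on the presence of metadata keeps the default output predictable: without a column list there is no reliable way to tell an earlier alias apart from a source column of the same name.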