reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License

Column level lineage not drawn properly when metadata is provided #597

Closed (piekill closed this issue 1 month ago)

piekill commented 2 months ago

Describe the bug

When I provide the metadata, the printed column level lineage is correct, but the figure generated doesn't contain this information.

Printed column lineage:

<default>.res.c1 <- db1.tab1.c1

image

SQL

insert into res
select c1
from db1.tab1 a join db1.tab2 b

To Reproduce

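from sqllineage.core.metadata.dummy import DummyMetaDataProvider
from sqllineage.runner import LineageRunner

# c1 is unqualified, so only the metadata can attribute it to db1.tab1
sql = """
insert into res
select c1
from db1.tab1 a join db1.tab2 b
"""
# Schema metadata: table name -> list of column names
metadata_provider = DummyMetaDataProvider({"db1.tab1": ["c1"], "db1.tab2": ["c2"]})
lr = LineageRunner(sql, metadata_provider=metadata_provider)
lr.print_column_lineage()  # printed column lineage is correct
lr.draw()  # generated figure does not contain this lineage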

Expected behavior

If I use the following SQL (replacing c1 with a.c1):

insert into res
select a.c1
from db1.tab1 a join db1.tab2 b

I can get the expected figure (the printed column lineage is the same): image

Python version (available via python --version)

SQLLineage version (available via sqllineage --version):

Updated

I think the reason is that the metadata provider is not currently used in visualization:

https://github.com/reata/sqllineage/blob/5b7173ee6130c907294c5e51a12e6ecf4d12025a/sqllineage/drawing.py#L168

I can get the expected behavior if I manually set the metadata_provider here. I guess there could be a better way of doing this.

reata commented 1 month ago

Thanks for reporting this issue. I don't think we designed carefully for frontend visualization when we developed the metadata providers.

As you mentioned, we do need a cleverer way to let the web server "remember" the metadata provider instead of hard-coding it in the /lineage controller.
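
One possible shape for this, as a rough sketch only: keep the provider on the app object, set it once when the server starts, and have the /lineage handler read it from there. Everything below is a hypothetical stand-in written for illustration (the `App` class, `build_runner`, the `"sql"` payload key, and this `draw` helper); none of it is sqllineage's actual code or API.

```python
from typing import Any, Callable, Dict, Optional

Handler = Callable[["App", Dict[str, Any]], Any]


class App:
    """Hypothetical stand-in for the visualization web app (not sqllineage's class)."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {}
        # Set once at startup, then shared by every request handler.
        self.metadata_provider: Optional[Any] = None

    def route(self, path: str) -> Callable[[Handler], Handler]:
        def decorator(handler: Handler) -> Handler:
            self.routes[path] = handler
            return handler

        return decorator

    def handle(self, path: str, payload: Dict[str, Any]) -> Any:
        # Pass the app to the handler so it can read the shared state.
        return self.routes[path](self, payload)


def build_runner(sql: str, metadata_provider: Optional[Any] = None) -> Dict[str, Any]:
    """Hypothetical placeholder for constructing a LineageRunner."""
    return {"sql": sql, "metadata_provider": metadata_provider}


app = App()


@app.route("/lineage")
def lineage(current_app: App, payload: Dict[str, Any]) -> Dict[str, Any]:
    # The provider comes from the app, not from a value hard-coded in the handler.
    return build_runner(
        payload["sql"], metadata_provider=current_app.metadata_provider
    )


def draw(sql: str, metadata_provider: Optional[Any] = None) -> Dict[str, Any]:
    # The app "remembers" the provider before any request is served.
    app.metadata_provider = metadata_provider
    return app.handle("/lineage", {"sql": sql})
```

With this layout, the provider passed to the runner and the one seen by the web request are the same object, so the printed lineage and the drawn lineage cannot diverge because of a missing provider.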