reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.19k stars 215 forks source link

"function lineage" #620

Open fomcl opened 1 month ago

fomcl commented 1 month ago

Is your feature request related to a problem? Please describe. If an alias was used for a function call, I would like to know what the original function call looked like.

# sample code
from sqllineage.runner import LineageRunner

sql = """CREATE TABLE result AS
SELECT procedures.fahrenheit(c) AS f, a, b AS b_name
FROM tbl1
INNER JOIN tbl2
ON tbl1.some_id = tbl2.some_id;"""
result = LineageRunner(sql, dialect="greenplum")
result.print_column_lineage()
lineage = result.get_column_lineage()

# actual output
<default>.result.a <- a
<default>.result.b_name <- b
<default>.result.f <- c

# desired output
<default>.result.a <- a
<default>.result.b_name <- b
<default>.result.f <- <default>.result.procedures.fahrenheit(c) <- c

Describe the solution you'd like See above. Right now, there's no way (or perhaps I'm doing something wrong!) of telling if c is a copy of f (select f as c) or something else.

By the way, before I forget: awesome package. Thank you very much for creating this!

reata commented 2 weeks ago

Thanks for loving sqllineage.

We have a separate issue #443 to track a similar request, and function is considered a subset of column expression there. For your example, there are other SQL patterns like:

-- directly using the formula to convert Celcius to Fahrenheit
select (c * 9 / 5) + 32;
-- using case clause together with function.
select case when t == 'Celsius' then procedures.fahrenheit(c) when t == 'Fahrenheit' then c else NULL end;

All this would be too much to add to the column lineage path, if possible.

Thinking of lineage as a graph, the preferred approach would be to add the raw expression as a property to the column lineage edge (and somehow expose it to the user via some new API), instead of adding another node.

We're nowhere near getting #443 done yet. But do let me know your thoughts.