reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.19k stars 215 forks source link

False negative for Scalar Subquery used in Function #614

Closed sbrugman closed 1 month ago

sbrugman commented 1 month ago

Describe the bug

Great package, thanks for your work! I've found that for this specific case which our users sometimes write, the lineage does not detect the table reference.

I've check that the parsing via sqlfluff works well.

SQL Paste the SQL text here. For example:

Create table hello_world AS

SELECT EXPLODE(SEQUENCE(
   TO_DATE((SELECT MIN(my_date) FROM my_table),
TO_DATE((SELECT MAX(my_date) FROM my_table),
INTERVAL 1 DAY
)) AS result

To Reproduce Note here we refer to SQL provided in prior step as stored in a file named test.sql

CLI (Command Line Interface): provide the command you're calling and the output.

Command:

sqllineage -f test.sql --dialect=sparksql
Statements(#): 1
Source Tables:

Target Tables:
    <default>.hello_world

Expected behavior A clear and concise description of what you expected to happen, and the output in accordance with the To Reproduce section.

Statements(#): 1
Source Tables:
    <default>.my_table

Target Tables:
    <default>.hello_world

Python version (available via python --version)

SQLLineage version (available via sqllineage --version):

Additional context Add any other context about the problem here.

reata commented 1 month ago

As of 1.5.3, we only handle scalar subquery in case clause, where clause. Scalar subquery in function logic is missed for this case.

sbrugman commented 1 month ago

Thanks for the quick fix! (Next release will solve the issue completely)