Closed maoxingda closed 8 months ago
All modified and coverable lines are covered by tests.
Comparison is base (a93b894) 99.50% compared to head (df79cf0) 99.50%.
insert into public.tgt_tbl1
(
id,
id_original
)
select
a || b || c || id as id,
id as id_original -- # noqa: E501 TODO: metadata for the table public.src_tbl1 is needed to tell whether the column reference 'id' here comes from public.src_tbl1 or is a reference to the alias defined above. It is currently treated as an alias reference. Note: this decision may deviate significantly from the actual scenario.
from
public.src_tbl1
I need the help of a pro.
See my comment on #539 or email. Maybe we should take a step back and rethink the approach.
select
a || b || c as c,
c as d -- Metadata for a subquery is needed in this context to confirm whether the reference to 'c' is from the subquery or an alias reference.
from
(
select
1 as a,
2 as b,
3 as c
)
The current parsing method has no knowledge of the subquery in this context, because at this point the subquery has not yet been parsed.
What if we move the parsing of the subquery ahead of the for loop, so that the subquery's metadata is available? That way, could we determine whether the column references in the SELECT clause come from the subquery or are lateral alias references?
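To make the question above concrete, here is a minimal, self-contained sketch of the decision being discussed. It is not sqllineage code: `resolve_references` and its data shapes are hypothetical, and it assumes that a real column of the source (table or subquery) takes precedence over an earlier alias of the same name. When no metadata is available, it falls back to treating a matching name as a lateral alias reference, which is the current behavior described in the TODO.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical shape: (column names referenced by the expression, output alias)
SelectItem = Tuple[List[str], str]


def resolve_references(
    items: List[SelectItem], source_columns: Optional[Set[str]]
) -> List[Tuple[str, str, str]]:
    """Classify every column reference as 'source' or 'lateral_alias'.

    source_columns is the set of columns exposed by the FROM source
    (table metadata or an already-parsed subquery), or None if unknown.
    """
    seen_aliases: Set[str] = set()
    resolved = []
    for refs, alias in items:
        for name in refs:
            if source_columns is not None and name in source_columns:
                # Assumption: a real source column wins over an alias.
                kind = "source"
            elif name in seen_aliases:
                # Name matches an alias defined earlier in the same SELECT.
                kind = "lateral_alias"
            else:
                kind = "source"
            resolved.append((alias, name, kind))
        # An alias only becomes visible to the items that follow it.
        seen_aliases.add(alias)
    return resolved


# select a || b || c as c, c as d from (select 1 as a, 2 as b, 3 as c)
items = [(["a", "b", "c"], "c"), (["c"], "d")]
with_metadata = resolve_references(items, {"a", "b", "c"})
without_metadata = resolve_references(items, None)
```

With the subquery columns known, the `c` in `c as d` resolves to the subquery column; without them, the same reference is classified as a lateral alias reference, which is exactly the ambiguity raised above.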
After #552 is merged, I will refactor the configuration of LATERAL_COLUMN_ALIAS_REFERENCE.
Not sure if this is already complete. Please re-request review once you're done.
Already done.
Rebase onto master and make LATERAL_COLUMN_ALIAS_REFERENCE a bool.
This PR does make huge changes to the current design, and I need more time to review and refactor it. The part I dislike most is that we have to branch everywhere on whether SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE and bool(self.metadata_provider) is True. It would be better to handle this in a unified way and check this condition in as few places as possible.
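One way to check the condition in a single place is sketched below. This is only an illustration under stated assumptions: `ToyConfig`, `ToyAnalyzer`, and `lca_enabled` are hypothetical names rather than sqllineage classes, and a non-empty dict stands in for a metadata provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real configuration flag.
    LATERAL_COLUMN_ALIAS_REFERENCE: bool = False


class ToyAnalyzer:
    def __init__(self, config: ToyConfig, metadata_provider: Optional[dict] = None):
        self.config = config
        self.metadata_provider = metadata_provider

    @property
    def lca_enabled(self) -> bool:
        """Single place where the flag and the metadata check are combined."""
        return self.config.LATERAL_COLUMN_ALIAS_REFERENCE and bool(
            self.metadata_provider
        )


metadata = {"public.src_tbl1": ["id"]}
enabled = ToyAnalyzer(ToyConfig(True), metadata).lca_enabled
no_metadata = ToyAnalyzer(ToyConfig(True)).lca_enabled
flag_off = ToyAnalyzer(ToyConfig(False), metadata).lca_enabled
```

Call sites would then read `if self.lca_enabled:` instead of repeating the compound condition, so a later change to the rule touches only one property.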
The first option I tried was to move self.extract_subquery(subqueries, holder) up in select.py unconditionally, but that doesn't work yet. I left an open question, since I'm not sure whether it will work. If you can help me investigate, that would be helpful.
By the way, I checked all the test cases and they look good.
OK, let me try.
The refactor is done, please re-review. Thanks.
https://www.databricks.com/blog/introducing-support-lateral-column-alias
I don't know much about Databricks, but it seems that it also supports lateral column alias (LCA) references.
https://sqlkover.com/cool-stuff-in-snowflake-part-4-aliasing-all-the-things/
It seems that Snowflake also supports it, but unfortunately I don't have an environment to verify it, and I couldn't find the official documentation.
Popularity is one of the considerations. The main reason I'd like to make this feature configurable is that it won't function without metadata, and the assumption is that by default sqllineage only does static code analysis, with no metadata present.
Agreed. This feature was added because it is very pleasant to use.
Postpone merging to next week.
Right now to_source_columns does part of the column-to-table/subquery resolution work. It would be better to handle this universally in the _build_digraph method of class SQLLineageHolder.
I'm investigating whether we can move all the logic to end_of_query_cleanup and leave to_source_columns unmodified.
fix #539