ulixius9 opened 2 years ago
Marking this as P1 for now; we need to gather more info on all the queries that work or do not work. Let's also re-test these cases after upgrading sqllineage to the latest version.
Cannot extract source table information from query [select ind_head.indexrelid index_id,
k col_idx,
k <= indnkeyatts /* true */ in_key,
ind_head.indkey[k-$3] column_position,
ind_head.indoption[k-$4] column_options,
ind_head.indcollation[k-$5] /* null */ as collation,
collname /* null */ as collation_str,
ind_head.indclass[k-$6] as opclass,
case when opcdefault then $7 else opcname end as opclass_str,
case
when indexprs is null then $8
when ind_head.indkey[k-$9] = $10 then chr($11) || pg_catalog.pg_get_indexdef(ind_head.indexrelid, k::int, $12)
else pg_catalog.pg_get_indexdef(ind_head.indexrelid, k::int, $13)
end as expression,
amcanorder can_order
from pg_catalog.pg_index /* (select *, pg_catalog.generate_subscripts(indkey::int[], 1) + 1 k from pg_catalog.pg_index) */ ind_head
join pg_catalog.pg_class ind_stor
on ind_stor.oid = ind_head.indexrelid
cross join unnest(ind_head.indkey) with ordinality u(u, k)
left join pg_catalog.pg_collation
on pg_collation.oid = ind_head.indcollation[k-$14]
cross join pg_catalog.pg_indexam_has_property(ind_stor.relam, $15) amcanorder /* left join pg_catalog.pg_am am on ind_stor.relam = am.oid*/
left join pg_catalog.pg_opclass
on pg_opclass.oid = ind_head.indclass[k-$16]
where ind_stor.relnamespace = $1::oid
and ind_stor.relkind in ($17, $18)
and pg_catalog.age(ind_stor.xmin) <= coalesce(nullif(greatest(pg_catalog.age($2::varchar::xid), $19), $20), $21)
order by index_id, k]: An Identifier is expected, got Function[value: unnest(ind_head.indkey)] instead.
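The failure comes from `cross join unnest(ind_head.indkey) with ordinality u(u, k)`: the FROM item is a function call, not an identifier, which the sqlparse-based analyzer cannot model. One hedged workaround (a sketch, not the fix shipped in OpenMetadata; the function name is hypothetical) is to mask such clauses before handing the SQL to the lineage parser, since `unnest()` of an already-joined table contributes no new table-level lineage:

```python
import re

# Hypothetical pre-processing step: drop "cross join unnest(...) with
# ordinality alias(cols)" clauses that sqlparse-based lineage parsers
# cannot model as table identifiers. unnest() over an already-joined
# table adds no table-level lineage, so removing it is safe for
# source-table extraction (column lineage through k is lost, though).
UNNEST_JOIN = re.compile(
    r"cross\s+join\s+unnest\s*\([^)]*\)\s+with\s+ordinality\s+\w+\s*\([^)]*\)",
    re.IGNORECASE,
)

def mask_unnest_joins(sql: str) -> str:
    return UNNEST_JOIN.sub("", sql)
```

This keeps the rest of the statement intact, so `pg_catalog.pg_index`, `pg_catalog.pg_class`, etc. are still discoverable as sources.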
ref: https://openmetadata.slack.com/archives/C02B6955S4S/p1664877683642579
Adding the dbt query here that cannot be parsed. Dialect: Redshift
with customers as (
select * from TESTDB.dbt_jaffle.stg_customers
),
orders as (
select * from TESTDB.dbt_jaffle.stg_orders
),
payments as (
select * from TESTDB.dbt_jaffle.stg_payments
),
customer_orders as (
select
customer_id,
min(order_date) as first_order,
max(order_date) as most_recent_order,
count(order_id) as number_of_orders
from orders
group by customer_id
),
customer_payments as (
select
orders.customer_id,
sum(amount) as total_amount
from payments
left join orders on
payments.order_id = orders.order_id
group by orders.customer_id
),
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order,
customer_orders.most_recent_order,
customer_orders.number_of_orders,
customer_payments.total_amount as customer_lifetime_value
from customers
left join customer_orders
on customers.customer_id = customer_orders.customer_id
left join customer_payments
on customers.customer_id = customer_payments.customer_id
)
select * from final
INSERT INTO staging.migration_reports
(description, name, category, criticality, cases, is_valid)
VALUES($1,$2,$3,$4,$5,$6)
ON CONFLICT (category, name) DO UPDATE
SET description=$1, criticality=$4, cases=$5, is_valid=$6
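The parser rejects the `SET description=$1, criticality=$4, ...` IdentifierList inside the `ON CONFLICT ... DO UPDATE` clause. Since the insert target and its sources are fully determined before `ON CONFLICT`, one hedged workaround (a sketch with a hypothetical helper name, not OpenMetadata's actual handling) is to truncate the statement at that clause before lineage parsing:

```python
import re

# Hypothetical pre-processing: cut the statement at ON CONFLICT. The
# lineage-relevant parts (INSERT target, VALUES/SELECT sources) all
# appear before that clause; the DO UPDATE SET list only re-references
# columns of the same target table.
ON_CONFLICT = re.compile(r"\bON\s+CONFLICT\b", re.IGNORECASE)

def strip_on_conflict(sql: str) -> str:
    m = ON_CONFLICT.search(sql)
    return sql[: m.start()].rstrip() if m else sql
```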
It also produces another trace:
[2022-12-19 17:34:18,335] {sql_lineage.py:424} ERROR - Ingesting lineage failed for service [Migration]: An Identifier is expected, got IdentifierList[value: description=$1, criticality=$4, cases=$5, is_valid=$6] instead.
[2022-12-19 17:34:18,335] {sql_lineage.py:380} DEBUG - Running lineage with query: INSERT INTO staging.migration_infos
(id, pipeline_url, pipeline_id, agency_fk, extraction_date, transformation_date, mongo_dump, instance_id)
SELECT $1, $2, $3, $4::public.OBJECT_ID, $5::TIMESTAMP, $6::TIMESTAMP, $7, $8
ON CONFLICT (pipeline_id) DO UPDATE
SET id = $1, pipeline_url = $2, extraction_date = $5, transformation_date = $6::TIMESTAMP, mongo_dump = $7, instance_id = $8
[2022-12-19 17:34:18,341] {sql_lineage.py:423} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/sql_lineage.py", line 383, in get_lineage_by_query
raw_column_lineage = result.get_column_lineage()
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
self._eval()
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in _eval
self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in <listcomp>
self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 60, in analyze
self._extract_from_dml(stmt, AnalyzerContext())
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 135, in _extract_from_dml
next_handler.handle(sub_token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/base.py", line 39, in handle
self._handle(token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/source.py", line 58, in _handle
self._handle_column(token)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/source.py", line 105, in _handle_column
self.columns.append(Column.of(token))
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 230, in of
return Column(
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 174, in __init__
self.raw_name = escape_identifier_name(name)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/utils/helpers.py", line 8, in escape_identifier_name
return name.strip("`").strip('"').strip("'")
AttributeError: 'NoneType' object has no attribute 'strip'
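The `AttributeError` happens because `escape_identifier_name` assumes a string, but the sqlparse token walk can hand it `None` when a column name cannot be resolved (here, the `$5`/`$6` placeholders). A defensive sketch of the helper (an illustration, not the fix actually applied upstream):

```python
def escape_identifier_name(name):
    # Hedged sketch: sqllineage's helper assumes `name` is a str, but
    # the sqlparse-based analyzer can pass None for columns it cannot
    # resolve (e.g. positional parameters like $5). Guarding here turns
    # the hard crash into an ignorable empty identifier.
    if name is None:
        return ""
    return name.strip("`").strip('"').strip("'")
```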
-- comment
this query will not be picked up anymore after https://github.com/open-metadata/OpenMetadata/pull/11457
From https://openmetadata.slack.com/archives/C02B6955S4S/p1672892210654719, a sample of a failing query (some keywords masked):
/* BigQuery */
MERGE `project_id.dataset_name.table_name`
USING (
SELECT tab, DATE(open_at) AS date, tab_id, full_name, short_name, tab_type,
COUNT(*) AS opens, COUNTIF(os = 'a') AS opens_a, COUNTIF(os != 'a') AS opens_non_a,
FROM `project_id.dataset_name.table_name_b`
WHERE DATE(open_at) BETWEEN '2022-12-31' AND '2023-01-03'
GROUP BY 1, 2, 3, 4, 5, 6
) s
ON FALSE
WHEN NOT MATCHED AND date BETWEEN '2022-12-31' AND '2023-01-03' THEN
INSERT VALUES (s.tab, s.date, s.tab_id, s.full_name, s.short_name, s.tab_type, s.opens, s.opens_a, s.opens_non_a)
WHEN NOT MATCHED BY SOURCE AND date BETWEEN '2022-12-31' AND '2023-01-03' THEN
DELETE
Error Message:
[2023-01-05 04:09:45,281] {sql_lineage.py:370} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/sql_lineage.py", line 330, in get_lineage_by_query
raw_column_lineage = lineage_parser.column_lineage
File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 111, in column_lineage
return self.parser.get_column_lineage()
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
self._eval()
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in _eval
self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in <listcomp>
self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 60, in analyze
self._extract_from_dml(stmt, AnalyzerContext())
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 135, in _extract_from_dml
next_handler.handle(sub_token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/base.py", line 39, in handle
self._handle(token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/cte.py", line 31, in _handle
raise SQLLineageException(
sqllineage.exceptions.SQLLineageException: An Identifier or IdentifierList is expected, got Token[value: (] instead.
[2023-01-05 04:09:45,282] {sql_lineage.py:371} ERROR - Ingesting lineage failed for service [BigQuery]: An Identifier or IdentifierList is expected, got Token[value: (] instead.
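Here the CTE handler chokes on the parenthesized `USING ( SELECT ... )` subquery of the BigQuery MERGE. As a fallback when the parser raises, table-level lineage can still be recovered with a regex pass over the statement (a sketch only, table-level, with hypothetical names; not the fix shipped in OpenMetadata):

```python
import re

# Hypothetical fallback for MERGE statements the parser rejects: pull
# the MERGE target and any backtick-quoted tables referenced in FROM
# clauses of the USING subquery. Table-level only; column lineage is
# out of scope for this sketch.
MERGE_TARGET = re.compile(r"MERGE\s+(?:INTO\s+)?`([^`]+)`", re.IGNORECASE)
FROM_TABLES = re.compile(r"FROM\s+`([^`]+)`", re.IGNORECASE)

def merge_lineage(sql: str):
    target = MERGE_TARGET.search(sql)
    sources = FROM_TABLES.findall(sql)
    return (target.group(1) if target else None, sources)
```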
@mapler @OnkarVO7 @ulixius9 Could you please add the dialect of those queries? The reference to the Slack message will disappear after 90 days. Thank you
One more failing case in BigQuery:
MERGE `project_id.dataset_id.target_table` t
USING (
WITH base AS (
SELECT
date, channel, cnt, user_count, total_user_count,
FROM `project_id.dataset_id.origin_table`
)
SELECT
date, channel, cnt, total_user_count, SAFE_DIVIDE(cnt, user_count) AS rate
FROM base
) s
ON t.date = s.date and t.channel = s.channel
WHEN NOT MATCHED THEN
INSERT ROW
WHEN MATCHED THEN
UPDATE SET t.total_user_cnt = s.total_user_cnt,
t.cnt = s.cnt,
t.rate = s.rate
This case WORKS on Snowflake:
create or replace view TEST_DB.USR.FCT_DEMO (
ID,
ACCOUNT_ID,
F_NAME,
L_NAME
) as (
SELECT
ua.id as id,
ua.account_id,
p.first_name as f_name,
p.last_name as l_name
FROM
TEST_DB.SCHEMA.USER_ACCOUNT ua LEFT JOIN TEST_DB.SCHEMA.PERSON p ON ua.id = p.user_id
);
However, although the query looks simple enough, no column-level lineage is generated:
https://sandbox.open-metadata.org/table/MySQL_Test.sql12611281.sql12611281.vRisk_Denorm/lineage
Query dialect: MySQL
CREATE ALGORITHM=UNDEFINED DEFINER=`sql12611281`@`%` SQL SECURITY DEFINER VIEW `sql12611281`.`vRisk_Denorm` AS select `a`.`AsOfDate` AS `AsOfDate`,`a`.`AssetID` AS `AssetID`,`a`.`AssetRate` AS `AssetRate`,`a`.`AssetRateIndex` AS `AssetRateIndex`,`a`.`AssetRateIndexBehavior` AS `AssetRateIndexBehavior`,`a`.`CostPriceSettled` AS `CostPriceSettled`,`a`.`CostPriceTraded` AS `CostPriceTraded`,`a`.`CurrencyType_Identifier` AS `CurrencyType_Identifier`,`a`.`IntRate` AS `IntRate`,`a`.`Issuer_PortfolioCompanyNameNM` AS `Issuer_PortfolioCompanyNameNM`,`a`.`MaturityDate` AS `MaturityDate`,`a`.`ParAmountSettled` AS `ParAmountSettled`,`a`.`ParAmountTraded` AS `ParAmountTraded`,`a`.`Portfolio` AS `Portfolio`,`a`.`Portfolio_EntityId` AS `Portfolio_EntityId`,`a`.`Position_Id` AS `Position_Id`,`a`.`QuantitySettled` AS `QuantitySettled`,`a`.`QuantityTraded` AS `QuantityTraded`,`a`.`SecurityID` AS `SecurityID`,`a`.`SettledvsTraded` AS `SettledvsTraded`,`a`.`UnitsSettled` AS `UnitsSettled`,`a`.`UnitsTraded` AS `UnitsTraded`,`b`.`Source System` AS `Source System`,`b`.`Portfolio_Name` AS `Portfolio_Name`,`b`.`PositionKeyID` AS `PositionKeyID`,`b`.`Name` AS `Name`,`b`.`Credit Spread` AS `Credit Spread`,`b`.`Issuer OAS` AS `Issuer OAS`,`b`.`Idiosyncratic OAS` AS `Idiosyncratic OAS`,`b`.`Probability of Default` AS `Probability of Default`,`b`.`Recovery Rate` AS `Recovery Rate`,`b`.`Weighted Average Life` AS `Weighted Average Life`,`b`.`Theta` AS `Theta`,`b`.`Effective Convexity` AS `Effective Convexity`,`b`.`Effective Duration` AS `Effective Duration`,`b`.`Delta-Adjusted Underlying PV` AS `Delta-Adjusted Underlying PV`,`b`.`Gross Exposure` AS `Gross Exposure`,`b`.`Long Exposure` AS `Long Exposure`,`b`.`Net Exposure` AS `Net Exposure`,`b`.`Short Exposure` AS `Short Exposure`,`b`.`Modified Duration` AS `Modified Duration`,`b`.`ESG_;VERNANCE_SCORE` AS `ESG_;VERNANCE_SCORE`,`b`.`SDG_01_NET_ALIGNMENT_SCORE` AS `SDG_01_NET_ALIGNMENT_SCORE`,`b`.`SDG_02_NET_ALIGNMENT_SCORE` AS 
`SDG_02_NET_ALIGNMENT_SCORE`,`b`.`SDG_03_NET_ALIGNMENT_SCORE` AS `SDG_03_NET_ALIGNMENT_SCORE`,`b`.`SDG_04_NET_ALIGNMENT_SCORE` AS `SDG_04_NET_ALIGNMENT_SCORE`,`b`.`SDG_05_NET_ALIGNMENT_SCORE` AS `SDG_05_NET_ALIGNMENT_SCORE`,`b`.`SDG_06_NET_ALIGNMENT_SCORE` AS `SDG_06_NET_ALIGNMENT_SCORE`,`b`.`SDG_07_NET_ALIGNMENT_SCORE` AS `SDG_07_NET_ALIGNMENT_SCORE`,`b`.`SDG_08_NET_ALIGNMENT_SCORE` AS `SDG_08_NET_ALIGNMENT_SCORE`,`b`.`SDG_09_NET_ALIGNMENT_SCORE` AS `SDG_09_NET_ALIGNMENT_SCORE`,`b`.`SDG_10_NET_ALIGNMENT_SCORE` AS `SDG_10_NET_ALIGNMENT_SCORE`,`b`.`SDG_11_NET_ALIGNMENT_SCORE` AS `SDG_11_NET_ALIGNMENT_SCORE`,`b`.`SDG_12_NET_ALIGNMENT_SCORE` AS `SDG_12_NET_ALIGNMENT_SCORE`,`b`.`SDG_13_NET_ALIGNMENT_SCORE` AS `SDG_13_NET_ALIGNMENT_SCORE`,`b`.`SDG_14_NET_ALIGNMENT_SCORE` AS `SDG_14_NET_ALIGNMENT_SCORE`,`b`.`SDG_15_NET_ALIGNMENT_SCORE` AS `SDG_15_NET_ALIGNMENT_SCORE`,`b`.`SDG_16_NET_ALIGNMENT_SCORE` AS `SDG_16_NET_ALIGNMENT_SCORE`,`b`.`SDG_17_NET_ALIGNMENT_SCORE` AS `SDG_17_NET_ALIGNMENT_SCORE`,`b`.`CONTROVERSY_SCORE` AS `CONTROVERSY_SCORE`,`b`.`SOCIAL_CONTROVERSY_SCORE` AS `SOCIAL_CONTROVERSY_SCORE`,`b`.`CUSTOMER_CONTROVERSY_SCORE` AS `CUSTOMER_CONTROVERSY_SCORE`,`b`.`ENVIRONMENT_CONTROVERSY_SCORE` AS `ENVIRONMENT_CONTROVERSY_SCORE`,`b`.`;VERNANCE_CONTROVERSY_SCORE` AS `;VERNANCE_CONTROVERSY_SCORE`,`b`.`HUMAN_RIGHTS_CONTROVERSY_SCORE` AS `HUMAN_RIGHTS_CONTROVERSY_SCORE`,`b`.`LABOR_RIGHTS_CONTROVERSY_SCORE` AS `LABOR_RIGHTS_CONTROVERSY_SCORE`,`b`.`CLIMATE_CHANGE_THEME_SCORE` AS `CLIMATE_CHANGE_THEME_SCORE`,`b`.`BUSINESS_ETHICS_THEME_SCORE` AS `BUSINESS_ETHICS_THEME_SCORE`,`b`.`CORPORATE_;V_THEME_SCORE` AS `CORPORATE_;V_THEME_SCORE`,`b`.`ENVIRONMENTAL_OPPS_THEME_SCORE` AS `ENVIRONMENTAL_OPPS_THEME_SCORE`,`b`.`HUMAN_CAPITAL_THEME_SCORE` AS `HUMAN_CAPITAL_THEME_SCORE`,`b`.`NATURAL_RES_USE_THEME_SCORE` AS `NATURAL_RES_USE_THEME_SCORE`,`b`.`WASTE_MGMT_THEME_SCORE` AS `WASTE_MGMT_THEME_SCORE`,`b`.`PRODUCT_SAFETY_THEME_SCORE` AS 
`PRODUCT_SAFETY_THEME_SCORE`,`b`.`SOCIAL_OPPS_THEME_SCORE` AS `SOCIAL_OPPS_THEME_SCORE`,`b`.`STAKEHOLDER_OPPOSIT_THEME_SCORE` AS `STAKEHOLDER_OPPOSIT_THEME_SCORE`,`b`.`CBN_LCT_SCORE` AS `CBN_LCT_SCORE`,`b`.`PROD_CARB_FTPRNT_SCORE` AS `PROD_CARB_FTPRNT_SCORE`,`b`.`CARBON_EMISSIONS_SCORE` AS `CARBON_EMISSIONS_SCORE`,`b`.`CARBON_EMISSIONS_GHG_MITIGATION_SCORE` AS `CARBON_EMISSIONS_GHG_MITIGATION_SCORE`,`b`.`CARBON_EMISSIONS_SCOPE_1` AS `CARBON_EMISSIONS_SCOPE_1`,`b`.`UNGC_COMPLIANCE` AS `UNGC_COMPLIANCE`,`b`.`CARBON_EMISSIONS_SCOPE_12` AS `CARBON_EMISSIONS_SCOPE_12`,`b`.`CARBON_EMISSIONS_SCOPE_2` AS `CARBON_EMISSIONS_SCOPE_2`,`b`.`CARBON_EMISSIONS_SCOPE_3` AS `CARBON_EMISSIONS_SCOPE_3`,`b`.`CARBON_EMISSIONS_SCOPE123` AS `CARBON_EMISSIONS_SCOPE123`,`b`.`ESG_SCORE` AS `ESG_SCORE`,`b`.`ESG_ENVIRONMENTAL_SCORE` AS `ESG_ENVIRONMENTAL_SCORE`,`b`.`ESG_SOCIAL_SCORE` AS `ESG_SOCIAL_SCORE`,`b`.`Security_ID` AS `Security_ID`,`b`.`_ID` AS `_ID`,`b`.`as_of_date` AS `as_of_date`,`b`.`created_by` AS `created_by`,`b`.`modified_by` AS `modified_by`,`b`.`created_on` AS `created_on`,`b`.`modified_on` AS `modified_on`,`b`.`is_active` AS `is_active`,`b`.`is_processed` AS `is_processed` from (`sql12611281`.`vPositions_denorm_data` `a` join `sql12611281`.`risk_metrics_output_data` `b` on((`a`.`Position_Id` = `b`.`PositionKeyID`)))
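One likely stumbling block in this DDL is the MySQL-specific preamble: `ALGORITHM=UNDEFINED DEFINER=... SQL SECURITY DEFINER` sits between `CREATE` and `VIEW`, which generic SQL parsers do not expect. A hedged normalization sketch (hypothetical helper, not OpenMetadata's actual code) that rewrites it to a plain `CREATE VIEW` before lineage parsing:

```python
import re

# Hypothetical normalization: MySQL's SHOW CREATE VIEW output embeds
# ALGORITHM=..., DEFINER=... and SQL SECURITY ... between CREATE and
# VIEW. Rewriting to a bare "CREATE VIEW" makes the DDL digestible for
# generic SQL lineage parsers; the dropped clauses carry no lineage.
MYSQL_VIEW_PREAMBLE = re.compile(
    r"CREATE\s+ALGORITHM=\S+\s+DEFINER=\S+\s+SQL\s+SECURITY\s+\S+\s+VIEW",
    re.IGNORECASE,
)

def normalize_mysql_view(sql: str) -> str:
    return MYSQL_VIEW_PREAMBLE.sub("CREATE VIEW", sql)
```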
Add a union for materialized views: all_views union all_mviews
Based on a Slack support discussion and a git ticket: issue on git
Regarding the column-based lineage issue mentioned here: link to slack support
In the case of an Oracle view, the column-based lineage is not shown; you can only see the connected main id.
Related to @SabineGl's comment, the query looks like this: create view new_view (col1, col2) as select col1, col2 from table
To replicate this, we can ingest an Oracle source containing views.
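For a view of the shape `create view v (col1, col2) as select a, b from t`, column lineage can be recovered by zipping the declared column list with the select list positionally. A sketch under the assumption that both lists are simple comma-separated identifiers (no nested function calls with commas):

```python
import re

# Hypothetical positional mapping for "CREATE VIEW v (c1, c2) AS
# SELECT a, b FROM t": the i-th declared view column is derived from
# the i-th select-list item. Only handles flat comma-separated
# identifier lists; commas inside function calls would need a real
# parser.
VIEW_DDL = re.compile(
    r"create\s+view\s+\S+\s*\(([^)]*)\)\s*(?:as\s+)?select\s+(.*?)\s+from",
    re.IGNORECASE | re.DOTALL,
)

def view_column_lineage(sql: str):
    m = VIEW_DDL.search(sql)
    if not m:
        return {}
    view_cols = [c.strip() for c in m.group(1).split(",")]
    select_cols = [c.strip() for c in m.group(2).split(",")]
    return dict(zip(view_cols, select_cols))
```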
context: https://openmetadata.slack.com/archives/C02B6955S4S/p1681204033955849
dialect: snowflake support thread: https://openmetadata.slack.com/archives/C02B6955S4S/p1681277509847309
CREATE OR REPLACE TRANSIENT TABLE final_table
copy grants
as
Select a.*,
coalesce(table_1.column_d, table_2.column_d, table_3.column_d, table_4.column_d) AS column_a,
coalesce(table_1.type, table_2.type, table_3.type, table_4.type) AS column_b,
table_5.column_b,
CASE WHEN table_5.column_b IS NOT NULL or column_c = 'alpha' THEN 'alpha'
WHEN coalesce(table_1.column_d, table_2.column_d, table_3.column_d, table_4.column_d) IS NOT NULL AND table_5.column_b IS NULL and column_c <> 'alpha' THEN 'beta'
ELSE column_c END AS financial_column_c,
max(CASE WHEN pr.id_1 IS NULL THEN 0 ELSE 1 END) AS column_e
FROM
(
select distinct
table_6.column_c,
table_6.id_2,
table_6.id_3,
table_6.date_1,
table_6.date_1 AS id_3_date,
table_6.date_2,
table_6.column_f,
table_6.column_g,
table_6.id_4,
table_6.column_h,
table_6.column_j,
table_6.column_i,
table_6.column_k,
table_6.column_l,
table_6.column_m,
table_6.column_n,
table_6.column_o,
table_6.column_p,
table_6.column_q,
table_6.column_r,
table_6.column_s,
table_6.column_t,
table_6.column_u,
table_6.column_v,
table_6.column_w,
table_6.column_x,
table_6.column_y,
table_6.id_5,
table_6.column_z,
table_6.column_aa,
table_6.column_ab,
table_6.column_ac,
table_6.column_ad,
table_6.column_ae,
table_6.column_af,
table_6.column_ag,
table_6.column_ah,
table_6.column_ai,
table_6.column_aj,
table_6.column_ak,
table_6.column_al,
table_6.column_am,
table_6.column_an,
table_6.column_ao,
table_6.column_ap,
table_6.column_aq,
table_6.column_ar,
table_6.column_as,
table_6.id_6,
table_6.id_7,
table_6.column_at,
table_6.column_au,
table_6.column_av,
table_6.column_aw,
case
when lower(table_6.column_g)='gamma' or lower(column_ax) = 'gamma' then 'delta'
when lower(table_6.column_g) = 'delta' then 'Delta'
when lower(table_6.column_g) = 'eta' then 'Eta'
when lower(table_6.column_g) = 'zeta' then 'Zeta'
when lower(table_6.column_g)='theta' then 'Theta'
when lower(table_6.column_s) = lower(table_7.column_ay) then 'iota'
when lower(table_6.column_g)=lower('kappa') then 'kappa'
else 'iota'
END as column_g_o1,
case
when table_6.column_al = 'lambda' then 'lambda'
else table_8.column_bj
end as column_bj,
case
when table_6.column_f='mu' then 0
else table_8.column_ba
end as column_ba,
case
when table_6.id_3 =table_9.column_bb then 'nu'
else 'xi'
end as column_bd,
case
when table_6.column_f not in ('omni','pi','mu') then null
when datediff('month', date_trunc('month',table_11.column_bc),date_trunc('month',table_6.date_2)) = 0 and table_6.id_3 =table_11.id_3 then 'nu'
else 'xi'
end as column_be,
case
when table_6.column_f not in ('omni','pi','mu') then null
when datediff('month', date_trunc('month',table_11.column_bc),date_trunc('month',table_6.date_2))= 0 then (case when table_6.id_3 =table_11.id_3 then 0 else -99 end)
else datediff('month', date_trunc('month',table_11.column_bc),date_trunc('month',table_6.date_2))
end as month_diff,
(case
when table_6.column_bf is null or table_6.column_bf='' or table_6.column_bf=' / ' then 'rho'
when y.column_bg is null then 'column_bh'
else y.column_bg
end) column_bg,
(case when table_6.column_bf is null or table_6.column_bf='' or table_6.column_bf=' / ' then 'sigma'
when y.column_bg is null then 'column_bh'
when lower(y.column_bg)='tau' then 'Tau'
when lower(y.column_bg)='upsilon' then 'Upsilon'
else y.column_bg
end) column_bg_c1,
(case when table_6.column_bf is null or table_6.column_bf='' or table_6.column_bf=' / ' or y.column_bg is null or y.column_bg in ('') or y.column_bg is null
or lower(y.column_bg) in ('phi','chi','banner','direct','column_her')
then 'psi'
else 'Inpsi'
end) column_bg_bkt,
table_6.column_ax,
table_6.lead_id_5
FROM
(
select
om.*,
case when lower(om.column_aj) like '%omega%' then column_ap||' / '||column_aq
when lower(om.column_aj) not like '%omega%' then (case when column_aq is null or column_aq='' then column_ap else column_aq end )
end as column_bf
from table_12 om
left join table_13 oe
on om.id_2=oe.id_2 and oe.column_c='bravo'
where oe.id_2 is null
) table_6
left join
(select LOWER(CONCAT(COALESCE(aggregator,''),'-',COALESCE(charlie,''),'-',COALESCE(mid,''))) as column_bj,mapping_id_2,(column_bh*column_bi) as column_ba,column_bi
from table_14 where column_c = 'echo'
and date(entry_timestamp) > date '2001-04-01'
UNION ALL
select column_bj,mapping_id_2,column_ba,column_bi from table_15 where lower(column_c)='echo'
and date(entry_timestamp) <= date '2001-04-01'
) table_8
on table_6.id_2=table_8.mapping_id_2
--Adding column_bd_id_6----------
left join
(
select id_6, id_3 as column_bb,date_1 date_3,row_number() over (partition by id_6 order by date_1) as rank
FROM table_12
) table_9
on table_6.id_6=table_9.id_6 and table_9.rank=1
left join
(
select id_6, id_3,date_2 as column_bc,row_number() over (partition by id_6 order by date_2) as rank
FROM table_12
where column_f in ('omni','pi','mu')
) table_11
on table_6.id_6=table_11.id_6 and table_11.rank=1 and table_6.column_f in ('omni','pi','mu')
left join
(
select distinct y."network/column_ap_column_aq" as column_ap_column_aq, y."channel/column_ap" as column_bg,
row_number() over(partition by y."network/column_ap_column_aq" order by y."channel/column_ap") as rnk
from table_16 y
) y on table_6.column_bf=y.column_ap_column_aq and y.rnk = 1
left join
(SELECT DISTINCT column_s as column_ay
FROM table_17
)table_7 ON table_7.column_ay = table_6.column_s
) a
left join
(-- base64
Select TRY_BASE64_DElambdaE_STRING(id_4) as id_1,
--id_4 as id_1,
id_3_date as date_4,
column_am as date_5,
column_an as date_6
from table_18
where column_f in('golf','hotel') and column_g_o1 in ('india', 'juliet') and column_g != 'kilo'
) pr
ON (a.id_4 = pr.id_1 and a.date_1 >= pr.date_6 and a.date_1 <= pr.date_5 )
LEFT JOIN (
SELECT DISTINCT
column_bk,
column_d,
'lima' as type,
to_date(start_date, 'dd/MM/YYYY') as start_date,
to_date(end_date, 'dd/MM/YYYY') as end_date,
rank() over (partition by column_bk order by column_d) as rank
FROM table_19
WHERE lower(trim(column_bl)) like '%LIMA%'
) table_2
ON lower(a.column_g) = lower(table_2.column_bk)
and date(a.id_3_date) >= table_2.start_date
and date(a.id_3_date) <= table_2.end_date
and table_2.rank = 1
LEFT JOIN
(
SELECT DISTINCT
column_bk,
column_d,
'mike' as type,
to_date(start_date, 'dd/MM/YYYY') as start_date,
to_date(end_date, 'dd/MM/YYYY') as end_date,
rank() over (partition by column_bk order by column_d) as rank
FROM table_19
WHERE lower(trim(column_bl)) like '%column_bk%'
) table_3
ON lower(trim(SUBSTRING(a.column_h,position('@',a.column_h) + 1, length(a.column_h)))) = trim(lower(table_3.column_bk))
and date(a.id_3_date) >= table_3.start_date
and date(a.id_3_date) <= table_3.end_date
and table_3.rank = 1
LEFT JOIN
(
SELECT DISTINCT
column_bk,
column_d,
'november' as type,
to_date(start_date, 'dd/MM/YYYY') as start_date,
to_date(end_date, 'dd/MM/YYYY') as end_date,
rank() over (partition by column_bk order by column_d) as rank
FROM table_19
WHERE lower(trim(column_bl)) like '%code%'
) table_4
ON UPPER(a.column_s) = UPPER(table_4.column_bk)
and date(a.id_3_date) >= table_4.start_date
and date(a.id_3_date) <= table_4.end_date
and table_4.rank = 1
LEFT JOIN
(
SELECT DISTINCT
column_bk,
column_d,
'oscar' as type,
to_date(start_date, 'dd/MM/YYYY') as start_date,
to_date(end_date, 'dd/MM/YYYY') as end_date,
rank() over (partition by column_bk order by column_d) as rank
FROM table_19
WHERE lower(trim(column_bl)) like '%lead%'
) table_1
ON
lower(trim(a.lead_id_5::varchar)) = lower(trim(table_1.column_bk))
and date(a.id_3_date) >= table_1.start_date
and date(a.id_3_date) <= table_1.end_date
and table_1.rank = 1
LEFT JOIN
(
SELECT DISTINCT
u.column_bm,
project as column_b,
to_date(start_date, 'dd-MMMM-yyyy') as start_date,
rank() over (partition by column_bm order by column_y_till desc, project asc) as rank
FROM table_20 table_21
INNER JOIN table_22 u
ON u.number = cast(table_21.column_bn as varchar)
) table_5
ON a.id_4 = table_5.column_bm and table_5.rank = 1 and date(a.id_3_date) >= table_5.start_date
GROUP BY
column_c,id_2,id_3,date_1,id_3_date,date_2,column_f,column_g,id_4,column_h,column_j,column_i,column_k,column_l,column_m,column_n,column_o,column_p,column_q,column_r,column_s,column_t,column_u,column_v,column_w,column_x,column_y,id_5,column_z,column_aa,column_ab,column_ac,column_ad,column_ae,column_af,column_ag,column_ah,column_ai,column_aj,column_ak,column_al,column_am,column_an,column_ao,column_ap,column_aq,column_ar,column_as,id_6,id_7,column_at,column_au,column_g_o1,column_bj,column_ba,column_bd,column_be,month_diff,column_bg,column_bg_c1,column_bg_bkt,column_av,column_aw,
column_ax, lead_id_5, column_a, column_b, table_5.column_b, financial_column_c
dialect: snowflake
Queries starting with CREATE OR REPLACE are not producing lineage when used with COPY GRANTS, as mentioned in the message above. Queries starting with MERGE INTO are also not being parsed.
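In the Snowflake CTAS above, `COPY GRANTS` sits between the table name and `AS`, which breaks CTAS recognition in generic parsers. A hedged pre-processing sketch (hypothetical helper name) that removes the clause, which carries no lineage information:

```python
import re

# Hypothetical pre-processing: Snowflake's "CREATE OR REPLACE
# [TRANSIENT] TABLE t COPY GRANTS AS SELECT ..." puts COPY GRANTS
# between the table name and AS, which generic parsers do not expect
# in a CTAS. COPY GRANTS only affects privilege inheritance, so
# removing it does not change lineage.
COPY_GRANTS = re.compile(r"\bCOPY\s+GRANTS\b", re.IGNORECASE)

def strip_copy_grants(sql: str) -> str:
    return COPY_GRANTS.sub("", sql)
```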
Hey
When I was ingesting my Metabase dashboards the ingestion was running successfully but I'm getting these errors:
[2023-05-09 01:50:34,558] {metadata.py:206} ERROR - Error creating chart [<MY CARD>]: 'NoneType' object has no attribute 'strip'
[2023-05-10 06:28:16,122] {metadata.py:205} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/dashboard/metabase/metadata.py", line 187, in yield_dashboard_lineage_details
yield from self._yield_lineage_from_query(
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/dashboard/metabase/metadata.py", line 227, in _yield_lineage_from_query
for table in lineage_parser.source_tables:
File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 110, in source_tables
return self.retrieve_tables(self.parser.source_tables)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
self._eval()
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 177, in _eval
self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 177, in <listcomp>
self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 62, in analyze
self._extract_from_dml(stmt, AnalyzerContext())
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 223, in _extract_from_dml
next_handler.handle(sub_token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/base.py", line 37, in handle
self._handle(token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 61, in _handle
self._handle_column(token)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 108, in _handle_column
self.columns.append(SqlParseColumn.of(token))
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/models.py", line 83, in of
return Column(
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 145, in __init__
self.raw_name = escape_identifier_name(name)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/utils/helpers.py", line 9, in escape_identifier_name
return name.strip("`").strip('"').strip("'")
AttributeError: 'NoneType' object has no attribute 'strip'
Unable to view upstream lineage for a table created with this query:
CREATE OR REPLACE TRANSIENT TABLE DATA_WAREHOUSE.SCHEMA_1.FINAL_TABLE COPY GRANTS AS
WITH
TABLE_1 AS (
SELECT
COLUMN_1
, COLUMN_2
, COLUMN_3
, COLUMN_4
, COLUMN_5
, COLUMN_6
, COLUMN_7
, COLUMN_8
, COLUMN_9
, BASE64_ENCODE(COLUMN_10) AS COLUMN_10
, BASE64_ENCODE(COLUMN_11) AS COLUMN_11
, COLUMN_12
, COLUMN_13
, COLUMN_14
, COLUMN_15
, COLUMN_16
, COLUMN_17
, COLUMN_20
, COLUMN_21
, COLUMN_22
, COLUMN_23
, COLUMN_24
, COLUMN_25
, COLUMN_26
, COLUMN_27
, COLUMN_28
, COLUMN_29
, COLUMN_30
, COLUMN_31
, COLUMN_32
, COLUMN_33
, COLUMN_34
, COLUMN_35
, COLUMN_36
, COLUMN_37
, COLUMN_38
, COLUMN_39
, COLUMN_40
, COLUMN_41
, BASE64_ENCODE(COLUMN_42) AS COLUMN_42
, COLUMN_43
, COLUMN_44
, COLUMN_45
, COLUMN_46
, COLUMN_47
, COLUMN_48
, COLUMN_49
, COLUMN_50
, COLUMN_51
, CAST(COLUMN_52 AS VARCHAR) AS COLUMN_52
, COLUMN_53
, CAST(COLUMN_54 AS VARCHAR) AS COLUMN_54
, COLUMN_55
, COLUMN_56
, COLUMN_57
, COLUMN_58
, COLUMN_59
, COLUMN_60
, COLUMN_61
, COLUMN_62
, COLUMN_63
, BASE64_ENCODE(COLUMN_64) AS COLUMN_64
, COLUMN_65
, COLUMN_66
, COLUMN_67
, COLUMN_68
, COLUMN_69
, COLUMN_70
, COLUMN_71
, COLUMN_72
, COLUMN_73
, COLUMN_74
, COLUMN_75
, COLUMN_76
, COLUMN_77
, COLUMN_78
, COLUMN_79
, COLUMN_80
, COLUMN_81
, COLUMN_82
, COLUMN_83
, COLUMN_84
, COLUMN_85
, COLUMN_86
, COLUMN_87
, COLUMN_88
, COLUMN_89
, COLUMN_90
, COLUMN_91
, COLUMN_92
, COLUMN_93
, COLUMN_94
, COLUMN_95
, COLUMN_96
, COLUMN_97
, COLUMN_98
, COLUMN_99
FROM
(
SELECT
DISTINCT
COLUMN_1
, COLUMN_2
, COLUMN_3
, COLUMN_4::
TIMESTAMP AS COLUMN_4
, COLUMN_5::
TIMESTAMP AS COLUMN_5
, COLUMN_6::
TIMESTAMP AS COLUMN_6
, COLUMN_7
, COLUMN_8
, COLUMN_9
, COLUMN_10
, COLUMN_11
, COLUMN_12
, COLUMN_13
, COLUMN_14
, COLUMN_18 AS COLUMN_15
, COLUMN_19 AS COLUMN_16
, COLUMN_17
, COLUMN_20 AS COLUMN_20
, COLUMN_21 AS COLUMN_21
, COLUMN_22
, COLUMN_23 AS COLUMN_23
, COLUMN_24
, COLUMN_25
, COLUMN_26
, COLUMN_27
, COLUMN_100 AS COLUMN_28
, COLUMN_29::
TIMESTAMP AS COLUMN_29
, COLUMN_30::
TIMESTAMP AS COLUMN_30
, COLUMN_31
, COLUMN_32
, COLUMN_33
, COLUMN_34
, COLUMN_35
, COLUMN_36
, COLUMN_37
, COLUMN_38
, COLUMN_39
, COLUMN_40
, COLUMN_41
, COLUMN_42 AS COLUMN_42
, COLUMN_43
, COLUMN_44
, COLUMN_45
, COLUMN_46
, COLUMN_47
, COLUMN_48
, COLUMN_49
, COLUMN_50
, NULL AS COLUMN_51
, NULL AS COLUMN_52
, COLUMN_101 AS COLUMN_53
, NULL AS COLUMN_54
, COLUMN_102 AS COLUMN_55
, COLUMN_103 AS COLUMN_56
, COLUMN_104 AS COLUMN_57
, NULL AS COLUMN_59
, NULL AS COLUMN_60
, NULL AS COLUMN_61
, COLUMN_62
, CAST(COLUMN_63 AS BIGINT) AS COLUMN_63
, COLUMN_64
, COLUMN_65
, COLUMN_66
, COLUMN_67
, COLUMN_68
, COLUMN_69
, COLUMN_70
, COLUMN_71
, COLUMN_72
, COLUMN_73
, COLUMN_74
, COLUMN_75
, COLUMN_76
, COLUMN_77
, COLUMN_78
, COLUMN_79
, COLUMN_80
, COLUMN_81
, COLUMN_82
, COLUMN_83
, COLUMN_84
, COLUMN_85
, COLUMN_86
, COLUMN_87::
TIMESTAMP AS COLUMN_87
, COLUMN_88
, COLUMN_89::
TIMESTAMP AS COLUMN_89
, COLUMN_92
, COLUMN_93
, COLUMN_58
, COLUMN_90
, COLUMN_91
, COLUMN_94
, CASE
WHEN
TABLE_2.COLUMN_7 NOT IN ('alpha', 'beta', 'gamma')
THEN
NULL
WHEN
DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_2.COLUMN_6)) = 0
AND TABLE_2.COLUMN_3 = TABLE_3.COLUMN_106
THEN
'delta'
ELSE
'eta'
END AS COLUMN_95
, TABLE_2.COLUMN_107 AS COLUMN_96
, TABLE_2.COLUMN_108 AS COLUMN_97
, TABLE_2.COLUMN_98
, TABLE_2.COLUMN_99
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_4 TABLE_2
LEFT JOIN
(
SELECT
COLUMN_10 AS COLUMN_109
, COLUMN_94 AS COLUMN_110
, COLUMN_3 AS COLUMN_106
, COLUMN_6 AS COLUMN_105
, ROW_NUMBER() OVER(
PARTITION BY
COLUMN_10
, COLUMN_94
ORDER BY
COLUMN_6
) AS RANK
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_4 TABLE_2
WHERE
COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
) TABLE_3
ON
TABLE_3.COLUMN_109 = TABLE_2.COLUMN_10
AND TABLE_3.COLUMN_110 = TABLE_2.COLUMN_94
AND TABLE_3.RANK = 1
AND TABLE_2.COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
LEFT JOIN
(
SELECT
DISTINCT
COLUMN_64 AS COLUMN_111
, COLUMN_3 AS COLUMN_112
, DENSE_RANK() OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_5
) AS COLUMN_86
, LAG(COLUMN_5, 1) OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_5
) AS COLUMN_87
FROM
(
SELECT
DISTINCT
COLUMN_64
, COLUMN_3
, MIN(COLUMN_5) AS COLUMN_5
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_4
WHERE
COLUMN_7 NOT IN ('gamma')
AND COLUMN_64 IS NOT NULL
GROUP BY
1
, 2
)
) TABLE_5
ON
TABLE_2.COLUMN_64 = TABLE_5.COLUMN_111
AND CAST(TABLE_2.COLUMN_3 AS VARCHAR) = CAST(TABLE_5.COLUMN_112 AS VARCHAR)
LEFT JOIN
(
SELECT
DISTINCT
COLUMN_64 AS COLUMN_113
, COLUMN_2 AS COLUMN_114
, DENSE_RANK() OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_6
) AS COLUMN_88
, LAG(COLUMN_6, 1) OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_6
) AS COLUMN_89
FROM
(
SELECT
DISTINCT
COLUMN_64
, COLUMN_2
, COLUMN_6
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_4
WHERE
COLUMN_7 IN ('alpha', 'epsilon')
AND COLUMN_64 IS NOT NULL
)
) TABLE_6
ON
TABLE_2.COLUMN_64 = TABLE_6.COLUMN_113
AND CAST(TABLE_2.COLUMN_2 AS VARCHAR) = CAST(TABLE_6.COLUMN_114 AS VARCHAR)
UNION ALL
SELECT DISTINCT
COLUMN_1
, COLUMN_2
, COLUMN_3
, COLUMN_4::TIMESTAMP AS COLUMN_4
, COLUMN_5::TIMESTAMP AS COLUMN_5
, COLUMN_6::TIMESTAMP AS COLUMN_6
, LOWER(COLUMN_7) AS COLUMN_7
, COLUMN_8
, COLUMN_9
, COLUMN_10
, COLUMN_11
, COLUMN_12
, COLUMN_13
, COLUMN_14
, COLUMN_18 AS COLUMN_15
, COLUMN_19 AS COLUMN_16
, COLUMN_17
, COLUMN_20
, COLUMN_21
, NULL AS COLUMN_22
, COLUMN_23
, COLUMN_24
, COLUMN_25
, COLUMN_26
, COLUMN_27
, COLUMN_100 AS COLUMN_28
, NULL AS COLUMN_29
, COLUMN_30::TIMESTAMP AS COLUMN_30
, COLUMN_31
, COLUMN_32
, COLUMN_33
, COLUMN_34
, COLUMN_35
, COLUMN_36
, NULL AS COLUMN_37
, COLUMN_38 AS COLUMN_38
, COLUMN_39
, COLUMN_40
, COLUMN_41
, COLUMN_42
, COLUMN_43
, COLUMN_44
, COLUMN_45
, COLUMN_46
, COLUMN_47
, COLUMN_48
, TRY_CAST(COLUMN_49 AS DATE) AS COLUMN_49
, TRY_CAST(COLUMN_50 AS DATE) AS COLUMN_50
, NULL AS COLUMN_51
, NULL AS COLUMN_52
, COLUMN_101 AS COLUMN_53
, NULL AS COLUMN_54
, COLUMN_102 AS COLUMN_55
, COLUMN_103 AS COLUMN_56
, COLUMN_104 AS COLUMN_57
, NULL AS COLUMN_59
, NULL AS COLUMN_60
, NULL AS COLUMN_61
, NULL AS COLUMN_62
, CAST(COLUMN_63 AS BIGINT) AS COLUMN_63
, COLUMN_64
, COLUMN_65
, NULL AS COLUMN_66
, COLUMN_67
, NULL AS COLUMN_68
, NULL AS COLUMN_69
, COLUMN_70
, NULL AS COLUMN_71
, COLUMN_72
, COLUMN_73
, COLUMN_116 AS COLUMN_74
, NULL AS COLUMN_75
, COLUMN_76
, COLUMN_77
, COLUMN_78
, COLUMN_79
, NULL AS COLUMN_80
, COLUMN_81
, COLUMN_82
, NULL AS COLUMN_83
, NULL AS COLUMN_84
, NULL AS COLUMN_85
, COLUMN_86
, COLUMN_87::TIMESTAMP AS COLUMN_87
, COLUMN_88
, COLUMN_89::TIMESTAMP AS COLUMN_89
, COLUMN_117 AS COLUMN_92
, COLUMN_118 AS COLUMN_93
, COLUMN_58
, COLUMN_90
, COLUMN_91
, COLUMN_94
, CASE
WHEN
TABLE_2.COLUMN_7 NOT IN ('alpha', 'epsilon', 'gamma')
THEN
NULL
WHEN
DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_2.COLUMN_6)) = 0
AND TABLE_2.COLUMN_3 = TABLE_3.COLUMN_106
THEN
'delta'
ELSE
'eta'
END AS COLUMN_95
, NULL AS COLUMN_107
, NULL AS COLUMN_108
, NULL AS COLUMN_98
, NULL AS COLUMN_99
FROM DATA_WAREHOUSE.SCHEMA_1.TABLE_7 TABLE_2
LEFT JOIN
(
SELECT
COLUMN_10 AS COLUMN_109
, COLUMN_94 AS COLUMN_110
, COLUMN_3 AS COLUMN_106
, COLUMN_6 AS COLUMN_105
, ROW_NUMBER() OVER(
PARTITION BY
COLUMN_10
, COLUMN_94
ORDER BY
COLUMN_6
) AS RANK
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_7 TABLE_2
WHERE
COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
) TABLE_3
ON
TABLE_3.COLUMN_109 = TABLE_2.COLUMN_10
AND TABLE_3.COLUMN_110 = TABLE_2.COLUMN_94
AND TABLE_3.RANK = 1
AND TABLE_2.COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
LEFT JOIN
(
SELECT
DISTINCT
COLUMN_64 AS COLUMN_111
, COLUMN_3 AS COLUMN_112
, DENSE_RANK() OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_5
) AS COLUMN_86
, LAG(COLUMN_5, 1) OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_5
) AS COLUMN_87
FROM
(
SELECT
DISTINCT
COLUMN_64
, COLUMN_3
, MIN(COLUMN_5) AS COLUMN_5
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_7
WHERE
COLUMN_7 NOT IN ('gamma')
AND COLUMN_64 IS NOT NULL
GROUP BY
1
, 2
)
) TABLE_5
ON
TABLE_2.COLUMN_64 = TABLE_5.COLUMN_111
AND CAST(TABLE_2.COLUMN_3 AS VARCHAR) = CAST(TABLE_5.COLUMN_112 AS VARCHAR)
LEFT JOIN
(
SELECT
DISTINCT
COLUMN_64 AS COLUMN_113
, COLUMN_2 AS COLUMN_114
, DENSE_RANK() OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_6
) AS COLUMN_88
, LAG(COLUMN_6, 1) OVER(
PARTITION BY
COLUMN_64
ORDER BY
COLUMN_6
) AS COLUMN_89
FROM
(
SELECT
DISTINCT
COLUMN_64
, COLUMN_2
, COLUMN_6
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_7
WHERE
COLUMN_7 IN ('alpha', 'epsilon')
AND COLUMN_64 IS NOT NULL
)
) TABLE_6
ON
TABLE_2.COLUMN_64 = TABLE_6.COLUMN_113
AND CAST(TABLE_2.COLUMN_2 AS VARCHAR) = CAST(TABLE_6.COLUMN_114 AS VARCHAR)
UNION ALL
SELECT DISTINCT
COLUMN_1
, CAST(COLUMN_115 AS VARCHAR(256)) AS COLUMN_2
, CAST(COLUMN_3 AS VARCHAR(256)) AS COLUMN_3
, COLUMN_119::TIMESTAMP AS COLUMN_4
, COLUMN_120::TIMESTAMP AS COLUMN_5
, COLUMN_121::TIMESTAMP AS COLUMN_6
, COLUMN_122 AS COLUMN_7
, COLUMN_123 AS COLUMN_8
, COLUMN_124 AS COLUMN_9
, COLUMN_10
, COLUMN_125 AS COLUMN_11
, COLUMN_126 AS COLUMN_12
, COLUMN_126 AS COLUMN_13
, NULL AS COLUMN_14
, COLUMN_15
, COLUMN_16
, NULL AS COLUMN_17
, COLUMN_20
, (COLUMN_126 - COLUMN_20 - COLUMN_15) AS COLUMN_21
, NULL AS COLUMN_22
, COLUMN_23
, NULL AS COLUMN_24
, NULL AS COLUMN_25
, COLUMN_127 AS COLUMN_26
, NULL AS COLUMN_27
, CAST(COLUMN_128 AS VARCHAR(256)) AS COLUMN_28
, NULL AS COLUMN_29
, NULL AS COLUMN_30
, NULL AS COLUMN_31
, NULL AS COLUMN_32
, NULL AS COLUMN_33
, NULL AS COLUMN_34
, NULL AS COLUMN_35
, NULL AS COLUMN_36
, COLUMN_37
, COLUMN_38
, COLUMN_129 AS COLUMN_39
, COLUMN_40
, NULL AS COLUMN_41
, COLUMN_130 AS COLUMN_42
, COLUMN_43
, NULL AS COLUMN_44
, NULL AS COLUMN_45
, NULL AS COLUMN_46
, NULL AS COLUMN_47
, NULL AS COLUMN_48
, NULL AS COLUMN_49
, NULL AS COLUMN_50
, COLUMN_51
, COLUMN_52
, NULL AS COLUMN_53
, COLUMN_131 AS COLUMN_54
, COLUMN_132 AS COLUMN_55
, COLUMN_133 AS COLUMN_56
, COLUMN_134 AS COLUMN_57
, COLUMN_59 AS COLUMN_59
, COLUMN_135 AS COLUMN_60
, COLUMN_61
, NULL AS COLUMN_62
, COLUMN_136 AS COLUMN_63
, NULL AS COLUMN_64
, COLUMN_137 AS COLUMN_65
, NULL AS COLUMN_66
, NULL AS COLUMN_67
, NULL AS COLUMN_68
, NULL AS COLUMN_69
, NULL AS COLUMN_70
, NULL AS COLUMN_71
, NULL AS COLUMN_72
, NULL AS COLUMN_73
, COLUMN_138 AS COLUMN_74
, NULL AS COLUMN_75
, NULL AS COLUMN_76
, NULL AS COLUMN_77
, COLUMN_139 AS COLUMN_78
, NULL AS COLUMN_79
, NULL AS COLUMN_80
, NULL AS COLUMN_81
, NULL AS COLUMN_82
, NULL AS COLUMN_83
, NULL AS COLUMN_84
, NULL AS COLUMN_85
, NULL AS COLUMN_86
, NULL AS COLUMN_87
, NULL AS COLUMN_88
, NULL AS COLUMN_89
, COLUMN_140 AS COLUMN_92
, COLUMN_141 AS COLUMN_93
, COLUMN_58
, COLUMN_90
, COLUMN_91
, COLUMN_94
, CASE
WHEN
TABLE_8.COLUMN_59 NOT IN ('zeta', 'nu')
THEN
NULL
WHEN
DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_8.COLUMN_121)) = 0
AND TABLE_8.COLUMN_3 = TABLE_3.COLUMN_106
THEN
'delta'
ELSE
'eta'
END AS COLUMN_95
, NULL AS COLUMN_107
, NULL AS COLUMN_108
, NULL AS COLUMN_98
, NULL AS COLUMN_99
FROM DATA_WAREHOUSE.SCHEMA_1.TABLE_9 TABLE_8
LEFT JOIN
(
SELECT
DISTINCT
COLUMN_10 AS COLUMN_109
, COLUMN_94 AS COLUMN_110
, COLUMN_3 AS COLUMN_106
, COLUMN_121 AS COLUMN_105
, ROW_NUMBER() OVER(
PARTITION BY
COLUMN_94
, COLUMN_10
ORDER BY
COLUMN_121
) AS RANK
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_9
WHERE
COLUMN_59 IN ('zeta', 'nu')
AND COLUMN_121 IS NOT NULL
) TABLE_3
ON
TABLE_8.COLUMN_10 = TABLE_3.COLUMN_109
AND TABLE_3.RANK = 1
AND TABLE_8.COLUMN_59 IN ('zeta', 'nu')
AND TABLE_8.COLUMN_94 = TABLE_3.COLUMN_110
AND TABLE_8.COLUMN_121 IS NOT NULL
) TABLE_1
UNION ALL
SELECT
DISTINCT
'sigma' AS COLUMN_1
, COLUMN_2
, COLUMN_3
, COLUMN_4::TIMESTAMP AS COLUMN_4
, COLUMN_5::TIMESTAMP AS COLUMN_5
, COLUMN_6::TIMESTAMP AS COLUMN_6
, COLUMN_7
, COLUMN_8
, COLUMN_9
, COLUMN_10
, NULL AS COLUMN_11
, CAST(COLUMN_12 AS DOUBLE) AS COLUMN_12
, CAST(COLUMN_13 AS DOUBLE) AS COLUMN_13
, NULL AS COLUMN_14
, COLUMN_15
, NULL AS COLUMN_16
, NULL AS COLUMN_17
, NULL AS COLUMN_20
, CAST(COLUMN_21 AS DOUBLE) AS COLUMN_21
, NULL AS COLUMN_22
, COLUMN_23
, NULL AS COLUMN_24
, NULL AS COLUMN_25
, COLUMN_26
, NULL AS COLUMN_27
, NULL AS COLUMN_28
, NULL AS COLUMN_29
, COLUMN_30::TIMESTAMP AS COLUMN_30
, NULL AS COLUMN_31
, NULL AS COLUMN_32
, NULL AS COLUMN_33
, NULL AS COLUMN_34
, NULL AS COLUMN_35
, NULL AS COLUMN_36
, NULL AS COLUMN_37
, COLUMN_38
, COLUMN_39
, COLUMN_40
, NULL AS COLUMN_41
, NULL AS COLUMN_42
, COLUMN_43
, NULL AS COLUMN_44
, NULL AS COLUMN_45
, NULL AS COLUMN_46
, COLUMN_142 AS COLUMN_47
, NULL AS COLUMN_48
, COLUMN_49
, COLUMN_50
, NULL AS COLUMN_51
, NULL AS COLUMN_52
, NULL AS COLUMN_53
, NULL AS COLUMN_54
, COLUMN_143 AS COLUMN_55
, NULL AS COLUMN_56
, NULL AS COLUMN_57
, NULL AS COLUMN_58
, NULL AS COLUMN_59
, NULL AS COLUMN_60
, NULL AS COLUMN_61
, NULL AS COLUMN_62
, NULL AS COLUMN_63
, NULL AS COLUMN_64
, NULL AS COLUMN_65
, NULL AS COLUMN_66
, COLUMN_144 AS COLUMN_67
, NULL AS COLUMN_68
, NULL AS COLUMN_69
, NULL AS COLUMN_70
, NULL AS COLUMN_71
, NULL AS COLUMN_72
, NULL AS COLUMN_73
, NULL AS COLUMN_74
, NULL AS COLUMN_75
, NULL AS COLUMN_76
, NULL AS COLUMN_77
, NULL AS COLUMN_78
, NULL AS COLUMN_79
, NULL AS COLUMN_80
, NULL AS COLUMN_81
, NULL AS COLUMN_82
, NULL AS COLUMN_83
, NULL AS COLUMN_84
, NULL AS COLUMN_85
, NULL AS COLUMN_86
, NULL AS COLUMN_87
, NULL AS COLUMN_88
, NULL AS COLUMN_89
, NULL AS COLUMN_90
, NULL AS COLUMN_91
, COLUMN_92 AS COLUMN_92
, NULL AS COLUMN_93
, 'sigma' AS COLUMN_94
, NULL AS COLUMN_95
, NULL AS COLUMN_107
, NULL AS COLUMN_108
, NULL AS COLUMN_98
, NULL AS COLUMN_99
FROM
DATA_WAREHOUSE.SCHEMA_1.TABLE_10
)
SELECT
TABLE_1.*
, CASE
WHEN
(
TABLE_1.COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
OR (
TABLE_1.COLUMN_59 IN ('zeta', 'nu')
AND TABLE_1.COLUMN_6 IS NOT NULL
AND TABLE_1.COLUMN_1 = 'TABLE_8'
AND TABLE_1.COLUMN_74 IS NOT NULL
)
)
THEN
(
CASE
WHEN
DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_1.COLUMN_6)) = 0
AND TABLE_1.COLUMN_3 = TABLE_3.COLUMN_106
THEN
'delta'
ELSE
'eta'
END
)
ELSE
NULL
END AS COLUMN_145
FROM
TABLE_1
LEFT JOIN
(
SELECT
COLUMN_10 AS COLUMN_109
, COLUMN_3 AS COLUMN_106
, COLUMN_6 AS COLUMN_105
, ROW_NUMBER() OVER(
PARTITION BY
COLUMN_10
ORDER BY
COLUMN_6
) AS RANK
FROM
TABLE_1
WHERE
(
COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
OR (
COLUMN_59 IN ('zeta', 'nu')
AND COLUMN_6 IS NOT NULL
AND TABLE_1.COLUMN_1 = 'TABLE_8'
AND TABLE_1.COLUMN_74 IS NOT NULL
)
)
AND TABLE_1.COLUMN_1 != 'sigma'
) TABLE_3
ON
TABLE_3.COLUMN_109 = TABLE_1.COLUMN_10
AND TABLE_3.RANK = 1
AND TABLE_1.COLUMN_1 != 'sigma'
AND (
TABLE_1.COLUMN_7 IN ('alpha', 'epsilon', 'gamma')
OR (
TABLE_1.COLUMN_59 IN ('zeta', 'nu')
AND TABLE_1.COLUMN_6 IS NOT NULL
AND TABLE_1.COLUMN_1 = 'TABLE_8'
AND TABLE_1.COLUMN_74 IS NOT NULL
)
)
;
I don't know if the query itself is the issue, since I am able to view the upstream tables in https://reata.github.io/sqllineage/.
The missing upstream lineage is happening for both Snowflake and Databricks.
-- Table 1: Products
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10, 2)
);
-- Insert sample data into Products table
INSERT INTO products (product_name, price)
VALUES
('Product A', 10.99),
('Product B', 5.99),
('Product C', 8.99);
-- Table 2: Orders
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT,
product_id INT,
quantity INT,
order_date DATE
);
-- Insert sample data into Orders table
INSERT INTO orders (customer_id, product_id, quantity, order_date)
VALUES
(1, 1, 2, '2023-05-01'),
(1, 2, 1, '2023-05-03'),
(2, 3, 3, '2023-05-05');
-- Table 3: Customers
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
-- Insert sample data into Customers table
INSERT INTO customers (name, email)
VALUES
('John Doe', 'john@example.com'),
('Jane Smith', 'jane@example.com');
-- Create a view to join the tables
CREATE OR REPLACE VIEW order_details AS
SELECT
orders.order_id,
customers.name AS customer_name,
customers.email,
products.product_name,
orders.quantity,
orders.order_date
FROM
orders
JOIN
customers ON customers.customer_id = orders.customer_id
JOIN
products ON products.product_id = orders.product_id;
Hi, adding a Postgres DB view lineage failure scenario. Query:
CREATE OR REPLACE VIEW public.vw_film_actor_info
AS
SELECT
CONCAT(fa.film_id, '-A:', fa.actor_id) AS film_actor_key,
CONCAT(a.first_name, ' ', a.last_name) AS full_name,
f.title,
f.release_year,
l.name,
f.rating
FROM public.film_actor fa
INNER JOIN public.actor a
ON fa.actor_id = a.actor_id
INNER JOIN public.film f
ON fa.film_id = f.film_id
INNER JOIN language l
ON f.language_id = l.language_id;
Expected Result:
Actual Result/ Lineage from UI:
There is an issue with Snowflake lineage: if there is a SELECT * on a subquery with joins, the lineage parser fails to pick up the columns from the base tables. The query is below:
CREATE OR REPLACE TRANSIENT TABLE data_warehouse.schema_1.table_1
COPY GRANTS
AS
select *,
case when column_1='alpha' then column_2
when column_3 in ('95','30') then column_2
when column_3 in ('130','131','132') then 'beta'
when column_3 in ('25','45') then 'gamma'
else coalesce(column_2,column_3) end as column_3_final
,case when column_4<=2 then column_4||' days'
else '9 days+' end as column_4_final
,case when column_5>=0 and column_5 < 6 then '[0,6)'
when column_5>=6 and column_5 < 12 then '[6,12)'
when column_5>=12 and column_5 < 16 then '[12,16)'
when column_5>=16 and column_5 < 20 then '[16,20)'
when column_5>=20 and column_5 < 24 then '[20,24)'
when column_5>=24 and column_5 < 36 then '[24,36)'
when column_5>=36 and column_5 < 48 then '[36,48)'
when column_5>=48 and column_5 < 60 then '[48,60)'
when column_5>=60 and column_5 < 72 then '[60,72)'
when column_5>=72 and column_5 < 96 then '[72,96)'
else '96 hrs+' end as column_5_final
,case when column_14 in ('eta','epsilon') then 'theta' else 'nu' end as source_final
from
(select
t2.column_6
,t2.column_7
,t2.column_8
,dateadd('minute',0,t2.created) as column_9
,t2.column_10
,t2.column_3
,case
when column_3 = '15' then 'a'
end as column_2,
case
when column_3 = '190' then 'alpha'
end as column_1
,t2.column_11
,t2.column_12
,v.column_13
,t2.column_14
,parse_json(column_15):field_1::varchar as field_1
,t3.column_16
,datediff('day',column_16,dateadd('minute',330,current_timestamp)::timestamp) as column_4
,datediff('minute',oh.column_17,dateadd('minute',330,current_timestamp)::timestamp)*1.0/60 as column_5
,oh.column_17
,max(t2.updated) over () as last_updated
from data_warehouse.schema_2.table_2 t2
left join (select column_8,column_3 as column_3_oh,min(created) as column_17 from data_warehouse.schema_2.table_3
where column_3 is not null
and column_8 in (select distinct column_8 from data_warehouse.schema_2.table_2
where column_3 not in ('140','141','142','99','40'))
group by 1,2) oh on t2.column_8=oh.column_8 and t2.column_3=oh.column_3_oh
left join (select column_7,min(created) as column_16 from data_warehouse.schema_2.table_2
where column_7 in (select distinct column_7 from data_warehouse.schema_2.table_2
where column_3 not in ('140','141','142','99','40')
)
group by 1
) t3 on t2.column_7=t3.column_7
left join data_warehouse.schema_5.table_6 v on t2.column_11::varchar=v.column_19::varchar
where
t2.column_3 not in ('140','141','142','99','40')
);
Adding to the previous queries, this one fails to produce lineage even in reata/sqllineage:
CREATE OR REPLACE TRANSIENT TABLE data_warehouse.schema_2.table_1
COPY GRANTS
AS
SELECT DISTINCT
table_2_alias.column_59 as column_1,
table_2_alias.column_2,
coalesce(table_2_alias.column_3,table_22_alias.column_3) as column_3,
coalesce(table_2_alias.column_4, table_22_alias.column_4) as column_4,
column_5,
table_7_alias.column_56 as column_6,
table_6_alias.column_55 as column_7,
column_8,
table_9_alias.column_56 as column_9,
table_8_alias.column_55 as column_10,
column_11,
table_11_alias.column_56 as column_12,
table_10_alias.column_55 as column_13,
column_14,
table_13_alias.column_56 as column_15,
table_12_alias.column_55 as column_16,
column_17,
table_15_alias.column_56 as column_18,
table_14_alias.column_55 as column_19,
column_20,
table_17_alias.column_56 as column_21,
table_16_alias.column_55 as column_22,
table_2_alias.column_58 as column_23,
table_2_alias.column_57 as column_24,
table_22_alias.column_31 as column_25,
table_22_alias.column_58 as column_26,
table_22_alias.column_57 as column_27,
table_19_alias.column_59 as column_28,
table_19_alias.column_58 as column_29,
table_18_alias.column_59 as column_30,
table_18_alias.column_58 as column_31,
table_18_alias.column_57 as column_32,
table_3_alias.column_56 as column_33,
table_4_alias.column_56 as column_34,
table_5_alias.column_56 as column_35,
column_36,
column_37,
column_38,
-- table_21
column_39,
column_40,
column_41,
column_42,
column_43,
column_44
from data_warehouse.schema_1.table_2 table_2_alias
inner join data_warehouse.schema_1.table_22 table_22_alias on table_2_alias.column_2 = table_22_alias.column_59
left join data_warehouse.schema_1.table_3 table_3_alias on table_3_alias.column_59=table_2_alias.column_33_id
left join data_warehouse.schema_1.table_4 table_4_alias on table_4_alias.column_59=table_2_alias.column_34_id
left join data_warehouse.schema_1.table_5 table_5_alias on table_5_alias.column_59=table_2_alias.column_35_id
left join data_warehouse.schema_1.table_6 table_6_alias on table_22_alias.column_59 = table_6_alias.column_2
left join data_warehouse.schema_1.table_7 table_7_alias on table_6_alias.column_5 = table_7_alias.column_59
left join data_warehouse.schema_1.table_8 table_8_alias on table_22_alias.column_59 = table_8_alias.column_2
left join data_warehouse.schema_1.table_9 table_9_alias on table_8_alias.column_8 = table_9_alias.column_59
left join data_warehouse.schema_1.table_10 table_10_alias on table_22_alias.column_59 = table_10_alias.column_2
left join data_warehouse.schema_1.table_11 table_11_alias on table_10_alias.column_11 = table_11_alias.column_59
left join data_warehouse.schema_1.table_12 table_12_alias on table_22_alias.column_59 = table_12_alias.column_2
left join data_warehouse.schema_1.table_13 table_13_alias on table_12_alias.column_14 = table_13_alias.column_59
left join data_warehouse.schema_1.table_14 table_14_alias on table_22_alias.column_59 = table_14_alias.column_2
left join data_warehouse.schema_1.table_15 table_15_alias on table_14_alias.column_17 = table_15_alias.column_59
left join data_warehouse.schema_1.table_16 table_16_alias on table_22_alias.column_59 = table_16_alias.column_2
left join data_warehouse.schema_1.table_17 table_17_alias on table_16_alias.column_20 = table_17_alias.column_59
left join data_warehouse.schema_1.table_18 table_18_alias on table_22_alias.drug_id = table_18_alias.column_59
left join data_warehouse.schema_1.table_19 table_19_alias on table_22_alias.column_28 = table_19_alias.column_59
left join data_warehouse.schema_1.table_20 table_20 on table_20.column_59 = table_2_alias.column_45
left join
(
-- Mapping those table_21 where 3 levels of category are available
select table_22_alias.column_59,column_46 as column_46_alias,column_47,column_48,column_49,column_39,column_40,column_41
from data_warehouse.schema_1.table_22 table_22_alias
inner join
(select c3.column_59 as column_47,
c3.column_50 as column_51,
c2.column_59 as column_48,
c2.column_50 as column_52,
c1.column_59 as column_49,
c1.column_50 as column_53,
c1.column_56 as column_39,
c2.column_56 as column_40,
c3.column_56 as column_41
from data_warehouse.schema_1.table_21 c1
left join data_warehouse.schema_1.table_21 c2 on c2.column_50 = c1.column_59
left join data_warehouse.schema_1.table_21 c3 on c3.column_50 = c2.column_59
where c3.column_50 is not null
) AS X on table_22_alias.column_46 = x.column_47
union
-- Mapping those table_21 where 2 levels of category are available
select table_22_alias.column_59,column_46 as column_46_alias,null as column_47,column_48,column_49,column_39,column_40,
null as column_41
from data_warehouse.schema_1.table_22 table_22_alias
inner join
(select c2.column_59 as column_48,
c2.column_50 as column_52,
c1.column_59 as column_49,
c1.column_50 as column_53,
c1.column_56 as column_39,
c2.column_56 as column_40
from data_warehouse.schema_1.table_21 c1
left join data_warehouse.schema_1.table_21 c2 on c2.column_50 = c1.column_59
where c1.column_50 is null
) AS X on table_22_alias.column_46 = x.column_48
union
-- Mapping all those table_21 where only 1 level of category is available
select table_22_alias.column_59,column_46 as column_46_alias,null as column_47,null as column_48,c.column_59 as column_49,c.column_56 as column_39,
null as column_40,null as column_41
from data_warehouse.schema_1.table_22 table_22_alias
inner join data_warehouse.schema_1.table_21 c on table_22_alias.column_46 = c.column_59
where column_50 is null
) as table_23 on table_22_alias.column_59 = table_23.column_59
left join
( select
table_24.path,
split_part(table_24.path,'.',1) as column_42,
split_part(table_24.path,'.',2) as column_43,
split_part(table_24.path,'.',3) as column_44,
table_25_alias.column_1
from data_warehouse.schema_1.table_24 table_24
inner join data_warehouse.schema_1.table_25 table_25_alias
on table_24.column_54=table_25_alias.column_54
) as ud on ud.column_1 = table_2_alias.column_59
Hello team, I have this SQL code (Oracle) that is not supported by lineage from query logs; please review it:
UPDATE RAW.TABLKE_A MAS SET MAS.ACCURE_INT_LAST_MONTH = (SELECT NVL(SUM(TRANS_AMOUNT), 0) FROM RAW.TABLE_B D WHERE D.POSTED_DT = '2023-07-11')
dialect: sqlserver
CREATE VIEW [dbo].[vDMPrep]
AS
SELECT
pc.EnglishProductCategoryName
,Coalesce(p.ModelName, p.EnglishProductName) AS Model
,c.CustomerKey
,s.SalesTerritoryGroup AS Region
,CASE
WHEN Month(GetDate()) < Month(c.BirthDate)
THEN DateDiff(yy,c.BirthDate,GetDate()) - 1
WHEN Month(GetDate()) = Month(c.BirthDate)
AND Day(GetDate()) < Day(c.BirthDate)
THEN DateDiff(yy,c.BirthDate,GetDate()) - 1
ELSE DateDiff(yy,c.BirthDate,GetDate())
END AS Age
,CASE
WHEN c.YearlyIncome < 40000 THEN 'Low'
WHEN c.YearlyIncome > 60000 THEN 'High'
ELSE c.[EnglishEducation]
-- ELSE 'Moderate'
END AS IncomeGroup
,d.CalendarYear
,d.FiscalYear
,d.MonthNumberOfYear AS Month
,f.SalesOrderNumber AS OrderNumber
,f.SalesOrderLineNumber AS LineNumber
,f.OrderQuantity AS Quantity
,f.ExtendedAmount AS Amount
FROM
dbo.FactInternetSales f
INNER JOIN dbo.DimDate d
ON f.OrderDateKey = d.DateKey
INNER JOIN dbo.DimProduct p
ON f.ProductKey = p.ProductKey
INNER JOIN dbo.DimProductSubcategory psc
ON p.ProductSubcategoryKey = psc.ProductSubcategoryKey
INNER JOIN dbo.DimProductCategory pc
ON psc.ProductCategoryKey = pc.ProductCategoryKey
INNER JOIN dbo.DimCustomer c
ON f.CustomerKey = c.CustomerKey
INNER JOIN dbo.DimGeography g
ON c.GeographyKey = g.GeographyKey
INNER JOIN dbo.DimSalesTerritory s
ON g.SalesTerritoryKey = s.SalesTerritoryKey
dialect: snowflake
UPDATE RDBAPP_CATALOG_VISIBILITIES_HISTORY_STG HST
SET
_END_DATE = NVL(LEAST(HST._END_DATE, SRC._UPDATE_DATE), SRC._UPDATE_DATE),
_IS_CURRENT = IFF(_ACTION = 'DEL', TRUE, FALSE),
_IS_HARD_DELETED = IFF(_ACTION = 'UPD', FALSE, TRUE),
_ROW_UPDATE_DATE = SYSDATE()
FROM
RDBAPP_CATALOG_VISIBILITIES_HISTORY_DIFF_TMP SRC
WHERE
HST.PK = SRC.PK
AND HST._IS_CURRENT
AND NOT _ACTION = 'ADD'
;
dialect: postgres
CREATE VIEW views_api AS SELECT max(t.id) AS id,
t.day,
t.ab_id,
t.ab_group_id,
t.ab_group_name,
t.ab_is_test,
t.hw_type,
false AS feature_hit,
t.metric,
t.type,
sum(t.count) AS value,
a.name,
a.description,
a.t_start,
a.t_end,
a.layer_name,
a.traffic,
a.username
FROM (public.views_clicks_relativni t
LEFT JOIN public.ab_api a ON t.ab_id = a.ab_id)
WHERE t.metric = 'views'
GROUP BY GROUPING SETS (t.day, t.ab_id, t.ab_group_id, t.ab_group_name, t.ab_is_test, t.hw_type, false::boolean, t.metric, t.type, a.name, a.description, a.t_start, a.t_end, a.layer_name, a.traffic, a.username),(t.day, t.ab_id, t.ab_group_id, t.ab_group_name, t.ab_is_test, false::boolean, t.metric, t.type, a.name, a.description, a.t_start, a.t_end, a.layer_name, a.traffic, a.username)
dialect: singlestore
CREATE VIEW vw_test AS
SELECT
a.someColumn ,
b.anotherColumn
FROM a
JOIN b
ON a.id = b.fk_id
Affected module: Ingestion Framework
Describe the bug: For a few of the tables the upstream lineage isn't visible. I wanted to understand how this could happen and what the fix is. Sharing the masked query here:
COPY GRANTS
AS
SELECT DISTINCT
'MASKED' AS sensitive_info_1,
'MASKED' AS sensitive_info_2,
'MASKED' AS sensitive_info_3,
'MASKED' AS sensitive_info_4,
'MASKED' AS sensitive_info_5,
'MASKED' AS sensitive_info_6,
'MASKED' AS sensitive_info_7,
'MASKED' AS sensitive_info_8,
'MASKED' AS sensitive_info_9,
CAST('MASKED' AS VARCHAR(25)) AS sensitive_info_10,
'MASKED' AS sensitive_info_11,
CAST('MASKED' AS VARCHAR(25)) AS sensitive_info_12,
'MASKED' AS sensitive_info_13,
'MASKED' AS sensitive_info_14,
COALESCE(DATEDIFF('second','MASKED','MASKED'),0) AS sensitive_info_15,
CASE
WHEN lower('MASKED') IN ('MASKED', 'MASKED') THEN COALESCE(DATEDIFF('second','MASKED','MASKED') - (DATEDIFF('day',DATE('MASKED'),DATE('MASKED'))*43200),0)
ELSE COALESCE(DATEDIFF('second','MASKED','MASKED')-(DATEDIFF('day',DATE('MASKED'),DATE('MASKED'))*50400) -
(CASE
WHEN YEAR('MASKED') = 2020 AND WEEK('MASKED') < WEEK('MASKED') THEN 53 + WEEK('MASKED') - WEEK('MASKED')
WHEN YEAR('MASKED') > 2020 AND WEEK('MASKED') < WEEK('MASKED') THEN 52 + WEEK('MASKED') - WEEK('MASKED')
ELSE WEEK('MASKED') - WEEK('MASKED')
END)*36000,0)
END AS sensitive_info_16,
CASE
WHEN 'MASKED' = 'MASKED' THEN 'MASKED'
ELSE 'NA'
END AS sensitive_info_17,
'MASKED' AS sensitive_info_18,
'MASKED' AS sensitive_info_19,
'MASKED' AS sensitive_info_20,
'MASKED' AS sensitive_info_21,
'MASKED' AS sensitive_info_22,
'MASKED' AS sensitive_info_23,
SUBSTR('MASKED', 1, 50000) AS sensitive_info_24,
'MASKED' AS sensitive_info_25,
'MASKED' AS sensitive_info_26,
'MASKED' AS sensitive_info_27,
CASE
WHEN 'MASKED' = 'MASKED' AND 'MASKED' IN ('MASKED','MASKED','MASKED','MASKED') THEN 'TRUE'
-- ... (Other conditions here)
ELSE 'FALSE'
END AS sensitive_info_28,
HOUR('MASKED') AS sensitive_info_29,
CASE
WHEN 'MASKED' NOT IN ('40','99','142') AND 'MASKED' IS NULL THEN 'TRUE'
ELSE 'FALSE'
END AS sensitive_info_30
FROM
(
SELECT DISTINCT
'MASKED' AS sensitive_info_1,
'MASKED' AS sensitive_info_2,
'MASKED' AS sensitive_info_3,
'MASKED' AS sensitive_info_4,
'MASKED' AS sensitive_info_5,
'MASKED' AS sensitive_info_6,
'MASKED' AS sensitive_info_7,
'MASKED' AS sensitive_info_8,
'MASKED' AS sensitive_info_9,
'MASKED' AS sensitive_info_10,
'MASKED' AS sensitive_info_11,
'MASKED' AS sensitive_info_12,
'MASKED' AS sensitive_info_13,
'MASKED' AS sensitive_info_14,
'MASKED' AS sensitive_info_15,
'MASKED' AS sensitive_info_16,
'MASKED' AS sensitive_info_17,
'MASKED' AS sensitive_info_18,
'MASKED' AS sensitive_info_19,
'MASKED' AS sensitive_info_20,
'MASKED' AS sensitive_info_21,
'MASKED' AS sensitive_info_22,
'MASKED' AS sensitive_info_23,
'MASKED' AS sensitive_info_24,
'MASKED' AS sensitive_info_25,
'MASKED' AS sensitive_info_26,
'MASKED' AS sensitive_info_27
FROM (
SELECT *
from data_warehouse.ctrl_map.order_flows_base
-- ... (masked columns from the inner query)
) AS t
) AS t
Dialect: MSSQL
create view [dbo].[my_second_dbt_model] as
select *
from "test_db"."dbo"."my_first_dbt_model"
where id = 1
Affected Module: Column-Level Lineage with BigQuery metadata + dbt ingestion
Describe the bug: Raising an issue for a missing upstream column-level lineage; it is not being highlighted/connected when clicking a column in the Lineage graph.
Test Case: company_mapping_tmp.company_id is missing its source column companies.dsa_id, based on the line c.dsa_id AS company_id in the dbt script of company_mapping_tmp below:
WITH fineract_office AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at DESC) AS rnum
    , id
    , ABS(SAFE_CAST(external_id AS INT64)) AS external_id
    , name
    , opening_date
    FROM `data-analytics-254406.fineract_production.m_office`
)
SELECT DISTINCT
    to_hex(md5(cast(coalesce(cast(dsa_id as string), '') || '-' || coalesce(cast(lms as string), '') || '-' || coalesce(cast(lms_dsa_id as string), '') as string))) as company_key
    , c.dsa_id AS nlo
    , c.dsa_id AS company_id
    , lms
    , lms_dsa_id
    , fo.id AS fin_office_id
    , COALESCE(c.name, dc.nlo_name) AS nlo_name
    , lms_name
    , TIMESTAMP(DATETIME(c.created_at, 'Asia/Manila')) AS created_date
    , TIMESTAMP(DATETIME(c.updated_at, 'Asia/Manila')) AS updated_date -- follows the date updated in company service
    , TIMESTAMP(DATETIME(CURRENT_TIMESTAMP(), 'Asia/Manila')) AS dbt_created
    , TIMESTAMP(DATETIME(CURRENT_TIMESTAMP(), 'Asia/Manila')) AS dbt_updated
    , c.country_code
FROM (
    SELECT ROW_NUMBER() OVER(PARTITION BY dsa_id ORDER BY updated_at DESC) AS rnum, *
    FROM `data-analytics-254406.uploan_production_mongodb.companies`
) c
LEFT OUTER JOIN `data-analytics-254406.uploan_prod.data_company_mapping` dc
    ON dc.nlo = c.dsa_id
LEFT OUTER JOIN fineract_office fo
    ON dc.nlo = CAST(fo.external_id AS INT64)
    AND fo.rnum = 1
WHERE c.rnum = 1
Affected Module: Column-Level Lineage with ClickHouse metadata + dbt ingestion
Describe the bug: Raising an issue for a missing upstream column-level lineage; the lineage from the upstream tables is not being parsed.
Case:
with inv_data as (
    select date_id, sku, sales_channel_id, country, efn_flag, amz_field, quantity
    from dbt.stg_inventory_amz
),
location as (
    select location_id, location_group, location, amz_field, sales_channel_id, country
    from eng.amz_location_map
),
sku_cost as (select sku, avg_unit_cost from dbt.stg_sku_cost),
sku_prod_map as (select sku, product_id from dbt.stg_sku_product_map),
prod_cost as (select product_id, avg_unit_cost from dbt.stg_product_cost)
select
    a.date_id as date_id,
    a.sku as sku,
    b.location_group as location_group,
    b.location as location,
    b.location_id as location_id,
    d.product_id as product_id,
    a.quantity as quantity,
    ifNull(c.avg_unit_cost, e.avg_unit_cost) as avg_cost_usd,
    efn_flag as efn_flag
from inv_data a
left join location b
    on (a.country = b.country and a.amz_field = b.amz_field and a.sales_channel_id = b.sales_channel_id)
left join sku_cost c on (a.sku = c.sku)
left join sku_prod_map d on (a.sku = d.sku)
left join prod_cost e on (d.product_id = e.product_id)
debug log:
[2023-11-10T13:38:03.342+0000] {parser.py:420} DEBUG - Lineage computed with SqlFluff did not perform as expected for the [clickhouse] query: [create table atlas.dbt.int_inventory_juvo as with fb_inv as (select date_id, sku, location_group, location, quantity from dbt.stg_inventory_juvo), location as (select location_group, location, location_id from dbt.stg_location), sku_cost as (select sku, avg_unit_cost from dbt.stg_sku_cost), sku_prod_map as (select sku, product_id from dbt.stg_sku_product_map), prod_cost as (select product_id, avg_unit_cost from dbt.stg_product_cost) select a.date_id as date_id, a.sku as sku, a.location_group as location_group, a.location as location, b.location_id as location_id, d.product_id as product_id, a.quantity as quantity, ifNull(c.avg_unit_cost, e.avg_unit_cost) as avg_cost_usd, 'No' as efn_flag from fb_inv a left join location b on (a.location_group = b.location_group and a.location = b.location) left join sku_cost c on (a.sku = c.sku) left join sku_prod_map d on (a.sku = d.sku) left join prod_cost e on (d.product_id = e.product_id)]
Affected Module: Column-Level Lineage with PostgreSQL 15.4
Describe the bug: While ingesting the following query, an error is raised during the handling of column-level lineage: ValueError: too many values to unpack.
Here are the logs from the ingestion:
*** Found local files:
*** * /opt/airflow/logs/dag_id=Translation_lineage_xMW3W7KN/run_id=manual__2023-11-17T13:15:41+00:00/task_id=lineage_task/attempt=1.log
[2023-11-17T13:15:42.644+0000] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: Translation_lineage_xMW3W7KN.lineage_task manual__2023-11-17T13:15:41+00:00 [queued]>
[2023-11-17T13:15:42.742+0000] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: Translation_lineage_xMW3W7KN.lineage_task manual__2023-11-17T13:15:41+00:00 [queued]>
[2023-11-17T13:15:42.743+0000] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2023-11-17T13:15:42.749+0000] {taskinstance.py:1327} INFO - Executing <Task(PythonOperator): lineage_task> on 2023-11-17 13:15:41+00:00
[2023-11-17T13:15:42.752+0000] {standard_task_runner.py:57} INFO - Started process 25767 to run task
[2023-11-17T13:15:42.754+0000] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'Translation_lineage_xMW3W7KN', 'lineage_task', 'manual__2023-11-17T13:15:41+00:00', '--job-id', '34', '--raw', '--subdir', 'DAGS_FOLDER/Translation_lineage_xMW3W7KN.py', '--cfg-path', '/tmp/tmps9onqu2l']
[2023-11-17T13:15:42.755+0000] {standard_task_runner.py:85} INFO - Job 34: Subtask lineage_task
[2023-11-17T13:15:42.795+0000] {task_command.py:410} INFO - Running <TaskInstance: Translation_lineage_xMW3W7KN.lineage_task manual__2023-11-17T13:15:41+00:00 [running]> on host 0af5b390a027
[2023-11-17T13:15:43.019+0000] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='admin' AIRFLOW_CTX_DAG_ID='Translation_lineage_xMW3W7KN' AIRFLOW_CTX_TASK_ID='lineage_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-17T13:15:41+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-17T13:15:41+00:00'
[2023-11-17T13:15:43.023+0000] {server_mixin.py:63} DEBUG - Validating client and server versions
[2023-11-17T13:15:43.042+0000] {ingestion_pipeline_mixin.py:51} DEBUG - Created Pipeline Status for pipeline Translation.Translation_lineage_xMW3W7KN: {'eventType': 'entityUpdated', 'entityType': 'ingestionPipeline', 'entityId': '1390d683-b12f-4e5d-82e1-3e4fe4be4485', 'entityFullyQualifiedName': 'Translation.Translation_lineage_xMW3W7KN', 'previousVersion': 0.4, 'currentVersion': 0.4, 'userName': 'admin', 'timestamp': 1700226943041, 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'pipelineStatus', 'newValue': {'runId': 'c75fa4a6-2b6f-43d6-9e8e-7f9b595bb10a', 'pipelineState': 'running', 'startDate': 1700226943022, 'timestamp': 1700226943022}}], 'fieldsDeleted': [], 'previousVersion': 0.4}, 'entity': {'id': '1390d683-b12f-4e5d-82e1-3e4fe4be4485', 'name': 'Translation_lineage_xMW3W7KN', 'displayName': 'Translation_lineage_xMW3W7KN', 'pipelineType': 'lineage', 'fullyQualifiedName': 'Translation.Translation_lineage_xMW3W7KN', 'sourceConfig': {'config': {'type': 'DatabaseLineage', 'resultLimit': 1000, 'queryLogDuration': 1, 'tableFilterPattern': {'excludes': [], 'includes': []}, 'parsingTimeoutLimit': 300, 'schemaFilterPattern': {'excludes': [], 'includes': []}, 'databaseFilterPattern': {'excludes': [], 'includes': []}}}, 'airflowConfig': {'pausePipeline': False, 'concurrency': 1, 'startDate': 1700092800000, 'pipelineTimezone': 'UTC', 'retries': 0, 'retryDelay': 300, 'pipelineCatchup': False, 'scheduleInterval': '0 0 * * *', 'maxActiveRuns': 1, 'workflowDefaultView': 'tree', 'workflowDefaultViewOrientation': 'LR'}, 'loggerLevel': 'DEBUG', 'deployed': True, 'enabled': True, 'href': 'http://openmetadata-server:8585/api/v1/services/ingestionPipelines/1390d683-b12f-4e5d-82e1-3e4fe4be4485', 'version': 0.4, 'updatedAt': 1700226926350, 'updatedBy': 'admin', 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'deployed', 'oldValue': False, 'newValue': True}], 'fieldsDeleted': [], 'previousVersion': 0.3}, 'deleted': False, 'provider': 'user'}}
[2023-11-17T13:15:43.052+0000] {metadata.py:55} DEBUG - Source type:postgres-lineage,<class 'metadata.ingestion.source.database.postgres.lineage.PostgresLineageSource'> configured
[2023-11-17T13:15:43.052+0000] {metadata.py:57} DEBUG - Source type:postgres-lineage,<class 'metadata.ingestion.source.database.postgres.lineage.PostgresLineageSource'> prepared
[2023-11-17T13:15:43.062+0000] {metadata.py:66} DEBUG - Sink type:metadata-rest, <class 'metadata.ingestion.sink.metadata_rest.MetadataRestSink'> configured
[2023-11-17T13:15:43.767+0000] {sql_lineage.py:386} DEBUG - Running lineage with query: CREATE TABLE translation.entrees
(
id public.OBJECT_ID,
legacy_id public.LEGACY_ID,
agency_legacy_id public.LEGACY_ID,
building_legacy_id public.LEGACY_ID,
address_1 CHARACTER VARYING(40),
address_2 CHARACTER VARYING(40),
additional_address_info CHARACTER VARYING(30),
city CHARACTER VARYING(40),
zip_code CHARACTER VARYING(15),
country_code TEXT,
emitter_bip_1 CHARACTER VARYING(20),
emitter_bip_2 CHARACTER VARYING(20),
pinpads CHARACTER VARYING(12),
pinpads_date TIMESTAMP WITHOUT TIME ZONE,
pinpads_specific CHARACTER VARYING(30),
desk_rattachment_supplier_legacy_id public.LEGACY_ID,
perception_supplier_legacy_id public.LEGACY_ID,
tax_center_supplier_legacy_id public.LEGACY_ID,
sub_building_number BIGINT,
created_at TIMESTAMP WITHOUT TIME ZONE,
updated_at TIMESTAMP WITHOUT TIME ZONE,
migration_legacy_id TEXT
)
[2023-11-17T13:15:44.062+0000] {sql_lineage.py:386} DEBUG - Running lineage with query: COPY (
SELECT
(jsonb_build_object(
'createdAt', to_date(now()::TIMESTAMP),
'updatedAt', to_date(now()::TIMESTAMP)
) || jsonb_strip_nulls(to_jsonb(t)))::TEXT
FROM
ft_staging.units t) TO STDOUT
[2023-11-17T13:15:44.063+0000] {parser.py:399} DEBUG - Lineage with SqlFluff failed for the [postgres] query: [None]
[2023-11-17T13:15:44.065+0000] {status.py:65} WARNING - Ingesting lineage failed for service [Translation]: 'NoneType' object has no attribute 'strip'
[2023-11-17T13:15:44.066+0000] {status.py:66} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/sql_lineage.py", line 389, in get_lineage_by_query
dialect: mssql
IF OBJECT_ID(N'REPORTING.[dbo].[AggregateHTSEntrypoint]', N'U') IS NOT NULL
DROP TABLE REPORTING.[dbo].[AggregateHTSEntrypoint];
SELECT DISTINCT
MFLCode,
f.FacilityName,
County,
SubCounty,
p.PartnerName,
a.AgencyName,
Gender,
age.DATIMAgeGroup as AgeGroup,
EntryPoint,
year,
month,
FORMAT(cast(date as date), 'MMMM') MonthName,
EOMONTH(d.date) as AsofDate,
Sum(Tested) Tested,
Sum(Positive) Positive,
Sum(Linked) Linked,
CAST(GETDATE() AS DATE) AS LoadDate
INTO REPORTING.[dbo].[AggregateHTSEntrypoint]
FROM NDWH.dbo.FactHTSClientTests hts
LEFT join NDWH.dbo.DimFacility f on f.FacilityKey = hts.FacilityKey
LEFT JOIN NDWH.dbo.DimAgency a on a.AgencyKey = hts.AgencyKey
LEFT JOIN NDWH.dbo.DimPatient pat on pat.PatientKey = hts.PatientKey
LEFT join NDWH.dbo.DimAgeGroup age on age.AgeGroupKey=hts.AgeGroupKey
LEFT JOIN NDWH.dbo.DimPartner p on p.PartnerKey = hts.PartnerKey
LEFT JOIN NDWH.dbo.FactHTSClientLinkages link on link.PatientKey = hts.PatientKey
LEFT JOIN NDWH.dbo.DimDate d on d.DateKey = hts.DateTestedKey
where TestType in ('Initial test','Initial')
GROUP BY
MFLCode,
f.FacilityName,
County,
SubCounty,
p.PartnerName,
a.AgencyName,
Gender,
age.DATIMAgeGroup,
EntryPoint,
year,
month,
FORMAT(cast(date as date), 'MMMM'),
EOMONTH(d.date);
dialect: mssql
[2023-12-18T17:06:40.387+0000] {status.py:65} WARNING - Error processing query [DECLARE @edition sysname;
SET @edition = cast(SERVERPROPERTY(N'EDITION') as sysname);
SELECT case when @edition = N'SQL Azure' then 2 else 1 end as 'DatabaseEngineType',
SERVERPROPERTY('EngineEdition') AS DatabaseEngineEdition,
SERVERPROPERTY('ProductVersion') AS ProductVersion,
@@MICROSOFTVERSION AS MicrosoftVersion;
select host_platform from sys.dm_os_host_info
if @edition = N'SQL Azure'
select 'TCP' as ConnectionProtocol
else
exec ('select CONVERT(nvarchar(40),CONNECTIONPROPERTY(''net_transport'')) as ConnectionProtocol')
]: 'NoneType' object has no attribute 'strip'
[2023-12-18T17:06:40.387+0000] {status.py:66} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/processor/query_parser.py", line 98, in _run
parsed_sql = parse_sql_statement(
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/processor/query_parser.py", line 53, in parse_sql_statement
if not lineage_parser.involved_tables:
File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 85, in involved_tables
set(self.source_tables)
File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 110, in source_tables
return self.retrieve_tables(self.parser.source_tables)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
self._eval()
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 178, in _eval
self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 178, in <listcomp>
self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 62, in analyze
self._extract_from_dml(stmt, AnalyzerContext())
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 223, in _extract_from_dml
next_handler.handle(sub_token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/base.py", line 37, in handle
self._handle(token, holder)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 59, in _handle
self._handle_column(token)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 106, in _handle_column
self.columns.append(SqlParseColumn.of(token))
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/models.py", line 83, in of
return Column(
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 145, in __init__
self.raw_name = escape_identifier_name(name)
File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/utils/helpers.py", line 9, in escape_identifier_name
return name.strip("`").strip('"').strip("'")
AttributeError: 'NoneType' object has no attribute 'strip'
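The helper at the bottom of this traceback assumes the identifier name is always a string; a minimal sketch reproducing the failure (the function body is copied from the sqllineage helper shown in the traceback; the None input stands in for a column token for which the parser could not resolve a name):

```python
def escape_identifier_name(name):
    # Same logic as sqllineage/utils/helpers.py in the traceback above:
    # strip backtick, double-quote, and single-quote wrappers.
    return name.strip("`").strip('"').strip("'")

print(escape_identifier_name('"DatabaseEngineType"'))  # DatabaseEngineType

try:
    # A column token without a resolvable name yields None, which triggers
    # the AttributeError reported in the log.
    escape_identifier_name(None)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'strip'
```

A None-guard before the `Column(...)` construction (or in the helper itself) would avoid crashing the whole query's lineage on one unnamed token.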
dialect: oracle
insert into stats_advisor_filter_obj$
(rule_id, obj#, flags, type)
select :rule_id, obj#, :flag_include, :type_expanded
from stats_advisor_filter_obj$
where type = :type_priv
and (bitand(flags, :flag_orcl_owned) = 0 or :get_orcl_objects = 'T')
and obj# in
(select obj#
from stats_advisor_filter_obj$
where rule_id = :rule_id and type = :type_user
and bitand(flags, :flag_include) != 0)
insert into WRH$_IM_SEG_STAT (dbid, per_pdb, con_dbid, snap_id, instance_number, ts#, dataobj#, obj#, membytes, scans, scans_delta, db_block_changes, db_block_changes_delta, populate_cus, populate_cus_delta, repopulate_cus, repopulate_cus_delta ) select :dbid, 0 per_pdb, :srcdbid con_dbid, :snap_id, :instance_number, tsn_kewrimseg, dataobj_kewrimseg, obj_kewrimseg, membytes_kewrimseg, scans_kewrimseg, scans_dl_kewrimseg, db_block_changes_kewrimseg, db_block_changes_dl_kewrimseg, populate_kewrimseg, populate_dl_kewrimseg, repopulate_kewrimseg, repopulate_dl_kewrimseg FROM X$KEWRIMSEGSTAT ims WHERE (1 = 1)
dialect: snowflake
Identify the column transformation logic for emp_value and emp_key:
INSERT INTO EDW.INTEGRATION.Employee
SELECT
emp_id,
"'emp_name'" as EMP_NAME,
"'emp_title'" as EMP_TITLE
FROM EDW.STAGING.Employee
PIVOT(
MAX(emp_value)
FOR emp_key IN ('emp_name','emp_title')
)
ORDER BY emp_id;
dialect: redshift
MERGE INTO om_test.table_consolidated11 USING om_test.table_consolidated6 ON om_test.table_consolidated11.job_id = om_test.table_consolidated6.job_id\nWHEN MATCHED THEN UPDATE SET job_id = om_test.table_consolidated6.job_id, code = om_test.table_consolidated6.code,job_status = om_test.table_consolidated6.job_status\nWHEN NOT MATCHED THEN INSERT VALUES (om_test.table_consolidated6.job_id,om_test.table_consolidated6.code,om_test.table_consolidated6.job_status)
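The literal \n sequences in the MERGE statement above suggest the query text was captured with escaped newlines; an illustrative sketch (abridged query, variable names not taken from the ingestion code) of restoring real line breaks before handing the text to a parser:

```python
# Abridged copy of the logged MERGE text, with the two-character
# backslash-n sequences exactly as they appear in the log.
raw_query = (
    r"MERGE INTO om_test.table_consolidated11 USING om_test.table_consolidated6 "
    r"ON om_test.table_consolidated11.job_id = om_test.table_consolidated6.job_id\n"
    r"WHEN MATCHED THEN UPDATE SET job_id = om_test.table_consolidated6.job_id"
)

# Replace the escaped sequence with a real newline so the SQL parser sees
# line breaks instead of stray backslashes inside identifiers.
clean_query = raw_query.replace("\\n", "\n")
print(clean_query)
```

Whether the parser failure comes from the escaped newlines or from MERGE support in the Redshift dialect itself would need to be confirmed separately.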
dialect: redshift context: https://github.com/open-metadata/OpenMetadata/issues/15427
create materialized view test_schema.sales_current2 as (
WITH current_or_previous AS (
SELECT
MAX(data_release_version) AS data_release_version
FROM metadata_schema.datamart_run
WHERE dag_id = 'test_id'
) select eventid, listid, salesrow from test_schema.sales where salesrow=(select data_release_version from current_or_previous));
dialect: redshift context: https://github.com/open-metadata/OpenMetadata/issues/15423
insert into test_schema.union_all_test
with initial_sub as (select eventid, listid, salesrow
from test_schema.sales),
union_sub_test as (
select eventid, listid, salesrow
from initial_sub
where salesrow='Yes'
union all
select eventid, listid, salesrow
from initial_sub
where salesrow='No')
select eventid, listid, salesrow from union_sub_test where listid in(500,501,502);
dialect: snowflake
create or replace view VW_TEST(
COL1,
COL2
)
WITH ROW ACCESS POLICY POLICYDB.POLICYSCHEMA.POLICYNAME ON (COL2)
as (
with
-- Import CTEs
TEST_TABLE as (
select * from DB.SCHEMA.PARENT_TABLE
)
, final as (
select
COL1,
COL2
from TEST_TABLE
)
select * from final
);
[2024-03-20T11:15:37.230+0000] {sql_lineage.py:385} DEBUG - Running lineage with query: CREATE VIEW [dbo].[An_View] AS
SELECT DISTINCT
ColumnA,
ValueLiteral1 AS AliasIDType,
'TypeLiteral1' AS AliasType,
ColumnB,
CASE
WHEN ColumnCondition1 = 'Value1' THEN
(SELECT SubColumn1 FROM SubTable1 WHERE IndustryGroupCondition = ColumnCondition2)
WHEN ColumnCondition1 = 'Value2' THEN 'ValueLiteral2'
ELSE 'ValueLiteral3'
END AS AliasAssetClass,
CASE
WHEN RankCondition = 'Value3' AND CollatCondition = 'Value4' THEN
(SELECT Description FROM SubTable2 WHERE NameCondition = RankCondition)
ELSE
(SELECT Description FROM SubTable2 WHERE NameCondition = CollatCondition)
END AS AliasAssetRank,
MainTable.AliasCountryCode
FROM MainTable
WHERE TimestampCondition = (
SELECT MAX(InnerTimestamp) FROM MainTable AS InnerTable WHERE InnerTable.ColumnMatch = MainTable.ColumnMatch
)
AND MainTable.ColumnA NOT IN (
SELECT ExclusionColumn FROM ExclusionTable WHERE MarketBookCondition AND CodeCondition <> 'ExclusionValue'
)
AND MainTable.ColumnA NOT IN (
SELECT InnerColumnA FROM SubTable3 WHERE InnerColumnMatch = MainTable.ColumnMatch AND InnerTimestamp > MainTable.TimestampCondition
)
AND MainTable.ColumnA <> 'SpecificExclusion'
UNION
SELECT DISTINCT
ColumnA,
ValueLiteral4 AS AliasIDType,
'TypeLiteral2' AS AliasType,
ColumnC,
CASE
WHEN RankCondition = 'Value3' AND CollatCondition = 'Value6' THEN
(SELECT Description FROM SubTable4 WHERE NameCondition = ColumnD)
ELSE
(SELECT Description FROM SubTable4 WHERE NameCondition = CollatCondition)
END AS AliasAssetRank,
FROM SubTable5
WHERE RequestDateCondition = (
SELECT MAX(InnerRequestDate) FROM SubTable5 AS InnerTable2 WHERE InnerTable2.ColumnMatch = SubTable5.ColumnMatch
)
AND SubTable5.ColumnA NOT IN (
SELECT ExclusionColumn FROM ExclusionTable WHERE MarketBookCondition2
)
UNION
SELECT DISTINCT
ExclusionTable.Code AS ColumnA,
ValueLiteral7 AS AliasIDType,
'TypeLiteral3' AS AliasType,
ExclusionTable.BondsName,
ExclusionTable.IssueAmount,
ExclusionTable.CurrenciesShortname,
ExclusionTable.CouponRate,
Countries.AliasCountryShortName
FROM ExclusionTable
INNER JOIN BondsTable ON BondsCondition
WHERE ExclusionTable.MarketBook = LiteralValue8
AND ExclusionTable.Code <> 'ExclusionValue'
AND ExclusionTable.Code NOT IN (
SELECT DISTINCT ColumnA FROM MainTable
UNION
SELECT DISTINCT ColumnA FROM SubTable5
);
11:15:37.560+0000] {parser.py:439} DEBUG - Lineage with SqlFluff failed for the [tsql] query: [CREATE view dbo.AllCWDT as
(the same query again)
I also get the following error for some other queries:
Lineage computed with SqlFluff did not perform as expected for the [tsql] query: [CREATE view Vw_table
These errors appear multiple times in the metadata logs. And only the lineage of view tables is present, and the lineage of regular tables is missing.
I've also observed that when a view's query uses a UNION, the lineage mapping (when it works) covers the entire query. However, if the query uses a JOIN involving a subquery, the lineage is only mapped to that subquery. For example, with this query we get the lineage for all the views mentioned in the query:
CREATE VIEW [dbo].[vw_ARSV] AS
SELECT
Column1,
Column2,
Column3,
SUM(Column4),
IIF(SUM(Column5) > 0, SUM(Column5), 0) AS POS,
IIF(SUM(Column5) < 0, SUM(Column5), 0) AS FX
FROM Table1 AS t1
GROUP BY
Column1,
Column2,
Column3
UNION ALL
SELECT
Column1_2,
Column2_2,
Column3_2,
SUM(Column4_2),
IIF(SUM(Column5_2) > 0, SUM(Column5_2), 0) AS CMV,
IIF(SUM(Column5_2) < 0, SUM(Column5_2), 0) AS CMVN
FROM Table2 AS t2
GROUP BY
Column1_2,
Column2_2,
Column3_2
UNION ALL
SELECT
Column1_3,
Column2_3,
Column3_3,
SUM(Column4_3),
IIF(SUM(Column5_3) > 0, SUM(Column5_3), 0) AS ABC,
IIF(SUM(Column5_3) < 0, SUM(Column5_3), 0) AS CBA
FROM Table3 AS t3
GROUP BY
Column1_3,
Column2_3,
Column3_3;
But for this query, the lineage only shows Table3 and not Table1. It should show both. Is that related to the fact that we are using subqueries?
CREATE VIEW [dbo].[AnFXRiskView] AS
SELECT
'S' AS Rubric,
Column1,
Column2,
IIF(Column3 <= 90, 'Up to 3 months', 'More than 3 months') AS TT,
NULL AS Placeholder1,
'Long' AS PST,
Column4,
NULL AS Placeholder2
FROM [DatabaseSchema].[dbo].[Table1]
UNION ALL
SELECT
'T' AS Rubric,
Column5,
Column6,
IIF(Column7 <= 90, 'Up to 3 months', 'More than 3 months') AS TT,
Column8,
Column9,
CalculatedColumn10,
CalculatedColumn11,
CalculatedColumn12,
CalculatedColumn13
FROM [DatabaseSchema].[dbo].[Table2] AS T
LEFT JOIN
(SELECT DISTINCT
Column14,
Column15,
Column16,
Column17,
Column18,
Column19,
Column20,
Column21,
Column22
FROM [DatabaseSchema].[dbo].[Table3]) AS P
ON
P.Column14 = T.Column14
AND P.Column15 = T.Column15
AND P.Column20 = T.TermType
AND P.Column16 = T.Column16
AND P.Column17 = T.Column17;
From @manuelateles:
create view my_example_table.pg_stat_statements as
SELECT pg_stat_statements.userid,
    pg_stat_statements.dbid,
    pg_stat_statements.toplevel,
    pg_stat_statements.queryid,
    pg_stat_statements.query,
    pg_stat_statements.plans,
    pg_stat_statements.total_plan_time,
    pg_stat_statements.min_plan_time,
    pg_stat_statements.max_plan_time,
    pg_stat_statements.mean_plan_time,
    pg_stat_statements.stddev_plan_time,
    pg_stat_statements.calls,
    pg_stat_statements.total_exec_time,
    pg_stat_statements.min_exec_time,
    pg_stat_statements.max_exec_time,
    pg_stat_statements.mean_exec_time,
    pg_stat_statements.stddev_exec_time,
    pg_stat_statements.rows,
    pg_stat_statements.shared_blks_hit,
    pg_stat_statements.shared_blks_read,
    pg_stat_statements.shared_blks_dirtied,
    pg_stat_statements.shared_blks_written,
    pg_stat_statements.local_blks_hit,
    pg_stat_statements.local_blks_read,
    pg_stat_statements.local_blks_dirtied,
    pg_stat_statements.local_blks_written,
    pg_stat_statements.temp_blks_read,
    pg_stat_statements.temp_blks_written,
    pg_stat_statements.blk_read_time,
    pg_stat_statements.blk_write_time,
    pg_stat_statements.wal_records,
    pg_stat_statements.wal_fpi,
    pg_stat_statements.wal_bytes
FROM pg_stat_statements(true) pg_stat_statements(userid, dbid, toplevel, queryid, query, plans, total_plan_time, min_plan_time, max_plan_time, mean_plan_time, stddev_plan_time, calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time, blk_write_time, wal_records, wal_fpi, wal_bytes);
https://openmetadata.slack.com/archives/C02B6955S4S/p1713875131637769 dialect: Postgres
dialect: oracle
CREATE MATERIALIZED VIEW OPENMETADATA.DemoView AS
SELECT * FROM OPENMETADATA.EMPLOYEES
WHERE SALARY>60000;
dialect: mssql
dialect: teradata
replace view SQLJ.JARS
AS
select distinct DBC.JARS.DatabaseId,
DBC.JARS.JarName,
DBC.JARS.JarId
from DBC.JARS, DBC.DBASE
where
DBASE.DatabaseNameI = database
and
DBC.JARS.DatabaseId = DBASE.DatabaseId
;
Looks like REPLACE VIEW
isn't processed correctly. Teradata's REPLACE VIEW
is the equivalent of MySQL's CREATE OR REPLACE VIEW.
dialect: teradata
CREATE VIEW DBC.ZonesVX
AS
SELECT Zones.ZoneName (NAMED ZoneName),
DB1.DatabaseName (NAMED RootName),
Zones.RootType (NAMED RootType),
DB2.DatabaseName (NAMED ZoneDBAName),
DB3.DatabaseName (NAMED CreatorName),
Zones.CreateTimeStamp (NAMED CreateTimeStamp)
FROM DBC.Zones
LEFT OUTER JOIN DBC.Dbase DB1
ON DBC.Zones.ZoneRootID = DB1.DatabaseID
LEFT OUTER JOIN DBC.Dbase DB2
ON DBC.Zones.ZoneDBAId = DB2.DatabaseID
LEFT OUTER JOIN DBC.Dbase DB3
ON DBC.Zones.CreateUID = DB3.DatabaseID
WHERE DB3.DatabaseId = TD_AUTHID
WITH CHECK OPTION;
Error: Found unparsable section: 'WITH CHECK OPTION'
dialect: teradata
CREATE VIEW dbc.test_v AS
LOCKING TABLE dbc.dbcinfo FOR ACCESS
SELECT * FROM dbc.dbcinfo;
The LOCKING TABLE ... FOR ACCESS
section is probably what's unparsable.
dialect: mssql
The view definition (attached: holdings denorm.txt) uses (nolock) hints
and is unparsable.
Also, the column-level lineage is not detected in OpenMetadata due to [].
dialect: PostgreSQL (16)
+-----------------------------------------+-------------------------------------------------+
| Query | Error |
+=========================================+=================================================+
| create table minidemo.default.sample as | Lineage with SqlFluff failed for the [postgres] |
+-----------------------------------------+-------------------------------------------------+
| create table minidemo.demo.sample as | Lineage with SqlFluff failed for the [postgres] |
+-----------------------------------------+-------------------------------------------------+
dialect: Postgres
SET client_min_messages=notice
ALTER SEQUENCE public.copro_propositions_id_seq OWNED BY public.copro_propositions.id
ALTER SEQUENCE public.admin_answers_id_seq OWNED BY public.admin_answers.id
SET standard_conforming_strings = on
show timezone
CREATE SEQUENCE public.group_logs_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1
https://openmetadata.slack.com/archives/C02B6955S4S/p1719205908885939
Affected module: Ingestion Framework
Describe the bug: While ingesting the following query, we get the error ValueError: too many values to unpack during handling of column-level lineage.
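For reference, this failure mode can be reproduced in isolation; the tuples below are hypothetical stand-ins for the (source column, target column) pairs the lineage code unpacks, not the actual data structures from the ingestion framework:

```python
# Hypothetical lineage tuples; the second one carries an unexpected
# third element, the classic trigger for this ValueError.
lineage_pairs = [
    ("staging.emp_id", "integration.emp_id"),
    ("staging.emp_name", "integration.emp_name", "unexpected_extra"),
]

errors = []
for pair in lineage_pairs:
    try:
        source_col, target_col = pair  # fails when the tuple arity is not 2
    except ValueError as exc:
        errors.append(str(exc))

print(errors)  # ['too many values to unpack (expected 2)']
```

In other words, somewhere in the column-lineage handling a fixed two-element unpack meets a tuple of a different length for this query; logging the offending tuple would pinpoint which clause produces it.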