open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Lineage Query Parsing issues #7427

Open ulixius9 opened 2 years ago

ulixius9 commented 2 years ago

Affected module Ingestion Framework

Describe the bug While ingesting the following query, we get a `ValueError: too many values to unpack` during handling of column-level lineage.

pmbrull commented 2 years ago

Marking this as P1 for now; we need to gather more info on which queries work and which do not. Also, let's retest these after upgrading sqllineage to the latest version.
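To build that works/does-not-work inventory in one pass, a small triage harness can run each candidate query through the parser and bucket the results. A minimal sketch — the `parse` callable is an assumption, e.g. a wrapper around sqllineage's `LineageRunner`, and this is not OpenMetadata code:

```python
def triage_queries(parse, queries):
    """Split queries into those the parser handles and those it rejects.

    `parse` is any callable that raises on failure — for example a
    hypothetical `lambda q: LineageRunner(q).get_column_lineage()`.
    """
    ok, failed = [], []
    for query in queries:
        try:
            parse(query)
            ok.append(query)
        except Exception as exc:
            # Record the exception type alongside the message so similar
            # failures (e.g. all the Identifier-expected errors) group easily
            failed.append((query, f"{type(exc).__name__}: {exc}"))
    return ok, failed
```

Running this over the queries collected in this issue would produce the list in one shot, and re-running it after a sqllineage upgrade shows exactly which cases the upgrade fixes.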

ulixius9 commented 1 year ago
Cannot extract source table information from query [select ind_head.indexrelid index_id,
       k col_idx,
       k <= indnkeyatts /* true */ in_key,
       ind_head.indkey[k-$3] column_position,
       ind_head.indoption[k-$4] column_options,
       ind_head.indcollation[k-$5] /* null */ as collation,
       collname /* null */ as collation_str,
       ind_head.indclass[k-$6] as opclass,
       case when opcdefault then $7 else opcname end as opclass_str,
       case
           when indexprs is null then $8
           when ind_head.indkey[k-$9] = $10 then chr($11) || pg_catalog.pg_get_indexdef(ind_head.indexrelid, k::int, $12)
           else pg_catalog.pg_get_indexdef(ind_head.indexrelid, k::int, $13)
       end as expression,
       amcanorder can_order
from pg_catalog.pg_index /* (select *, pg_catalog.generate_subscripts(indkey::int[], 1) + 1 k from pg_catalog.pg_index) */ ind_head
         join pg_catalog.pg_class ind_stor
              on ind_stor.oid = ind_head.indexrelid
cross join unnest(ind_head.indkey) with ordinality u(u, k)
left join pg_catalog.pg_collation
on pg_collation.oid = ind_head.indcollation[k-$14]
cross join pg_catalog.pg_indexam_has_property(ind_stor.relam, $15) amcanorder /* left join pg_catalog.pg_am am on ind_stor.relam = am.oid*/
         left join pg_catalog.pg_opclass
                   on pg_opclass.oid = ind_head.indclass[k-$16]
where ind_stor.relnamespace = $1::oid
  and ind_stor.relkind in ($17, $18)
and pg_catalog.age(ind_stor.xmin) <= coalesce(nullif(greatest(pg_catalog.age($2::varchar::xid), $19), $20), $21)
order by index_id, k]: An Identifier is expected, got Function[value: unnest(ind_head.indkey)] instead.

ref: https://openmetadata.slack.com/archives/C02B6955S4S/p1664877683642579

OnkarVO7 commented 1 year ago

Adding the query from dbt here, which cannot be parsed. Dialect: Redshift

with customers as (

    select * from TESTDB.dbt_jaffle.stg_customers

),

orders as (

    select * from TESTDB.dbt_jaffle.stg_orders

),

payments as (

    select * from TESTDB.dbt_jaffle.stg_payments

),

customer_orders as (

        select
        customer_id,

        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders

    group by customer_id

),

customer_payments as (

    select
        orders.customer_id,
        sum(amount) as total_amount

    from payments

    left join orders on
         payments.order_id = orders.order_id

    group by orders.customer_id

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value

    from customers

    left join customer_orders
        on customers.customer_id = customer_orders.customer_id

    left join customer_payments
        on  customers.customer_id = customer_payments.customer_id

)

select * from final
pmbrull commented 1 year ago

It also produces another trace:

[2022-12-19 17:34:18,335] {sql_lineage.py:424} ERROR - Ingesting lineage failed for service [Migration]: An Identifier is expected, got IdentifierList[value: description=$1, criticality=$4, cases=$5, is_valid=$6] instead.
[2022-12-19 17:34:18,335] {sql_lineage.py:380} DEBUG - Running lineage with query: INSERT INTO staging.migration_infos
        (id, pipeline_url, pipeline_id, agency_fk, extraction_date, transformation_date, mongo_dump, instance_id)
        SELECT $1, $2, $3, $4::public.OBJECT_ID, $5::TIMESTAMP, $6::TIMESTAMP, $7, $8
    ON CONFLICT (pipeline_id) DO UPDATE
        SET id = $1, pipeline_url = $2, extraction_date = $5, transformation_date = $6::TIMESTAMP, mongo_dump = $7, instance_id = $8
[2022-12-19 17:34:18,341] {sql_lineage.py:423} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/sql_lineage.py", line 383, in get_lineage_by_query
    raw_column_lineage = result.get_column_lineage()
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
    self._eval()
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in _eval
    self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in <listcomp>
    self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 60, in analyze
    self._extract_from_dml(stmt, AnalyzerContext())
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 135, in _extract_from_dml
    next_handler.handle(sub_token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/base.py", line 39, in handle
    self._handle(token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/source.py", line 58, in _handle
    self._handle_column(token)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/source.py", line 105, in _handle_column
    self.columns.append(Column.of(token))
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 230, in of
    return Column(
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 174, in __init__
    self.raw_name = escape_identifier_name(name)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/utils/helpers.py", line 8, in escape_identifier_name
    return name.strip("`").strip('"').strip("'")
AttributeError: 'NoneType' object has no attribute 'strip'
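For reference, the crash above boils down to sqllineage's `escape_identifier_name` helper receiving `None` as a raw column name: tokens from constructs like `ON CONFLICT ... DO UPDATE SET col = $1` can yield no name at all. A defensive sketch of the helper with a guard — an illustration of the failure mode, not the library's actual fix:

```python
def escape_identifier_name(name):
    # sqllineage's helper assumes `name` is always a string; a None raw
    # name crashes it with AttributeError. The None guard below is a
    # hypothetical workaround for illustration only.
    if name is None:
        return ""
    return name.strip("`").strip('"').strip("'")
```

The guard keeps the ingestion alive, at the cost of silently dropping the unnamed column from lineage.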


This query will no longer be picked up after https://github.com/open-metadata/OpenMetadata/pull/11457

mapler commented 1 year ago

From https://openmetadata.slack.com/archives/C02B6955S4S/p1672892210654719 sample of error query (masked some keyword)

/* BigQuery */
MERGE `project_id.dataset_name.table_name`
USING (
    SELECT tab, DATE(open_at) AS date, tab_id, full_name, short_name, tab_type,
        COUNT(*) AS opens, COUNTIF(os = 'a') AS opens_a, COUNTIF(os != 'a') AS opens_non_a,
    FROM `project_id.dataset_name.table_name_b`
    WHERE DATE(open_at) BETWEEN '2022-12-31' AND '2023-01-03'
    GROUP BY 1, 2, 3, 4, 5, 6
) s
ON FALSE
WHEN NOT MATCHED AND date BETWEEN '2022-12-31' AND '2023-01-03' THEN
  INSERT VALUES (s.tab, s.date, s.tab_id, s.full_name, s.short_name, s.tab_type, s.opens, s.opens_a, s.opens_non_a)
WHEN NOT MATCHED BY SOURCE AND date BETWEEN '2022-12-31' AND '2023-01-03' THEN
  DELETE

Error Message:

[2023-01-05 04:09:45,281] {sql_lineage.py:370} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/sql_lineage.py", line 330, in get_lineage_by_query
    raw_column_lineage = lineage_parser.column_lineage
  File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 111, in column_lineage
    return self.parser.get_column_lineage()
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
    self._eval()
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in _eval
    self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 179, in <listcomp>
    self._stmt_holders = [LineageAnalyzer().analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 60, in analyze
    self._extract_from_dml(stmt, AnalyzerContext())
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/analyzer.py", line 135, in _extract_from_dml
    next_handler.handle(sub_token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/base.py", line 39, in handle
    self._handle(token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/handlers/cte.py", line 31, in _handle
    raise SQLLineageException(
sqllineage.exceptions.SQLLineageException: An Identifier or IdentifierList is expected, got Token[value: (] instead.
[2023-01-05 04:09:45,282] {sql_lineage.py:371} ERROR - Ingesting lineage failed for service [BigQuery]: An Identifier or IdentifierList is expected, got Token[value: (] instead.
nahuelverdugo commented 1 year ago

@mapler @OnkarVO7 @ulixius9 Could you please add the dialect of those queries? The referenced Slack messages will disappear after 90 days. Thank you

mapler commented 1 year ago

One more failed case in BigQuery

MERGE `project_id.dataset_id.target_table` t
USING (
  WITH base AS (
      SELECT
          date, channel, cnt, user_count, total_user_count,
      FROM `project_id.dataset_id.origin_table`
  )
  SELECT
      date, channel, cnt, total_user_count, SAFE_DIVIDE(cnt, user_count) AS rate
  FROM base
) s
ON t.date = s.date and t.channel = s.channel
WHEN NOT MATCHED THEN
  INSERT ROW
WHEN MATCHED THEN
  UPDATE SET t.total_user_cnt = s.total_user_cnt,
  t.cnt = s.cnt,
  t.rate = s.rate
pmbrull commented 1 year ago

This case WORKS on snowflake

create or replace view TEST_DB.USR.FCT_DEMO (
    ID,
    ACCOUNT_ID,
    F_NAME,
    L_NAME
) as (
SELECT 
ua.id as id,
ua.account_id,
p.first_name as f_name,
p.last_name as l_name
FROM
TEST_DB.SCHEMA.USER_ACCOUNT ua LEFT JOIN TEST_DB.SCHEMA.PERSON p ON ua.id = p.user_id
);

However, although the query looks simple enough, no column-level lineage is generated (screenshot attached).

ulixius9 commented 1 year ago

https://sandbox.open-metadata.org/table/MySQL_Test.sql12611281.sql12611281.vRisk_Denorm/lineage

Query (MySQL dialect):

CREATE ALGORITHM=UNDEFINED DEFINER=`sql12611281`@`%` SQL SECURITY DEFINER VIEW `sql12611281`.`vRisk_Denorm` AS select `a`.`AsOfDate` AS `AsOfDate`,`a`.`AssetID` AS `AssetID`,`a`.`AssetRate` AS `AssetRate`,`a`.`AssetRateIndex` AS `AssetRateIndex`,`a`.`AssetRateIndexBehavior` AS `AssetRateIndexBehavior`,`a`.`CostPriceSettled` AS `CostPriceSettled`,`a`.`CostPriceTraded` AS `CostPriceTraded`,`a`.`CurrencyType_Identifier` AS `CurrencyType_Identifier`,`a`.`IntRate` AS `IntRate`,`a`.`Issuer_PortfolioCompanyNameNM` AS `Issuer_PortfolioCompanyNameNM`,`a`.`MaturityDate` AS `MaturityDate`,`a`.`ParAmountSettled` AS `ParAmountSettled`,`a`.`ParAmountTraded` AS `ParAmountTraded`,`a`.`Portfolio` AS `Portfolio`,`a`.`Portfolio_EntityId` AS `Portfolio_EntityId`,`a`.`Position_Id` AS `Position_Id`,`a`.`QuantitySettled` AS `QuantitySettled`,`a`.`QuantityTraded` AS `QuantityTraded`,`a`.`SecurityID` AS `SecurityID`,`a`.`SettledvsTraded` AS `SettledvsTraded`,`a`.`UnitsSettled` AS `UnitsSettled`,`a`.`UnitsTraded` AS `UnitsTraded`,`b`.`Source System` AS `Source System`,`b`.`Portfolio_Name` AS `Portfolio_Name`,`b`.`PositionKeyID` AS `PositionKeyID`,`b`.`Name` AS `Name`,`b`.`Credit Spread` AS `Credit Spread`,`b`.`Issuer OAS` AS `Issuer OAS`,`b`.`Idiosyncratic OAS` AS `Idiosyncratic OAS`,`b`.`Probability of Default` AS `Probability of Default`,`b`.`Recovery Rate` AS `Recovery Rate`,`b`.`Weighted Average Life` AS `Weighted Average Life`,`b`.`Theta` AS `Theta`,`b`.`Effective Convexity` AS `Effective Convexity`,`b`.`Effective Duration` AS `Effective Duration`,`b`.`Delta-Adjusted Underlying PV` AS `Delta-Adjusted Underlying PV`,`b`.`Gross Exposure` AS `Gross Exposure`,`b`.`Long Exposure` AS `Long Exposure`,`b`.`Net Exposure` AS `Net Exposure`,`b`.`Short Exposure` AS `Short Exposure`,`b`.`Modified Duration` AS `Modified Duration`,`b`.`ESG_;VERNANCE_SCORE` AS `ESG_;VERNANCE_SCORE`,`b`.`SDG_01_NET_ALIGNMENT_SCORE` AS `SDG_01_NET_ALIGNMENT_SCORE`,`b`.`SDG_02_NET_ALIGNMENT_SCORE` AS 
`SDG_02_NET_ALIGNMENT_SCORE`,`b`.`SDG_03_NET_ALIGNMENT_SCORE` AS `SDG_03_NET_ALIGNMENT_SCORE`,`b`.`SDG_04_NET_ALIGNMENT_SCORE` AS `SDG_04_NET_ALIGNMENT_SCORE`,`b`.`SDG_05_NET_ALIGNMENT_SCORE` AS `SDG_05_NET_ALIGNMENT_SCORE`,`b`.`SDG_06_NET_ALIGNMENT_SCORE` AS `SDG_06_NET_ALIGNMENT_SCORE`,`b`.`SDG_07_NET_ALIGNMENT_SCORE` AS `SDG_07_NET_ALIGNMENT_SCORE`,`b`.`SDG_08_NET_ALIGNMENT_SCORE` AS `SDG_08_NET_ALIGNMENT_SCORE`,`b`.`SDG_09_NET_ALIGNMENT_SCORE` AS `SDG_09_NET_ALIGNMENT_SCORE`,`b`.`SDG_10_NET_ALIGNMENT_SCORE` AS `SDG_10_NET_ALIGNMENT_SCORE`,`b`.`SDG_11_NET_ALIGNMENT_SCORE` AS `SDG_11_NET_ALIGNMENT_SCORE`,`b`.`SDG_12_NET_ALIGNMENT_SCORE` AS `SDG_12_NET_ALIGNMENT_SCORE`,`b`.`SDG_13_NET_ALIGNMENT_SCORE` AS `SDG_13_NET_ALIGNMENT_SCORE`,`b`.`SDG_14_NET_ALIGNMENT_SCORE` AS `SDG_14_NET_ALIGNMENT_SCORE`,`b`.`SDG_15_NET_ALIGNMENT_SCORE` AS `SDG_15_NET_ALIGNMENT_SCORE`,`b`.`SDG_16_NET_ALIGNMENT_SCORE` AS `SDG_16_NET_ALIGNMENT_SCORE`,`b`.`SDG_17_NET_ALIGNMENT_SCORE` AS `SDG_17_NET_ALIGNMENT_SCORE`,`b`.`CONTROVERSY_SCORE` AS `CONTROVERSY_SCORE`,`b`.`SOCIAL_CONTROVERSY_SCORE` AS `SOCIAL_CONTROVERSY_SCORE`,`b`.`CUSTOMER_CONTROVERSY_SCORE` AS `CUSTOMER_CONTROVERSY_SCORE`,`b`.`ENVIRONMENT_CONTROVERSY_SCORE` AS `ENVIRONMENT_CONTROVERSY_SCORE`,`b`.`;VERNANCE_CONTROVERSY_SCORE` AS `;VERNANCE_CONTROVERSY_SCORE`,`b`.`HUMAN_RIGHTS_CONTROVERSY_SCORE` AS `HUMAN_RIGHTS_CONTROVERSY_SCORE`,`b`.`LABOR_RIGHTS_CONTROVERSY_SCORE` AS `LABOR_RIGHTS_CONTROVERSY_SCORE`,`b`.`CLIMATE_CHANGE_THEME_SCORE` AS `CLIMATE_CHANGE_THEME_SCORE`,`b`.`BUSINESS_ETHICS_THEME_SCORE` AS `BUSINESS_ETHICS_THEME_SCORE`,`b`.`CORPORATE_;V_THEME_SCORE` AS `CORPORATE_;V_THEME_SCORE`,`b`.`ENVIRONMENTAL_OPPS_THEME_SCORE` AS `ENVIRONMENTAL_OPPS_THEME_SCORE`,`b`.`HUMAN_CAPITAL_THEME_SCORE` AS `HUMAN_CAPITAL_THEME_SCORE`,`b`.`NATURAL_RES_USE_THEME_SCORE` AS `NATURAL_RES_USE_THEME_SCORE`,`b`.`WASTE_MGMT_THEME_SCORE` AS `WASTE_MGMT_THEME_SCORE`,`b`.`PRODUCT_SAFETY_THEME_SCORE` AS 
`PRODUCT_SAFETY_THEME_SCORE`,`b`.`SOCIAL_OPPS_THEME_SCORE` AS `SOCIAL_OPPS_THEME_SCORE`,`b`.`STAKEHOLDER_OPPOSIT_THEME_SCORE` AS `STAKEHOLDER_OPPOSIT_THEME_SCORE`,`b`.`CBN_LCT_SCORE` AS `CBN_LCT_SCORE`,`b`.`PROD_CARB_FTPRNT_SCORE` AS `PROD_CARB_FTPRNT_SCORE`,`b`.`CARBON_EMISSIONS_SCORE` AS `CARBON_EMISSIONS_SCORE`,`b`.`CARBON_EMISSIONS_GHG_MITIGATION_SCORE` AS `CARBON_EMISSIONS_GHG_MITIGATION_SCORE`,`b`.`CARBON_EMISSIONS_SCOPE_1` AS `CARBON_EMISSIONS_SCOPE_1`,`b`.`UNGC_COMPLIANCE` AS `UNGC_COMPLIANCE`,`b`.`CARBON_EMISSIONS_SCOPE_12` AS `CARBON_EMISSIONS_SCOPE_12`,`b`.`CARBON_EMISSIONS_SCOPE_2` AS `CARBON_EMISSIONS_SCOPE_2`,`b`.`CARBON_EMISSIONS_SCOPE_3` AS `CARBON_EMISSIONS_SCOPE_3`,`b`.`CARBON_EMISSIONS_SCOPE123` AS `CARBON_EMISSIONS_SCOPE123`,`b`.`ESG_SCORE` AS `ESG_SCORE`,`b`.`ESG_ENVIRONMENTAL_SCORE` AS `ESG_ENVIRONMENTAL_SCORE`,`b`.`ESG_SOCIAL_SCORE` AS `ESG_SOCIAL_SCORE`,`b`.`Security_ID` AS `Security_ID`,`b`.`_ID` AS `_ID`,`b`.`as_of_date` AS `as_of_date`,`b`.`created_by` AS `created_by`,`b`.`modified_by` AS `modified_by`,`b`.`created_on` AS `created_on`,`b`.`modified_on` AS `modified_on`,`b`.`is_active` AS `is_active`,`b`.`is_processed` AS `is_processed` from (`sql12611281`.`vPositions_denorm_data` `a` join `sql12611281`.`risk_metrics_output_data` `b` on((`a`.`Position_Id` = `b`.`PositionKeyID`)))
SabineGl commented 1 year ago

Related source in Git: add a union for materialized views (`all_views union all_mviews`), based on a Slack support discussion and the related Git ticket.

SabineGl commented 1 year ago

Regarding the column-based lineage issue mentioned here: link to slack support

In the case of an Oracle view, the column-based lineage is not shown; you can only see the connected main ID:

(screenshot: oracle_missing_column_based_lineage)

ulixius9 commented 1 year ago

Related to @SabineGl's comment, the query looks like this: `create view new_view (col1,col2) select col1,col2 from table`. To replicate this, we can ingest an Oracle source containing views.

context: https://openmetadata.slack.com/archives/C02B6955S4S/p1681204033955849
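One way to work around such DDL before handing it to the parser would be to normalize the missing `AS` keyword between the column list and `SELECT`. A rough, hypothetical pre-processing sketch — the regex and function name are assumptions, not OpenMetadata code:

```python
import re

def ensure_view_as(sql):
    """Insert a missing AS between a CREATE VIEW column list and SELECT.

    Hypothetical normalization: `create view v (c1,c2) select ...` becomes
    `create view v (c1,c2) as select ...`. Only the first `) select` is
    touched (count=1), so subqueries later in the statement are left alone,
    and DDL that already has AS matches nothing and passes through unchanged.
    """
    return re.sub(r"(\))\s+(select\b)", r"\1 as \2", sql, count=1,
                  flags=re.IGNORECASE)
```

This is deliberately narrow; a real fix would belong in the parser's grammar rather than in string surgery.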

ulixius9 commented 1 year ago

Dialect: Snowflake. Support thread: https://openmetadata.slack.com/archives/C02B6955S4S/p1681277509847309

CREATE OR REPLACE TRANSIENT TABLE final_table
copy grants
as
Select a.*, 
coalesce(table_1.column_d, table_2.column_d, table_3.column_d, table_4.column_d) AS column_a, 
coalesce(table_1.type, table_2.type, table_3.type, table_4.type) AS column_b,
table_5.column_b,
CASE WHEN table_5.column_b IS NOT NULL or column_c = 'alpha' THEN 'alpha'
WHEN coalesce(table_1.column_d, table_2.column_d, table_3.column_d, table_4.column_d) IS NOT NULL AND table_5.column_b IS NULL and column_c <> 'alpha' THEN 'beta'
ELSE column_c END AS financial_column_c,
max(CASE WHEN pr.id_1 IS NULL THEN 0 ELSE 1 END)  AS column_e
FROM
(
    select distinct
        table_6.column_c,
        table_6.id_2,
        table_6.id_3,
        table_6.date_1,
        table_6.date_1 AS id_3_date,
        table_6.date_2,
        table_6.column_f,
        table_6.column_g,
        table_6.id_4,
        table_6.column_h,
        table_6.column_j,
        table_6.column_i,
        table_6.column_k,
        table_6.column_l,
        table_6.column_m,
        table_6.column_n,
        table_6.column_o,
        table_6.column_p,
        table_6.column_q,
        table_6.column_r,
        table_6.column_s,
        table_6.column_t,
        table_6.column_u,
        table_6.column_v,
        table_6.column_w,
        table_6.column_x,
        table_6.column_y,
        table_6.id_5,
        table_6.column_z,
        table_6.column_aa,
        table_6.column_ab,
        table_6.column_ac,
        table_6.column_ad,
        table_6.column_ae,
        table_6.column_af,
        table_6.column_ag,
        table_6.column_ah,
        table_6.column_ai,
        table_6.column_aj,
        table_6.column_ak,
        table_6.column_al,
        table_6.column_am,
        table_6.column_an,
        table_6.column_ao,
        table_6.column_ap,
        table_6.column_aq,
        table_6.column_ar,
        table_6.column_as,
        table_6.id_6,
        table_6.id_7,
        table_6.column_at,
        table_6.column_au,
        table_6.column_av,
        table_6.column_aw,
        case
              when lower(table_6.column_g)='gamma' or lower(column_ax) = 'gamma' then 'delta'
              when lower(table_6.column_g) = 'delta' then 'Delta'
              when lower(table_6.column_g) = 'eta' then 'Eta'
              when lower(table_6.column_g) = 'zeta' then 'Zeta'
              when lower(table_6.column_g)='theta' then 'Theta'
              when lower(table_6.column_s) = lower(table_7.column_ay) then 'iota'
              when lower(table_6.column_g)=lower('kappa') then 'kappa'
              else 'iota'
        END as column_g_o1,
        case
             when table_6.column_al = 'lambda' then 'lambda'
            else table_8.column_bj
        end as column_bj,
        case
            when table_6.column_f='mu' then 0
            else table_8.column_ba
        end as column_ba,
        case
            when table_6.id_3 =table_9.column_bb then 'nu'
            else 'xi'
        end as column_bd,
        case
            when table_6.column_f  not in ('omni','pi','mu') then null
            when datediff('month', date_trunc('month',table_11.column_bc),date_trunc('month',table_6.date_2)) = 0  and table_6.id_3 =table_11.id_3 then 'nu'
            else 'xi'
        end as column_be,
        case
            when table_6.column_f  not in ('omni','pi','mu') then null
            when datediff('month', date_trunc('month',table_11.column_bc),date_trunc('month',table_6.date_2))= 0 then (case when table_6.id_3 =table_11.id_3 then 0 else -99 end)
            else datediff('month', date_trunc('month',table_11.column_bc),date_trunc('month',table_6.date_2))
        end as month_diff,
        (case
            when table_6.column_bf is null or table_6.column_bf='' or table_6.column_bf=' / ' then 'rho'
            when y.column_bg is null then 'column_bh'
            else y.column_bg
        end) column_bg,
        (case when table_6.column_bf is null or table_6.column_bf='' or table_6.column_bf=' / ' then 'sigma'
            when y.column_bg is null then 'column_bh'
            when lower(y.column_bg)='tau' then 'Tau'
            when lower(y.column_bg)='upsilon' then 'Upsilon'
            else y.column_bg
        end) column_bg_c1,
        (case when table_6.column_bf is null or table_6.column_bf='' or table_6.column_bf=' / ' or y.column_bg is null or y.column_bg in ('') or y.column_bg is null
            or lower(y.column_bg) in ('phi','chi','banner','direct','column_her')
          then 'psi'
        else 'Inpsi'
        end) column_bg_bkt,
        table_6.column_ax,
        table_6.lead_id_5
    FROM
    (
        select
            om.*,
            case when lower(om.column_aj) like '%omega%' then column_ap||' / '||column_aq
                when lower(om.column_aj) not like '%omega%' then (case when column_aq is null or column_aq='' then column_ap else column_aq end )
            end as column_bf
        from table_12 om
        left join table_13 oe
        on om.id_2=oe.id_2 and oe.column_c='bravo'
        where oe.id_2 is null
    ) table_6
    left join
    (select  LOWER(CONCAT(COALESCE(aggregator,''),'-',COALESCE(charlie,''),'-',COALESCE(mid,''))) as column_bj,mapping_id_2,(column_bh*column_bi) as column_ba,column_bi
      from table_14 where column_c = 'echo'
        and date(entry_timestamp) > date '2001-04-01'
        UNION ALL
      select column_bj,mapping_id_2,column_ba,column_bi from table_15 where lower(column_c)='echo'
      and date(entry_timestamp) <= date '2001-04-01'
    ) table_8
    on table_6.id_2=table_8.mapping_id_2
    --Adding column_bd_id_6----------
    left join
    (
      select id_6, id_3 as column_bb,date_1 date_3,row_number() over (partition by id_6 order by date_1) as rank
      FROM table_12
    ) table_9
      on table_6.id_6=table_9.id_6 and table_9.rank=1
    left join
    (
      select id_6, id_3,date_2 as column_bc,row_number() over (partition by id_6 order by date_2) as rank
      FROM table_12
      where column_f  in ('omni','pi','mu')
    ) table_11
    on table_6.id_6=table_11.id_6 and table_11.rank=1 and table_6.column_f  in ('omni','pi','mu')
    left join
    (
      select distinct y."network/column_ap_column_aq" as column_ap_column_aq, y."channel/column_ap" as column_bg,
        row_number() over(partition by y."network/column_ap_column_aq" order by y."channel/column_ap") as rnk
      from table_16 y
    ) y on table_6.column_bf=y.column_ap_column_aq and y.rnk = 1
    left join
    (SELECT DISTINCT column_s as column_ay
     FROM table_17
    )table_7 ON table_7.column_ay = table_6.column_s

) a
left join
(-- base64
    Select TRY_BASE64_DElambdaE_STRING(id_4) as id_1,
    --id_4 as id_1,
      id_3_date as date_4,
      column_am as date_5,
      column_an as date_6
    from table_18
    where column_f in('golf','hotel') and column_g_o1 in ('india', 'juliet') and column_g != 'kilo'
) pr
ON (a.id_4 = pr.id_1 and a.date_1 >= pr.date_6 and a.date_1 <= pr.date_5 )
LEFT JOIN (
    SELECT DISTINCT
        column_bk,
        column_d,
        'lima' as type,
        to_date(start_date, 'dd/MM/YYYY') as start_date,
        to_date(end_date, 'dd/MM/YYYY') as end_date,
        rank() over (partition by column_bk order by column_d) as rank
    FROM table_19
    WHERE lower(trim(column_bl)) like '%LIMA%'
) table_2
ON lower(a.column_g) = lower(table_2.column_bk)
and date(a.id_3_date) >= table_2.start_date 
and date(a.id_3_date) <= table_2.end_date 
and table_2.rank = 1
LEFT JOIN
(
    SELECT DISTINCT
        column_bk,
        column_d,
        'mike' as type,
        to_date(start_date, 'dd/MM/YYYY') as start_date,
        to_date(end_date, 'dd/MM/YYYY') as end_date,
        rank() over (partition by column_bk order by column_d) as rank
    FROM table_19
    WHERE lower(trim(column_bl)) like '%column_bk%'
) table_3
ON lower(trim(SUBSTRING(a.column_h,position('@',a.column_h) + 1, length(a.column_h)))) = trim(lower(table_3.column_bk)) 
and date(a.id_3_date) >= table_3.start_date 
and date(a.id_3_date) <= table_3.end_date 
and table_3.rank = 1
LEFT JOIN
(
    SELECT DISTINCT
        column_bk,
        column_d,
        'november' as type,
        to_date(start_date, 'dd/MM/YYYY') as start_date,
        to_date(end_date, 'dd/MM/YYYY') as end_date,
        rank() over (partition by column_bk order by column_d) as rank
    FROM table_19
    WHERE lower(trim(column_bl)) like '%code%'
) table_4
ON UPPER(a.column_s) = UPPER(table_4.column_bk) 
and date(a.id_3_date) >= table_4.start_date 
and date(a.id_3_date) <= table_4.end_date 
and table_4.rank = 1
LEFT JOIN
(
    SELECT DISTINCT
        column_bk,
        column_d,
        'oscar' as type,
        to_date(start_date, 'dd/MM/YYYY') as start_date,
        to_date(end_date, 'dd/MM/YYYY') as end_date,
        rank() over (partition by column_bk order by column_d) as rank
    FROM table_19
    WHERE lower(trim(column_bl)) like '%lead%'
) table_1
ON 
lower(trim(a.lead_id_5::varchar)) = lower(trim(table_1.column_bk)) 
and date(a.id_3_date) >= table_1.start_date 
and date(a.id_3_date) <= table_1.end_date 
and table_1.rank = 1
LEFT JOIN
(
    SELECT DISTINCT
        u.column_bm,
        project as column_b,
        to_date(start_date, 'dd-MMMM-yyyy') as start_date,
        rank() over (partition by column_bm order by column_y_till desc, project asc) as rank
    FROM table_20 table_21
    INNER JOIN table_22 u
    ON u.number = cast(table_21.column_bn as varchar)
) table_5
ON a.id_4 = table_5.column_bm and table_5.rank = 1 and date(a.id_3_date) >= table_5.start_date
GROUP BY
column_c,id_2,id_3,date_1,id_3_date,date_2,column_f,column_g,id_4,column_h,column_j,column_i,column_k,column_l,column_m,column_n,column_o,column_p,column_q,column_r,column_s,column_t,column_u,column_v,column_w,column_x,column_y,id_5,column_z,column_aa,column_ab,column_ac,column_ad,column_ae,column_af,column_ag,column_ah,column_ai,column_aj,column_ak,column_al,column_am,column_an,column_ao,column_ap,column_aq,column_ar,column_as,id_6,id_7,column_at,column_au,column_g_o1,column_bj,column_ba,column_bd,column_be,month_diff,column_bg,column_bg_c1,column_bg_bkt,column_av,column_aw,
column_ax, lead_id_5, column_a, column_b, table_5.column_b, financial_column_c
samarth-1mg commented 1 year ago

dialect: snowflake

Queries starting with `CREATE OR REPLACE` do not create lineage when used with `COPY GRANTS`, as mentioned in the message above. Queries starting with `MERGE INTO` are also not being parsed.
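Pending proper parser support, one rough workaround would be stripping the `COPY GRANTS` clause before lineage parsing, since removing it leaves a standard CTAS. A hypothetical sketch — not OpenMetadata's actual behavior:

```python
import re

def strip_copy_grants(sql):
    # Hypothetical pre-processing: drop Snowflake's COPY GRANTS clause so
    # `CREATE OR REPLACE TRANSIENT TABLE ... COPY GRANTS AS SELECT ...`
    # parses as a plain CTAS. Replaced with a space so surrounding tokens
    # stay separated; \s+ also covers a line break between the two words.
    return re.sub(r"\bcopy\s+grants\b", " ", sql, flags=re.IGNORECASE)
```

Since `COPY GRANTS` only affects privilege propagation, removing it does not change which source tables or columns feed the target.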

raghav-khatoria-5000254 commented 1 year ago

Hey, when I was ingesting my Metabase dashboards the ingestion ran successfully, but I'm getting these errors: [2023-05-09 01:50:34,558] {metadata.py:206} ERROR - Error creating chart [<MY CARD>]: 'NoneType' object has no attribute 'strip'

[2023-05-10 06:28:16,122] {metadata.py:205} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/dashboard/metabase/metadata.py", line 187, in yield_dashboard_lineage_details
    yield from self._yield_lineage_from_query(
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/dashboard/metabase/metadata.py", line 227, in _yield_lineage_from_query
    for table in lineage_parser.source_tables:
  File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 110, in source_tables
    return self.retrieve_tables(self.parser.source_tables)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
    self._eval()
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 177, in _eval
    self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 177, in <listcomp>
    self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 62, in analyze
    self._extract_from_dml(stmt, AnalyzerContext())
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 223, in _extract_from_dml
    next_handler.handle(sub_token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/base.py", line 37, in handle
    self._handle(token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 61, in _handle
    self._handle_column(token)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 108, in _handle_column
    self.columns.append(SqlParseColumn.of(token))
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/models.py", line 83, in of
    return Column(
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 145, in __init__
    self.raw_name = escape_identifier_name(name)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/utils/helpers.py", line 9, in escape_identifier_name
    return name.strip("`").strip('"').strip("'")
AttributeError: 'NoneType' object has no attribute 'strip'

samarth-1mg commented 1 year ago

Unable to view upstream lineage for a table created with the following query:

CREATE OR REPLACE
TRANSIENT   TABLE
DATA_WAREHOUSE.SCHEMA_1.FINAL_TABLE    COPY
GRANTS  AS
WITH
    TABLE_1  AS  (
        SELECT
            COLUMN_1
        ,   COLUMN_2
        ,   COLUMN_3
        ,   COLUMN_4
        ,   COLUMN_5
        ,   COLUMN_6
        ,   COLUMN_7
        ,   COLUMN_8
        ,   COLUMN_9
        ,   BASE64_ENCODE(COLUMN_10)                  AS  COLUMN_10
        ,   BASE64_ENCODE(COLUMN_11)                    AS  COLUMN_11
        ,   COLUMN_12
        ,   COLUMN_13
        ,   COLUMN_14
        ,   COLUMN_15
        ,   COLUMN_16
        ,   COLUMN_17
        ,   COLUMN_20
        ,   COLUMN_21
        ,   COLUMN_22
        ,   COLUMN_23
        ,   COLUMN_24
        ,   COLUMN_25
        ,   COLUMN_26
        ,   COLUMN_27
        ,   COLUMN_28
        ,   COLUMN_29
        ,   COLUMN_30
        ,   COLUMN_31
        ,   COLUMN_32
        ,   COLUMN_33
        ,   COLUMN_34
        ,   COLUMN_35
        ,   COLUMN_36
        ,   COLUMN_37
        ,   COLUMN_38
        ,   COLUMN_39
        ,   COLUMN_40
        ,   COLUMN_41
        ,   BASE64_ENCODE(COLUMN_42)           AS  COLUMN_42
        ,   COLUMN_43
        ,   COLUMN_44
        ,   COLUMN_45
        ,   COLUMN_46
        ,   COLUMN_47
        ,   COLUMN_48
        ,   COLUMN_49
        ,   COLUMN_50
        ,   COLUMN_51
        ,   CAST(COLUMN_52 AS  VARCHAR)            AS  COLUMN_52
        ,   COLUMN_53
        ,   CAST(COLUMN_54   AS  VARCHAR)    AS  COLUMN_54
        ,   COLUMN_55
        ,   COLUMN_56
        ,   COLUMN_57
        ,   COLUMN_58
        ,   COLUMN_59
        ,   COLUMN_60
        ,   COLUMN_61
        ,   COLUMN_62
        ,   COLUMN_63
        ,   BASE64_ENCODE(COLUMN_64)                     AS  COLUMN_64
        ,   COLUMN_65
        ,   COLUMN_66
        ,   COLUMN_67
        ,   COLUMN_68
        ,   COLUMN_69
        ,   COLUMN_70
        ,   COLUMN_71
        ,   COLUMN_72
        ,   COLUMN_73
        ,   COLUMN_74
        ,   COLUMN_75
        ,   COLUMN_76
        ,   COLUMN_77
        ,   COLUMN_78
        ,   COLUMN_79
        ,   COLUMN_80
        ,   COLUMN_81
        ,   COLUMN_82
        ,   COLUMN_83
        ,   COLUMN_84
        ,   COLUMN_85
        ,   COLUMN_86
        ,   COLUMN_87
        ,   COLUMN_88
        ,   COLUMN_89
        ,   COLUMN_90
        ,   COLUMN_91
        ,   COLUMN_92
        ,   COLUMN_93
        ,   COLUMN_94
        ,   COLUMN_95
        ,   COLUMN_96
        ,   COLUMN_97
        ,   COLUMN_98
        ,   COLUMN_99
        FROM
            (
                SELECT
                    DISTINCT
                    COLUMN_1
                ,   COLUMN_2
                ,   COLUMN_3
                ,   COLUMN_4::
                    TIMESTAMP       AS  COLUMN_4
                ,   COLUMN_5::
                    TIMESTAMP       AS  COLUMN_5
                ,   COLUMN_6::
                    TIMESTAMP               AS  COLUMN_6
                ,   COLUMN_7
                ,   COLUMN_8
                ,   COLUMN_9
                ,   COLUMN_10
                ,   COLUMN_11
                ,   COLUMN_12
                ,   COLUMN_13
                ,   COLUMN_14
                ,   COLUMN_18  AS  COLUMN_15
                ,   COLUMN_19   AS  COLUMN_16
                ,   COLUMN_17
                ,   COLUMN_20         AS  COLUMN_20
                ,   COLUMN_21     AS  COLUMN_21
                ,   COLUMN_22
                ,   COLUMN_23             AS  COLUMN_23
                ,   COLUMN_24
                ,   COLUMN_25
                ,   COLUMN_26
                ,   COLUMN_27
                ,   COLUMN_100         AS  COLUMN_28
                ,   COLUMN_29::
                    TIMESTAMP   AS  COLUMN_29
                ,   COLUMN_30::
                    TIMESTAMP                           AS  COLUMN_30
                ,   COLUMN_31
                ,   COLUMN_32
                ,   COLUMN_33
                ,   COLUMN_34
                ,   COLUMN_35
                ,   COLUMN_36
                ,   COLUMN_37
                ,   COLUMN_38
                ,   COLUMN_39
                ,   COLUMN_40
                ,   COLUMN_41
                ,   COLUMN_42                      AS  COLUMN_42
                ,   COLUMN_43
                ,   COLUMN_44
                ,   COLUMN_45
                ,   COLUMN_46
                ,   COLUMN_47
                ,   COLUMN_48
                ,   COLUMN_49
                ,   COLUMN_50
                ,   NULL                                AS  COLUMN_51
                ,   NULL                                AS  COLUMN_52
                ,   COLUMN_101                          AS  COLUMN_53
                ,   NULL                                AS  COLUMN_54
                ,   COLUMN_102                              AS  COLUMN_55
                ,   COLUMN_103                              AS  COLUMN_56
                ,   COLUMN_104                            AS  COLUMN_57
                ,   NULL                                AS  COLUMN_59
                ,   NULL                                AS  COLUMN_60
                ,   NULL                                AS  COLUMN_61
                ,   COLUMN_62
                ,   CAST(COLUMN_63 AS  BIGINT)             AS  COLUMN_63
                ,   COLUMN_64
                ,   COLUMN_65
                ,   COLUMN_66
                ,   COLUMN_67
                ,   COLUMN_68
                ,   COLUMN_69
                ,   COLUMN_70
                ,   COLUMN_71
                ,   COLUMN_72
                ,   COLUMN_73
                ,   COLUMN_74
                ,   COLUMN_75
                ,   COLUMN_76
                ,   COLUMN_77
                ,   COLUMN_78
                ,   COLUMN_79
                ,   COLUMN_80
                ,   COLUMN_81
                ,   COLUMN_82
                ,   COLUMN_83
                ,   COLUMN_84
                ,   COLUMN_85
                ,   COLUMN_86
                ,   COLUMN_87::
                    TIMESTAMP           AS  COLUMN_87
                ,   COLUMN_88
                ,   COLUMN_89::
                    TIMESTAMP                   AS  COLUMN_89
                ,   COLUMN_92
                ,   COLUMN_93
                ,   COLUMN_58
                ,   COLUMN_90
                ,   COLUMN_91
                ,   COLUMN_94
                ,   CASE
                        WHEN
                            TABLE_2.COLUMN_7 NOT IN  ('alpha', 'beta', 'gamma')
                        THEN
                            NULL
                        WHEN
                            DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_2.COLUMN_6))    =   0
                        AND TABLE_2.COLUMN_3                                                                                                    =   TABLE_3.COLUMN_106
                        THEN
                            'delta'
                        ELSE
                            'eta'
                    END                         AS  COLUMN_95
                ,   TABLE_2.COLUMN_107            AS  COLUMN_96
                ,   TABLE_2.COLUMN_108            AS  COLUMN_97
                ,   TABLE_2.COLUMN_98
                ,   TABLE_2.COLUMN_99
                FROM
                    DATA_WAREHOUSE.SCHEMA_1.TABLE_4    TABLE_2
                LEFT JOIN
                    (
                        SELECT
                            COLUMN_10         AS  COLUMN_109
                        ,   COLUMN_94    AS  COLUMN_110
                        ,   COLUMN_3        AS  COLUMN_106
                        ,   COLUMN_6  AS  COLUMN_105
                        ,   ROW_NUMBER() OVER(
                                PARTITION BY
                                    COLUMN_10
                                ,   COLUMN_94
                                ORDER BY
                                    COLUMN_6
                            )               AS  RANK
                        FROM
                            DATA_WAREHOUSE.SCHEMA_1.TABLE_4    TABLE_2
                        WHERE
                            COLUMN_7 IN  ('alpha', 'epsilon', 'gamma')
                    )   TABLE_3
                ON
                    TABLE_3.COLUMN_109         =   TABLE_2.COLUMN_10
                AND TABLE_3.COLUMN_110    =   TABLE_2.COLUMN_94
                AND TABLE_3.RANK                =   1
                AND TABLE_2.COLUMN_7         IN  ('alpha', 'epsilon', 'gamma')
                LEFT JOIN
                    (
                        SELECT
                            DISTINCT
                            COLUMN_64        AS  COLUMN_111
                        ,   COLUMN_3    AS  COLUMN_112
                        ,   DENSE_RANK() OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_5
                            )           AS  COLUMN_86
                        ,   LAG(COLUMN_5, 1) OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_5
                            )           AS  COLUMN_87
                        FROM
                            (
                                SELECT
                                    DISTINCT
                                    COLUMN_64
                                ,   COLUMN_3
                                ,   MIN(COLUMN_5)  AS  COLUMN_5
                                FROM
                                    DATA_WAREHOUSE.SCHEMA_1.TABLE_4
                                WHERE
                                    COLUMN_7 NOT IN  ('gamma')
                                AND COLUMN_64        IS      NOT NULL
                                GROUP BY
                                    1
                                ,   2
                            )
                    )   TABLE_5
                ON
                    TABLE_2.COLUMN_64                            =   TABLE_5.COLUMN_111
                AND CAST(TABLE_2.COLUMN_3   AS  VARCHAR)    =   CAST(TABLE_5.COLUMN_112    AS  VARCHAR)
                LEFT JOIN
                    (
                        SELECT
                            DISTINCT
                            COLUMN_64        AS  COLUMN_113
                        ,   COLUMN_2    AS  COLUMN_114
                        ,   DENSE_RANK() OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_6
                            )           AS  COLUMN_88
                        ,   LAG(COLUMN_6, 1) OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_6
                            )           AS  COLUMN_89
                        FROM
                            (
                                SELECT
                                    DISTINCT
                                    COLUMN_64
                                ,   COLUMN_2
                                ,   COLUMN_6
                                FROM
                                    DATA_WAREHOUSE.SCHEMA_1.TABLE_4
                                WHERE
                                    COLUMN_7 IN  ('alpha', 'epsilon')
                                AND COLUMN_64        IS  NOT NULL
                            )
                    )   TABLE_6
                ON
                    TABLE_2.COLUMN_64                            =   TABLE_6.COLUMN_113
                AND CAST(TABLE_2.COLUMN_2   AS  VARCHAR)    =   CAST(TABLE_6.COLUMN_114    AS  VARCHAR)
                UNION ALL
                SELECT  DISTINCT
                    COLUMN_1
                ,   COLUMN_2
                ,   COLUMN_3
                ,   COLUMN_4::
                    TIMESTAMP       AS  COLUMN_4
                ,   COLUMN_5::
                    TIMESTAMP       AS  COLUMN_5
                ,   COLUMN_6::
                    TIMESTAMP               AS  COLUMN_6
                ,   LOWER(COLUMN_7)      AS  COLUMN_7
                ,   COLUMN_8
                ,   COLUMN_9
                ,   COLUMN_10
                ,   COLUMN_11
                ,   COLUMN_12
                ,   COLUMN_13
                ,   COLUMN_14
                ,   COLUMN_18  AS  COLUMN_15
                ,   COLUMN_19   AS  COLUMN_16
                ,   COLUMN_17
                ,   COLUMN_20
                ,   COLUMN_21
                ,   NULL                    AS  COLUMN_22
                ,   COLUMN_23
                ,   COLUMN_24
                ,   COLUMN_25
                ,   COLUMN_26
                ,   COLUMN_27
                ,   COLUMN_100         AS  COLUMN_28
                ,   NULL                    AS  COLUMN_29
                ,   COLUMN_30::
                    TIMESTAMP                           AS  COLUMN_30
                ,   COLUMN_31
                ,   COLUMN_32
                ,   COLUMN_33
                ,   COLUMN_34
                ,   COLUMN_35
                ,   COLUMN_36
                ,   NULL                                AS  COLUMN_37
                ,   COLUMN_38                       AS  COLUMN_38
                ,   COLUMN_39
                ,   COLUMN_40
                ,   COLUMN_41
                ,   COLUMN_42
                ,   COLUMN_43
                ,   COLUMN_44
                ,   COLUMN_45
                ,   COLUMN_46
                ,   COLUMN_47
                ,   COLUMN_48
                ,   TRY_CAST(COLUMN_49 AS  DATE)           AS  COLUMN_49
                ,   TRY_CAST(COLUMN_50   AS  DATE)       AS  COLUMN_50
                ,   NULL                                AS  COLUMN_51
                ,   NULL                                AS  COLUMN_52
                ,   COLUMN_101                          AS  COLUMN_53
                ,   NULL                                AS  COLUMN_54
                ,   COLUMN_102                              AS  COLUMN_55
                ,   COLUMN_103                              AS  COLUMN_56
                ,   COLUMN_104                            AS  COLUMN_57
                ,   NULL                                AS  COLUMN_59
                ,   NULL                                AS  COLUMN_60
                ,   NULL                                AS  COLUMN_61
                ,   NULL                                AS  COLUMN_62
                ,   CAST(COLUMN_63 AS  BIGINT)             AS  COLUMN_63
                ,   COLUMN_64
                ,   COLUMN_65
                ,   NULL                                AS  COLUMN_66
                ,   COLUMN_67
                ,   NULL                                AS  COLUMN_68
                ,   NULL                                AS  COLUMN_69
                ,   COLUMN_70
                ,   NULL                                AS  COLUMN_71
                ,   COLUMN_72
                ,   COLUMN_73
                ,   COLUMN_116                            AS  COLUMN_74
                ,   NULL                                AS  COLUMN_75
                ,   COLUMN_76
                ,   COLUMN_77
                ,   COLUMN_78
                ,   COLUMN_79
                ,   NULL                                AS  COLUMN_80
                ,   COLUMN_81
                ,   COLUMN_82
                ,   NULL                                AS  COLUMN_83
                ,   NULL                                AS  COLUMN_84
                ,   NULL                                AS  COLUMN_85
                ,   COLUMN_86
                ,   COLUMN_87::
                    TIMESTAMP           AS  COLUMN_87
                ,   COLUMN_88
                ,   COLUMN_89::
                    TIMESTAMP                   AS  COLUMN_89
                ,   COLUMN_117              AS  COLUMN_92
                ,   COLUMN_118              AS  COLUMN_93
                ,   COLUMN_58
                ,   COLUMN_90
                ,   COLUMN_91
                ,   COLUMN_94
                ,   CASE
                        WHEN
                            TABLE_2.COLUMN_7 NOT IN  ('alpha', 'epsilon', 'gamma')
                        THEN
                            NULL
                        WHEN
                            DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_2.COLUMN_6))    =   0
                        AND TABLE_2.COLUMN_3                                                                                                    =   TABLE_3.COLUMN_106
                        THEN
                            'delta'
                        ELSE
                            'eta'
                    END                         AS  COLUMN_95
                ,   NULL                        AS  COLUMN_107
                ,   NULL                        AS  COLUMN_108
                ,   NULL                        AS  COLUMN_98
                ,   NULL                        AS  COLUMN_99
                FROM    DATA_WAREHOUSE.SCHEMA_1.TABLE_7    TABLE_2
                LEFT JOIN
                    (
                        SELECT
                            COLUMN_10         AS  COLUMN_109
                        ,   COLUMN_94    AS  COLUMN_110
                        ,   COLUMN_3        AS  COLUMN_106
                        ,   COLUMN_6  AS  COLUMN_105
                        ,   ROW_NUMBER() OVER(
                                PARTITION BY
                                    COLUMN_10
                                ,   COLUMN_94
                                ORDER BY
                                    COLUMN_6
                            )               AS  RANK
                        FROM
                            DATA_WAREHOUSE.SCHEMA_1.TABLE_7    TABLE_2
                        WHERE
                            COLUMN_7 IN  ('alpha', 'epsilon', 'gamma')
                    )   TABLE_3
                ON
                    TABLE_3.COLUMN_109         =   TABLE_2.COLUMN_10
                AND TABLE_3.COLUMN_110    =   TABLE_2.COLUMN_94
                AND TABLE_3.RANK                =   1
                AND TABLE_2.COLUMN_7         IN  ('alpha', 'epsilon', 'gamma')
                LEFT JOIN
                    (
                        SELECT
                            DISTINCT
                            COLUMN_64        AS  COLUMN_111
                        ,   COLUMN_3    AS  COLUMN_112
                        ,   DENSE_RANK() OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_5
                            )           AS  COLUMN_86
                        ,   LAG(COLUMN_5, 1) OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_5
                            )           AS  COLUMN_87
                        FROM
                            (
                                SELECT
                                    DISTINCT
                                    COLUMN_64
                                ,   COLUMN_3
                                ,   MIN(COLUMN_5)  AS  COLUMN_5
                                FROM
                                    DATA_WAREHOUSE.SCHEMA_1.TABLE_7
                                WHERE
                                    COLUMN_7 NOT IN  ('gamma')
                                AND COLUMN_64        IS      NOT NULL
                                GROUP BY
                                    1
                                ,   2
                            )
                    )   TABLE_5
                ON
                    TABLE_2.COLUMN_64                            =   TABLE_5.COLUMN_111
                AND CAST(TABLE_2.COLUMN_3   AS  VARCHAR)    =   CAST(TABLE_5.COLUMN_112    AS  VARCHAR)
                LEFT JOIN
                    (
                        SELECT
                            DISTINCT
                            COLUMN_64        AS  COLUMN_113
                        ,   COLUMN_2    AS  COLUMN_114
                        ,   DENSE_RANK() OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_6
                            )           AS  COLUMN_88
                        ,   LAG(COLUMN_6, 1) OVER(
                                PARTITION BY
                                    COLUMN_64
                                ORDER BY
                                    COLUMN_6
                            )           AS  COLUMN_89
                        FROM
                            (
                                SELECT
                                    DISTINCT
                                    COLUMN_64
                                ,   COLUMN_2
                                ,   COLUMN_6
                                FROM
                                    DATA_WAREHOUSE.SCHEMA_1.TABLE_7
                                WHERE
                                    COLUMN_7 IN  ('alpha', 'epsilon')
                                AND COLUMN_64        IS  NOT NULL
                            )
                    )   TABLE_6
                ON
                    TABLE_2.COLUMN_64                            =   TABLE_6.COLUMN_113
                AND CAST(TABLE_2.COLUMN_2   AS  VARCHAR)    =   CAST(TABLE_6.COLUMN_114    AS  VARCHAR)
                UNION ALL
                SELECT  DISTINCT
                    COLUMN_1
                ,   CAST(COLUMN_115 AS  VARCHAR(256))   AS  COLUMN_2
                ,   CAST(COLUMN_3   AS  VARCHAR(256))   AS  COLUMN_3
                ,   COLUMN_119::
                    TIMESTAMP                   AS  COLUMN_4
                ,   COLUMN_120::
                    TIMESTAMP                   AS  COLUMN_5
                ,   COLUMN_121::
                    TIMESTAMP                                               AS  COLUMN_6
                ,   COLUMN_122                                          AS  COLUMN_7
                ,   COLUMN_123                                               AS  COLUMN_8
                ,   COLUMN_124                                      AS  COLUMN_9
                ,   COLUMN_10
                ,   COLUMN_125                                          AS  COLUMN_11
                ,   COLUMN_126                                              AS  COLUMN_12
                ,   COLUMN_126                                              AS  COLUMN_13
                ,   NULL                                                    AS  COLUMN_14
                ,   COLUMN_15
                ,   COLUMN_16
                ,   NULL                                                    AS  COLUMN_17
                ,   COLUMN_20
                ,   (COLUMN_126 -   COLUMN_20 -   COLUMN_15)   AS  COLUMN_21
                ,   NULL                                                    AS  COLUMN_22
                ,   COLUMN_23
                ,   NULL                                                    AS  COLUMN_24
                ,   NULL                                                    AS  COLUMN_25
                ,   COLUMN_127                         AS  COLUMN_26
                ,   NULL                                                    AS  COLUMN_27
                ,   CAST(COLUMN_128  AS  VARCHAR(256))           AS  COLUMN_28
                ,   NULL                                                    AS  COLUMN_29
                ,   NULL                                                    AS  COLUMN_30
                ,   NULL                                                    AS  COLUMN_31
                ,   NULL                                                    AS  COLUMN_32
                ,   NULL                                                    AS  COLUMN_33
                ,   NULL                                                    AS  COLUMN_34
                ,   NULL                                                    AS  COLUMN_35
                ,   NULL                                                    AS  COLUMN_36
                ,   COLUMN_37
                ,   COLUMN_38
                ,   COLUMN_129                                         AS  COLUMN_39
                ,   COLUMN_40
                ,   NULL                                                    AS  COLUMN_41
                ,   COLUMN_130                                      AS  COLUMN_42
                ,   COLUMN_43
                ,   NULL                                                    AS  COLUMN_44
                ,   NULL                                                    AS  COLUMN_45
                ,   NULL                                                    AS  COLUMN_46
                ,   NULL                                                    AS  COLUMN_47
                ,   NULL                                                    AS  COLUMN_48
                ,   NULL                                                    AS  COLUMN_49
                ,   NULL                                                    AS  COLUMN_50
                ,   COLUMN_51
                ,   COLUMN_52
                ,   NULL                                                    AS  COLUMN_53
                ,   COLUMN_131                                       AS  COLUMN_54
                ,   COLUMN_132                                              AS  COLUMN_55
                ,   COLUMN_133                                              AS  COLUMN_56
                ,   COLUMN_134                                            AS  COLUMN_57
                ,   COLUMN_59                                             AS  COLUMN_59
                ,   COLUMN_135                                            AS  COLUMN_60
                ,   COLUMN_61
                ,   NULL                                                    AS  COLUMN_62
                ,   COLUMN_136                                     AS  COLUMN_63
                ,   NULL                                                    AS  COLUMN_64
                ,   COLUMN_137                                             AS  COLUMN_65
                ,   NULL                                                    AS  COLUMN_66
                ,   NULL                                                    AS  COLUMN_67
                ,   NULL                                                    AS  COLUMN_68
                ,   NULL                                                    AS  COLUMN_69
                ,   NULL                                                    AS  COLUMN_70
                ,   NULL                                                    AS  COLUMN_71
                ,   NULL                                                    AS  COLUMN_72
                ,   NULL                                                    AS  COLUMN_73
                ,   COLUMN_138                                               AS  COLUMN_74
                ,   NULL                                                    AS  COLUMN_75
                ,   NULL                                                    AS  COLUMN_76
                ,   NULL                                                    AS  COLUMN_77
                ,   COLUMN_139                                             AS  COLUMN_78
                ,   NULL                                                    AS  COLUMN_79
                ,   NULL                                                    AS  COLUMN_80
                ,   NULL                                                    AS  COLUMN_81
                ,   NULL                                                    AS  COLUMN_82
                ,   NULL                                                    AS  COLUMN_83
                ,   NULL                                                    AS  COLUMN_84
                ,   NULL                                                    AS  COLUMN_85
                ,   NULL                                                    AS  COLUMN_86
                ,   NULL                                                    AS  COLUMN_87
                ,   NULL                                                    AS  COLUMN_88
                ,   NULL                                                    AS  COLUMN_89
                ,   COLUMN_140                                      AS  COLUMN_92
                ,   COLUMN_141                                        AS  COLUMN_93
                ,   COLUMN_58
                ,   COLUMN_90
                ,   COLUMN_91
                ,   COLUMN_94
                ,   CASE
                        WHEN
                            TABLE_8.COLUMN_59  NOT IN  ('zeta', 'nu')
                        THEN
                            NULL
                        WHEN
                            DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_8.COLUMN_121))    =   0
                        AND TABLE_8.COLUMN_3                                                                                                                 =   TABLE_3.COLUMN_106
                        THEN
                            'delta'
                        ELSE
                            'eta'
                    END                                                     AS  COLUMN_95
                ,   NULL                                                    AS  COLUMN_107
                ,   NULL                                                    AS  COLUMN_108
                ,   NULL                                                    AS  COLUMN_98
                ,   NULL                                                    AS  COLUMN_99
                FROM    DATA_WAREHOUSE.SCHEMA_1.TABLE_9   TABLE_8
                LEFT JOIN
                    (
                        SELECT
                            DISTINCT
                            COLUMN_10                     AS  COLUMN_109
                        ,   COLUMN_94                AS  COLUMN_110
                        ,   COLUMN_3                    AS  COLUMN_106
                        ,   COLUMN_121 AS  COLUMN_105
                        ,   ROW_NUMBER() OVER(
                                PARTITION BY
                                    COLUMN_94
                                ,   COLUMN_10
                                ORDER BY
                                    COLUMN_121
                            )                           AS  RANK
                        FROM
                            DATA_WAREHOUSE.SCHEMA_1.TABLE_9
                        WHERE
                            COLUMN_59                 IN  ('zeta', 'nu')
                        AND COLUMN_121 IS  NOT NULL
                    )   TABLE_3
                ON
                    TABLE_8.COLUMN_10                      =   TABLE_3.COLUMN_109
                AND TABLE_3.RANK                        =   1
                AND TABLE_8.COLUMN_59                  IN  ('zeta', 'nu')
                AND TABLE_8.COLUMN_94                 =   TABLE_3.COLUMN_110
                AND TABLE_8.COLUMN_121  IS  NOT NULL
            )   TABLE_1
        UNION ALL
        SELECT
            DISTINCT
            'sigma'                          AS  COLUMN_1
        ,   COLUMN_2
        ,   COLUMN_3
        ,   COLUMN_4::TIMESTAMP                   AS  COLUMN_4
        ,   COLUMN_5::TIMESTAMP                AS  COLUMN_5
        ,   COLUMN_6::TIMESTAMP               AS  COLUMN_6
        ,   COLUMN_7
        ,   COLUMN_8
        ,   COLUMN_9
        ,   COLUMN_10
        ,   NULL                                    AS  COLUMN_11
        ,   CAST(COLUMN_12   AS  DOUBLE)     AS  COLUMN_12
        ,   CAST(COLUMN_13 AS  DOUBLE)         AS  COLUMN_13
        ,   NULL                                    AS  COLUMN_14
        ,   COLUMN_15
        ,   NULL                                    AS  COLUMN_16
        ,   NULL                                    AS  COLUMN_17
        ,   NULL                                    AS  COLUMN_20
        ,   CAST(COLUMN_21    AS  DOUBLE) AS  COLUMN_21
        ,   NULL                                    AS  COLUMN_22
        ,   COLUMN_23
        ,   NULL                                    AS  COLUMN_24
        ,   NULL                                    AS  COLUMN_25
        ,   COLUMN_26
        ,   NULL                                    AS  COLUMN_27
        ,   NULL                                    AS  COLUMN_28
        ,   NULL                                    AS  COLUMN_29
        ,   COLUMN_30::TIMESTAMP                      AS  COLUMN_30
        ,   NULL                                    AS  COLUMN_31
        ,   NULL                                    AS  COLUMN_32
        ,   NULL                                    AS  COLUMN_33
        ,   NULL                                    AS  COLUMN_34
        ,   NULL                                    AS  COLUMN_35
        ,   NULL                                    AS  COLUMN_36
        ,   NULL                                    AS  COLUMN_37
        ,   COLUMN_38
        ,   COLUMN_39
        ,   COLUMN_40
        ,   NULL                                    AS  COLUMN_41
        ,   NULL                                    AS  COLUMN_42
        ,   COLUMN_43
        ,   NULL                                    AS  COLUMN_44
        ,   NULL                                    AS  COLUMN_45
        ,   NULL                                    AS  COLUMN_46
        ,   COLUMN_142                            AS  COLUMN_47
        ,   NULL                                    AS  COLUMN_48
        ,   COLUMN_49
        ,   COLUMN_50
        ,   NULL                                    AS  COLUMN_51
        ,   NULL                                    AS  COLUMN_52
        ,   NULL                                    AS  COLUMN_53
        ,   NULL                                    AS  COLUMN_54
        ,   COLUMN_143                      AS  COLUMN_55
        ,   NULL                                    AS  COLUMN_56
        ,   NULL                                    AS  COLUMN_57
        ,   NULL                                    AS  COLUMN_58
        ,   NULL                                    AS  COLUMN_59
        ,   NULL                                    AS  COLUMN_60
        ,   NULL                                    AS  COLUMN_61
        ,   NULL                                    AS  COLUMN_62
        ,   NULL                                    AS  COLUMN_63
        ,   NULL                                    AS  COLUMN_64
        ,   NULL                                    AS  COLUMN_65
        ,   NULL                                    AS  COLUMN_66
        ,   COLUMN_144              AS  COLUMN_67
        ,   NULL                                    AS  COLUMN_68
        ,   NULL                                    AS  COLUMN_69
        ,   NULL                                    AS  COLUMN_70
        ,   NULL                                    AS  COLUMN_71
        ,   NULL                                    AS  COLUMN_72
        ,   NULL                                    AS  COLUMN_73
        ,   NULL                                    AS  COLUMN_74
        ,   NULL                                    AS  COLUMN_75
        ,   NULL                                    AS  COLUMN_76
        ,   NULL                                    AS  COLUMN_77
        ,   NULL                                    AS  COLUMN_78
        ,   NULL                                    AS  COLUMN_79
        ,   NULL                                    AS  COLUMN_80
        ,   NULL                                    AS  COLUMN_81
        ,   NULL                                    AS  COLUMN_82
        ,   NULL                                    AS  COLUMN_83
        ,   NULL                                    AS  COLUMN_84
        ,   NULL                                    AS  COLUMN_85
        ,   NULL                                    AS  COLUMN_86
        ,   NULL                                    AS  COLUMN_87
        ,   NULL                                    AS  COLUMN_88
        ,   NULL                                    AS  COLUMN_89
        ,   NULL                                    AS  COLUMN_90
        ,   NULL                                    AS  COLUMN_91
        ,   COLUMN_92                       AS  COLUMN_92
        ,   NULL                                    AS  COLUMN_93
        ,   'sigma'                          AS  COLUMN_94
        ,   NULL                                    AS  COLUMN_95
        ,   NULL                                    AS  COLUMN_107
        ,   NULL                                    AS  COLUMN_108
        ,   NULL                                    AS  COLUMN_98
        ,   NULL                                    AS  COLUMN_99
        FROM
            DATA_WAREHOUSE.SCHEMA_1.TABLE_10
    )
SELECT
    TABLE_1.*
,   CASE
        WHEN
            (
                TABLE_1.COLUMN_7  IN  ('alpha', 'epsilon', 'gamma')
            OR  (
                    TABLE_1.COLUMN_59      IN  ('zeta', 'nu')
                AND TABLE_1.COLUMN_6   IS  NOT NULL
                AND TABLE_1.COLUMN_1               =   'TABLE_8'
                AND TABLE_1.COLUMN_74       IS  NOT NULL
                )
            )
        THEN
            (
            CASE
                WHEN
                    DATEDIFF('month', DATE_TRUNC('month', TABLE_3.COLUMN_105), DATE_TRUNC('month', TABLE_1.COLUMN_6)) =   0
                AND TABLE_1.COLUMN_3                                                                                                 =   TABLE_3.COLUMN_106
                THEN
                    'delta'
                ELSE
                    'eta'
            END
        )
        ELSE
            NULL
    END     AS  COLUMN_145
FROM
    TABLE_1
LEFT JOIN
    (
        SELECT
            COLUMN_10         AS  COLUMN_109
        ,   COLUMN_3        AS  COLUMN_106
        ,   COLUMN_6  AS  COLUMN_105
        ,   ROW_NUMBER() OVER(
                PARTITION BY
                    COLUMN_10
                ORDER BY
                    COLUMN_6
            )               AS  RANK
        FROM
            TABLE_1
        WHERE
            (
                COLUMN_7 IN  ('alpha', 'epsilon', 'gamma')
            OR  (
                    COLUMN_59     IN  ('zeta', 'nu')
                AND COLUMN_6  IS  NOT NULL
                AND TABLE_1.COLUMN_1           =   'TABLE_8'
                AND TABLE_1.COLUMN_74   IS  NOT NULL
                )
            )
        AND TABLE_1.COLUMN_1   !=  'sigma'
    )   TABLE_3
ON
    TABLE_3.COLUMN_109 =   TABLE_1.COLUMN_10
AND TABLE_3.RANK        =   1
AND TABLE_1.COLUMN_1           !=  'sigma'
AND (
        TABLE_1.COLUMN_7  IN  ('alpha', 'epsilon', 'gamma')
    OR  (
            TABLE_1.COLUMN_59      IN  ('zeta', 'nu')
        AND TABLE_1.COLUMN_6   IS  NOT NULL
        AND TABLE_1.COLUMN_1               =   'TABLE_8'
        AND TABLE_1.COLUMN_74       IS  NOT NULL
        )
    )
;

I'm not sure the query itself is the issue, since I am able to view the upstream tables in https://reata.github.io/sqllineage/.

The upstream issue is happening for both Snowflake and Databricks.
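As background on the reported error: `ValueError: too many values to unpack` is Python's standard failure when fixed-arity tuple unpacking meets an iterable with more elements than expected. A stdlib-only sketch (purely illustrative, not the actual OpenMetadata parser code) of how an identifier splitter can hit it on fully qualified names, and a tolerant variant:

```python
# Hypothetical illustration, NOT the real OpenMetadata code: a splitter that
# assumes "table.column" breaks on "db.schema.table.column".
def split_identifier_naive(identifier: str):
    table, column = identifier.split(".")  # raises ValueError on more than 2 parts
    return table, column

def split_identifier_safe(identifier: str):
    # Starred unpacking tolerates any qualifier depth.
    *qualifiers, column = identifier.split(".")
    return ".".join(qualifiers), column

try:
    split_identifier_naive("data_warehouse.schema_1.table_1.column_1")
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 2)

print(split_identifier_safe("data_warehouse.schema_1.table_1.column_1"))
```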

harshach commented 1 year ago
-- Table 1: Products
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    price DECIMAL(10, 2)
);

-- Insert sample data into Products table
INSERT INTO products (product_name, price)
VALUES
    ('Product A', 10.99),
    ('Product B', 5.99),
    ('Product C', 8.99);

-- Table 2: Orders
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity INT,
    order_date DATE
);

-- Insert sample data into Orders table
INSERT INTO orders (customer_id, product_id, quantity, order_date)
VALUES
    (1, 1, 2, '2023-05-01'),
    (1, 2, 1, '2023-05-03'),
    (2, 3, 3, '2023-05-05');

-- Table 3: Customers
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

-- Insert sample data into Customers table
INSERT INTO customers (name, email)
VALUES
    ('John Doe', 'john@example.com'),
    ('Jane Smith', 'jane@example.com');

-- Create a view to join the tables
CREATE OR REPLACE VIEW order_details AS
SELECT
    orders.order_id,
    customers.name AS customer_name,
    customers.email,
    products.product_name,
    orders.quantity,
    orders.order_date
FROM
    orders
JOIN
    customers ON customers.customer_id = orders.customer_id
JOIN
    products ON products.product_id = orders.product_id; 
Vinay26k commented 1 year ago

Hi, adding a Postgres DB view lineage failure scenario. Query:

CREATE OR REPLACE VIEW public.vw_film_actor_info
AS
SELECT
    CONCAT(fa.film_id, '-A:', fa.actor_id) AS film_actor_key,
    CONCAT(a.first_name, ' ', a.last_name) AS full_name,
    f.title,
    f.release_year,
    l.name,
    f.rating
FROM public.film_actor fa
INNER JOIN public.actor a
    ON fa.actor_id = a.actor_id
INNER JOIN public.film f
    ON fa.film_id = f.film_id
INNER JOIN language l
    ON f.language_id = l.language_id;

Expected Result:

Actual Result / Lineage from UI: (screenshot)
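One possible contributing factor in the view above (an assumption, not a confirmed root cause) is that `language` is joined without a schema qualifier while the other tables are `public`-qualified, so the parser has to resolve a bare name against the search path. A naive stdlib sketch showing the mixed qualification it must handle:

```python
import re

# Naive extraction, for illustration only: list the names that follow
# FROM/JOIN in the view body to show the mix of qualified and bare tables.
SQL = """
FROM public.film_actor fa
INNER JOIN public.actor a ON fa.actor_id = a.actor_id
INNER JOIN public.film f ON fa.film_id = f.film_id
INNER JOIN language l ON f.language_id = l.language_id
"""
tables = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", SQL, flags=re.IGNORECASE)
print(tables)  # ['public.film_actor', 'public.actor', 'public.film', 'language']
```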

samarth-1mg commented 1 year ago

There is an issue with Snowflake lineage: when there is a SELECT * on a subquery with joins, the lineage parser fails to pick up the columns from the base tables. The query is added below:

CREATE OR REPLACE TRANSIENT TABLE data_warehouse.schema_1.table_1
COPY GRANTS
AS
select *,
case when column_1='alpha' then column_2
when column_3 in ('95','30') then column_2
when column_3 in ('130','131','132') then 'beta'
when column_3 in ('25','45') then 'gamma'
else coalesce(column_2,column_3) end as column_3_final
,case when column_4<=2 then column_4||' days'
else '9 days+' end as column_4_final
,case when column_5>=0 and column_5 < 6 then '[0,6)'
when column_5>=6 and column_5 < 12 then '[6,12)'
when column_5>=12 and column_5 < 16 then '[12,16)'
when column_5>=16 and column_5 < 20 then '[16,20)'
when column_5>=20 and column_5 < 24 then '[20,24)'
when column_5>=24 and column_5 < 36 then '[24,36)'
when column_5>=36 and column_5 < 48 then '[36,48)'
when column_5>=48 and column_5 < 60 then '[48,60)'
when column_5>=60 and column_5 < 72 then '[60,72)'
when column_5>=72 and column_5 < 96 then '[72,96)'
else '96 hrs+' end as column_5_final
,case when column_14 in ('eta','epsilon') then 'theta' else 'nu' end as source_final

from
(select
t2.column_6
,t2.column_7
,t2.column_8
,dateadd('minute',0,t2.created) as column_9
,t2.column_10
,t2.column_3
,case
            when column_3 = '15' then 'a'
        end as column_2,
        case
            when column_3 = '190' then 'alpha'
        end as column_1
,t2.column_11
,t2.column_12
,v.column_13
,t2.column_14
,parse_json(column_15):field_1::varchar as field_1
,t3.column_16
,datediff('day',column_16,dateadd('minute',330,current_timestamp)::timestamp) as column_4
,datediff('minute',oh.column_17,dateadd('minute',330,current_timestamp)::timestamp)*1.0/60 as column_5
,oh.column_17
,max(t2.updated) over () as last_updated

from data_warehouse.schema_2.table_2 t2
left join (select column_8,column_3 as column_3_oh,min(created) as column_17 from data_warehouse.schema_2.table_3
where column_3 is not null
and column_8 in (select distinct column_8 from data_warehouse.schema_2.table_2
where column_3 not in ('140','141','142','99','40'))

group by 1,2) oh on t2.column_8=oh.column_8 and t2.column_3=oh.column_3_oh
left join (select column_7,min(created) as column_16 from data_warehouse.schema_2.table_2
where column_7 in (select distinct column_7 from data_warehouse.schema_2.table_2
where column_3 not in ('140','141','142','99','40')
)

group by 1
) t3 on t2.column_7=t3.column_7
left join data_warehouse.schema_5.table_6 v on t2.column_11::varchar=v.column_19::varchar

where
 t2.column_3 not in ('140','141','142','99','40')
 );
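The SELECT * difficulty described above can be made concrete with a stdlib-only sketch (the table schemas below are invented placeholders): expanding the wildcard requires the column list of every table feeding the subquery, which a purely syntactic parser does not have without catalog metadata.

```python
# Invented placeholder schemas; in practice these come from catalog metadata.
SCHEMAS = {
    "table_2": ["column_6", "column_7"],
    "table_6": ["column_13", "column_19"],
}

def expand_star(tables):
    """Expand SELECT * over a join, in table order then column order."""
    return [f"{table}.{column}" for table in tables for column in SCHEMAS[table]]

print(expand_star(["table_2", "table_6"]))
```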
samarth-1mg commented 1 year ago

In addition to the previous queries, this query even fails to produce lineage in reata/sqllineage:

CREATE OR REPLACE TRANSIENT TABLE data_warehouse.schema_2.table_1
COPY GRANTS
AS

SELECT DISTINCT
    table_2_alias.column_59 as column_1,
    table_2_alias.column_2,
    coalesce(table_2_alias.column_3,table_22_alias.column_3) as column_3,
    coalesce(table_2_alias.column_4, table_22_alias.column_4) as column_4,
    column_5,
    table_7_alias.column_56 as column_6,
    table_6_alias.column_55 as column_7,
    column_8,
    table_9_alias.column_56 as column_9,
    table_8_alias.column_55 as column_10,
    column_11,
    table_11_alias.column_56 as column_12,
    table_10_alias.column_55 as column_13,
    column_14,
    table_13_alias.column_56 as column_15,
    table_12_alias.column_55 as column_16,
    column_17,
    table_15_alias.column_56 as column_18,
    table_14_alias.column_55 as column_19,
    column_20,
    table_17_alias.column_56 as column_21,
    table_16_alias.column_55 as column_22,
    table_2_alias.column_58 as column_23,
    table_2_alias.column_57 as column_24,
    table_22_alias.column_31 as column_25,
    table_22_alias.column_58 as column_26,
    table_22_alias.column_57 as column_27,
    table_19_alias.column_59 as column_28,
    table_19_alias.column_58 as column_29,
    table_18_alias.column_59 as column_30,
    table_18_alias.column_58 as column_31,
    table_18_alias.column_57 as column_32,
    table_3_alias.column_56 as column_33, 
    table_4_alias.column_56 as column_34,
    table_5_alias.column_56 as column_35,
    column_36,
    column_37,
    column_38,
    -- table_21
    column_39,
    column_40,
    column_41,
    column_42,
    column_43,
    column_44
from data_warehouse.schema_1.table_2 table_2_alias
inner join data_warehouse.schema_1.table_22 table_22_alias on table_2_alias.column_2 = table_22_alias.column_59
left join data_warehouse.schema_1.table_3 table_3_alias on table_3_alias.column_59=table_2_alias.column_33_id
left join data_warehouse.schema_1.table_4 table_4_alias on table_4_alias.column_59=table_2_alias.column_34_id
left join data_warehouse.schema_1.table_5 table_5_alias on table_5_alias.column_59=table_2_alias.column_35_id
left join data_warehouse.schema_1.table_6 table_6_alias on table_22_alias.column_59 = table_6_alias.column_2
left join data_warehouse.schema_1.table_7 table_7_alias on table_6_alias.column_5 = table_7_alias.column_59
left join data_warehouse.schema_1.table_8 table_8_alias on table_22_alias.column_59 = table_8_alias.column_2
left join data_warehouse.schema_1.table_9 table_9_alias on table_8_alias.column_8 = table_9_alias.column_59
left join data_warehouse.schema_1.table_10 table_10_alias on table_22_alias.column_59 = table_10_alias.column_2
left join data_warehouse.schema_1.table_11 table_11_alias on table_10_alias.column_11 = table_11_alias.column_59
left join data_warehouse.schema_1.table_12 table_12_alias on table_22_alias.column_59 = table_12_alias.column_2
left join data_warehouse.schema_1.table_13 table_13_alias on table_12_alias.column_14 = table_13_alias.column_59
left join data_warehouse.schema_1.table_14 table_14_alias on table_22_alias.column_59 = table_14_alias.column_2
left join data_warehouse.schema_1.table_15 table_15_alias on table_14_alias.column_17 = table_15_alias.column_59
left join data_warehouse.schema_1.table_16 table_16_alias on table_22_alias.column_59 = table_16_alias.column_2
left join data_warehouse.schema_1.table_17 table_17_alias on table_16_alias.column_20 = table_17_alias.column_59
left join data_warehouse.schema_1.table_18 table_18_alias on table_22_alias.drug_id = table_18_alias.column_59
left join data_warehouse.schema_1.table_19 table_19_alias on table_22_alias.column_28 = table_19_alias.column_59
left join data_warehouse.schema_1.table_20 table_20 on table_20.column_59 = table_2_alias.column_45
left join
    (
            -- Mapping those table_21 where 3 levels of category are available
            select table_22_alias.column_59,column_46 as column_46_alias,column_47,column_48,column_49,column_39,column_40,column_41
            from data_warehouse.schema_1.table_22 table_22_alias
            inner join
                 (select c3.column_59 as column_47,
                     c3.column_50 as column_51,
                     c2.column_59 as column_48,
                     c2.column_50 as column_52,
                     c1.column_59 as column_49,
                     c1.column_50 as column_53,
                     c1.column_56 as column_39,
                    c2.column_56 as column_40,
                     c3.column_56 as column_41
                 from data_warehouse.schema_1.table_21 c1
                 left join data_warehouse.schema_1.table_21 c2 on c2.column_50 = c1.column_59
                 left join data_warehouse.schema_1.table_21 c3 on c3.column_50 = c2.column_59
                 where c3.column_50 is not null
                 ) AS X on table_22_alias.column_46 = x.column_47

            union

            -- Mapping those table_21 where 2 levels of category are available
            select table_22_alias.column_59,column_46 as column_46_alias,null as column_47,column_48,column_49,column_39,column_40,
            null as column_41 
            from data_warehouse.schema_1.table_22 table_22_alias
            inner join
                 (select c2.column_59 as column_48,
                     c2.column_50 as column_52,
                     c1.column_59 as column_49,
                     c1.column_50 as column_53,
                     c1.column_56 as column_39,
                     c2.column_56 as column_40
                 from data_warehouse.schema_1.table_21 c1
                 left join data_warehouse.schema_1.table_21 c2 on c2.column_50 = c1.column_59
                 where c1.column_50 is null
                 ) AS X on table_22_alias.column_46 = x.column_48

            union

            -- Mapping all those table_21 where only 1 level of category is available
            select table_22_alias.column_59,column_46 as column_46_alias,null as column_47,null as column_48,c.column_59 as column_49,c.column_56 as column_39,
            null as column_40,null as column_41
            from data_warehouse.schema_1.table_22 table_22_alias
            inner join data_warehouse.schema_1.table_21 c on table_22_alias.column_46 = c.column_59
            where column_50 is null

    ) as table_23 on table_22_alias.column_59 = table_23.column_59

left join
    (   select
        table_24.path,
        split_part(table_24.path,'.',1) as column_42,
        split_part(table_24.path,'.',2) as column_43,
        split_part(table_24.path,'.',3) as column_44,
        table_25_alias.column_1
        from data_warehouse.schema_1.table_24 table_24
        inner join data_warehouse.schema_1.table_25 table_25_alias 
        on table_24.column_54=table_25_alias.column_54
    ) as ud on ud.column_1 = table_2_alias.column_59
hoangdh5 commented 1 year ago

Hello team, I have this SQL (Oracle dialect) that is not supported by lineage from query logs; please review it:

UPDATE RAW.TABLKE_A MAS
SET MAS.ACCURE_INT_LAST_MONTH = (
    SELECT NVL(SUM(TRANS_AMOUNT), 0)
    FROM RAW.TABLE_B D
    WHERE D.POSTED_DT = '2023-07-11'
)

ulixius9 commented 1 year ago

dialect: sqlserver

CREATE VIEW [dbo].[vDMPrep]
AS
    SELECT
        pc.EnglishProductCategoryName
        ,Coalesce(p.ModelName, p.EnglishProductName) AS Model
        ,c.CustomerKey
        ,s.SalesTerritoryGroup AS Region
        ,CASE
            WHEN Month(GetDate()) < Month(c.BirthDate)
                THEN DateDiff(yy,c.BirthDate,GetDate()) - 1
            WHEN Month(GetDate()) = Month(c.BirthDate)
            AND Day(GetDate()) < Day(c.BirthDate)
                THEN DateDiff(yy,c.BirthDate,GetDate()) - 1
            ELSE DateDiff(yy,c.BirthDate,GetDate())
        END AS Age
        ,CASE
            WHEN c.YearlyIncome < 40000 THEN 'Low'
            WHEN c.YearlyIncome > 60000 THEN 'High'
            ELSE c.[EnglishEducation] 
            -- ELSE 'Moderate'
        END AS IncomeGroup
        ,d.CalendarYear
        ,d.FiscalYear
        ,d.MonthNumberOfYear AS Month
        ,f.SalesOrderNumber AS OrderNumber
        ,f.SalesOrderLineNumber AS LineNumber
        ,f.OrderQuantity AS Quantity
        ,f.ExtendedAmount AS Amount  
    FROM
        dbo.FactInternetSales f
    INNER JOIN dbo.DimDate d
        ON f.OrderDateKey = d.DateKey
    INNER JOIN dbo.DimProduct p
        ON f.ProductKey = p.ProductKey
    INNER JOIN dbo.DimProductSubcategory psc
        ON p.ProductSubcategoryKey = psc.ProductSubcategoryKey
    INNER JOIN dbo.DimProductCategory pc
        ON psc.ProductCategoryKey = pc.ProductCategoryKey
    INNER JOIN dbo.DimCustomer c
        ON f.CustomerKey = c.CustomerKey
    INNER JOIN dbo.DimGeography g
        ON c.GeographyKey = g.GeographyKey
    INNER JOIN dbo.DimSalesTerritory s
        ON g.SalesTerritoryKey = s.SalesTerritoryKey 
ulixius9 commented 1 year ago

dialect: snowflake

UPDATE RDBAPP_CATALOG_VISIBILITIES_HISTORY_STG HST
            SET
                _END_DATE = NVL(LEAST(HST._END_DATE, SRC._UPDATE_DATE), SRC._UPDATE_DATE),
                _IS_CURRENT = IFF(_ACTION = 'DEL', TRUE, FALSE),
                _IS_HARD_DELETED = IFF(_ACTION = 'UPD', FALSE, TRUE),
                _ROW_UPDATE_DATE = SYSDATE()
            FROM
                RDBAPP_CATALOG_VISIBILITIES_HISTORY_DIFF_TMP SRC
            WHERE
                HST.PK = SRC.PK
                AND HST._IS_CURRENT
                AND NOT _ACTION = 'ADD'
            ;
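A hedged note on the statement above: UPDATE ... FROM carries lineage from the table in the FROM clause into the updated table, a shape that SELECT-oriented parsers can miss. A minimal stdlib sketch (regex-based, illustrative only, not production parsing) of pulling both sides out:

```python
import re

# Abbreviated copy of the Snowflake statement above.
SQL = """UPDATE RDBAPP_CATALOG_VISIBILITIES_HISTORY_STG HST
SET _ROW_UPDATE_DATE = SYSDATE()
FROM RDBAPP_CATALOG_VISIBILITIES_HISTORY_DIFF_TMP SRC
WHERE HST.PK = SRC.PK"""

target = re.search(r"UPDATE\s+(\w+)", SQL, re.IGNORECASE).group(1)
source = re.search(r"FROM\s+(\w+)", SQL, re.IGNORECASE).group(1)
print(f"{source} -> {target}")
```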
ulixius9 commented 1 year ago

dialect: postgres

CREATE VIEW views_api AS SELECT max(t.id) AS id,
    t.day,
    t.ab_id,
    t.ab_group_id,
    t.ab_group_name,
    t.ab_is_test,
    t.hw_type,
    false AS feature_hit,
    t.metric,
    t.type,
    sum(t.count) AS value,
    a.name,
    a.description,
    a.t_start,
    a.t_end,
    a.layer_name,
    a.traffic,
    a.username
   FROM (public.views_clicks_relativni t
     LEFT JOIN public.ab_api a ON t.ab_id = a.ab_id)
  WHERE t.metric = 'views'
  GROUP BY GROUPING SETS (t.day, t.ab_id, t.ab_group_id, t.ab_group_name, t.ab_is_test, t.hw_type, false::boolean, t.metric, t.type, a.name, a.description, a.t_start, a.t_end, a.layer_name, a.traffic, a.username),(t.day, t.ab_id, t.ab_group_id, t.ab_group_name, t.ab_is_test, false::boolean, t.metric, t.type, a.name, a.description, a.t_start, a.t_end, a.layer_name, a.traffic, a.username)
pmbrull commented 11 months ago

dialect: singlestore

CREATE  VIEW vw_test AS  
SELECT
    a.someColumn ,
    b.anotherColumn 
FROM a
JOIN b 
    ON a.id = b.fk_id


sachintiwari2 commented 11 months ago

Affected module Ingestion Framework

Describe the bug For a few of the tables the upstream lineage isn't visible. I wanted to understand how this could happen and what the fix is. Sharing the masked query here:

COPY GRANTS
AS
SELECT DISTINCT 
    'MASKED' AS sensitive_info_1,
    'MASKED' AS sensitive_info_2,
    'MASKED' AS sensitive_info_3,
    'MASKED' AS sensitive_info_4,
    'MASKED' AS sensitive_info_5,
    'MASKED' AS sensitive_info_6,
    'MASKED' AS sensitive_info_7,
    'MASKED' AS sensitive_info_8,
    'MASKED' AS sensitive_info_9,
    CAST('MASKED' AS VARCHAR(25)) AS sensitive_info_10,
    'MASKED' AS sensitive_info_11,
    CAST('MASKED' AS VARCHAR(25)) AS sensitive_info_12,
    'MASKED' AS sensitive_info_13,
    'MASKED' AS sensitive_info_14,
    COALESCE(DATEDIFF('second','MASKED','MASKED'),0) AS sensitive_info_15,
    CASE 
        WHEN lower('MASKED') IN ('MASKED', 'MASKED') THEN COALESCE(DATEDIFF('second','MASKED','MASKED') - (DATEDIFF('day',DATE('MASKED'),DATE('MASKED'))*43200),0)
        ELSE COALESCE(DATEDIFF('second','MASKED','MASKED')-(DATEDIFF('day',DATE('MASKED'),DATE('MASKED'))*50400) - 
            (CASE 
                WHEN YEAR('MASKED') = 2020 AND WEEK('MASKED') < WEEK('MASKED') THEN 53 + WEEK('MASKED') - WEEK('MASKED')
                WHEN YEAR('MASKED') > 2020 AND WEEK('MASKED') < WEEK('MASKED') THEN 52 + WEEK('MASKED') - WEEK('MASKED')
                ELSE WEEK('MASKED') - WEEK('MASKED')
            END)*36000,0)
    END AS sensitive_info_16,
    CASE 
        WHEN 'MASKED' = 'MASKED' THEN 'MASKED'
        ELSE 'NA'
    END AS sensitive_info_17,
    'MASKED' AS sensitive_info_18,
    'MASKED' AS sensitive_info_19,
    'MASKED' AS sensitive_info_20,
    'MASKED' AS sensitive_info_21,
    'MASKED' AS sensitive_info_22,
    'MASKED' AS sensitive_info_23,
    SUBSTR('MASKED', 1, 50000) AS sensitive_info_24,
    'MASKED' AS sensitive_info_25,
    'MASKED' AS sensitive_info_26,
    'MASKED' AS sensitive_info_27,
    CASE 
        WHEN 'MASKED' = 'MASKED' AND 'MASKED' IN ('MASKED','MASKED','MASKED','MASKED') THEN 'TRUE'
        -- ... (Other conditions here)
        ELSE 'FALSE'
    END AS sensitive_info_28,
    HOUR('MASKED') AS sensitive_info_29,
    CASE 
        WHEN 'MASKED' NOT IN ('40','99','142') AND 'MASKED' IS NULL THEN 'TRUE'
        ELSE 'FALSE' 
    END AS sensitive_info_30
FROM
(
    SELECT DISTINCT 
        'MASKED' AS sensitive_info_1,
        'MASKED' AS sensitive_info_2,
        'MASKED' AS sensitive_info_3,
        'MASKED' AS sensitive_info_4,
        'MASKED' AS sensitive_info_5,
        'MASKED' AS sensitive_info_6,
        'MASKED' AS sensitive_info_7,
        'MASKED' AS sensitive_info_8,
        'MASKED' AS sensitive_info_9,
        'MASKED' AS sensitive_info_10,
        'MASKED' AS sensitive_info_11,
        'MASKED' AS sensitive_info_12,
        'MASKED' AS sensitive_info_13,
        'MASKED' AS sensitive_info_14,
        'MASKED' AS sensitive_info_15,
        'MASKED' AS sensitive_info_16,
        'MASKED' AS sensitive_info_17,
        'MASKED' AS sensitive_info_18,
        'MASKED' AS sensitive_info_19,
        'MASKED' AS sensitive_info_20,
        'MASKED' AS sensitive_info_21,
        'MASKED' AS sensitive_info_22,
        'MASKED' AS sensitive_info_23,
        'MASKED' AS sensitive_info_24,
        'MASKED' AS sensitive_info_25,
        'MASKED' AS sensitive_info_26,
        'MASKED' AS sensitive_info_27
    FROM (
        SELECT * 
               from data_warehouse.ctrl_map.order_flows_base
 -- ... (masked columns from the inner query)
    ) AS t
) AS t
OnkarVO7 commented 11 months ago

Dialect: MSSQL

create view [dbo].[my_second_dbt_model] as

select *
from "test_db"."dbo"."my_first_dbt_model"
where id = 1
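A possible failure mode here (an assumption, not confirmed): dbt renders ANSI-quoted names ("db"."schema"."table") even on MSSQL, whose native quoting is [brackets], and a dialect-strict parser can stumble on either style. A stdlib sketch of a quote-stripping normalizer:

```python
import re

def normalize_identifier(name: str) -> str:
    # Strip ANSI double quotes, MSSQL brackets, and MySQL backticks.
    return re.sub(r'["\[\]`]', "", name)

print(normalize_identifier('"test_db"."dbo"."my_first_dbt_model"'))  # test_db.dbo.my_first_dbt_model
print(normalize_identifier("[dbo].[my_second_dbt_model]"))           # dbo.my_second_dbt_model
```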
tonsumawang commented 10 months ago

Affected Module: Column-Level Lineage with BigQuery metadata + dbt ingestion

Describe the bug: Raising an issue for a missing upstream column-level lineage: the source column is not highlighted/connected when clicking a column in the Lineage graph.

Test Case: company_mapping_tmp.company_id is missing its source column companies.dsa_id, which comes from the line c.dsa_id AS company_id in the dbt script of company_mapping_tmp below:

WITH fineract_office AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at DESC) AS rnum
        , id
        , ABS(SAFE_CAST(external_id AS INT64)) AS external_id
        , name
        , opening_date
    FROM `data-analytics-254406.fineract_production.m_office`
)
SELECT DISTINCT
    to_hex(md5(cast(coalesce(cast(dsa_id as string), '') || '-' || coalesce(cast(lms as string), '') || '-' || coalesce(cast(lms_dsa_id as string), '') as string))) as company_key
    , c.dsa_id AS nlo
    , c.dsa_id AS company_id
    , lms
    , lms_dsa_id
    , fo.id AS fin_office_id
    , COALESCE(c.name, dc.nlo_name) AS nlo_name
    , lms_name
    , TIMESTAMP(DATETIME(c.created_at, 'Asia/Manila')) AS created_date
    , TIMESTAMP(DATETIME(c.updated_at, 'Asia/Manila')) AS updated_date -- follows the date updated in company service
    , TIMESTAMP(DATETIME(CURRENT_TIMESTAMP(), 'Asia/Manila')) AS dbt_created
    , TIMESTAMP(DATETIME(CURRENT_TIMESTAMP(), 'Asia/Manila')) AS dbt_updated
    , c.country_code
FROM (
    SELECT ROW_NUMBER() OVER(PARTITION BY dsa_id ORDER BY updated_at DESC) AS rnum
        , *
    FROM `data-analytics-254406.uploan_production_mongodb.companies`
) c
LEFT OUTER JOIN `data-analytics-254406.uploan_prod.data_company_mapping` dc
    ON dc.nlo = c.dsa_id
LEFT OUTER JOIN fineract_office fo
    ON dc.nlo = CAST(fo.external_id AS INT64)
    AND fo.rnum = 1
WHERE c.rnum = 1


zwbill82 commented 10 months ago

Affected Module: Column-Level Lineage with ClickHouse metadata + dbt ingestion

Describe the bug: Raising an issue for a missing upstream column-level lineage; the lineage from the upstream tables is not being parsed.

Case:

with inv_data as (
    select date_id, sku, sales_channel_id, country, efn_flag, amz_field, quantity
    from dbt.stg_inventory_amz
),
location as (
    select location_id, location_group, location, amz_field, sales_channel_id, country
    from eng.amz_location_map
),
sku_cost as (
    select sku, avg_unit_cost
    from dbt.stg_sku_cost
),
sku_prod_map as (
    select sku, product_id
    from dbt.stg_sku_product_map
),
prod_cost as (
    select product_id, avg_unit_cost
    from dbt.stg_product_cost
)
select a.date_id as date_id
    , a.sku as sku
    , b.location_group as location_group
    , b.location as location
    , b.location_id as location_id
    , d.product_id as product_id
    , a.quantity as quantity
    , ifNull(c.avg_unit_cost, e.avg_unit_cost) as avg_cost_usd
    , efn_flag as efn_flag
from inv_data a
left join location b on (a.country = b.country and a.amz_field = b.amz_field and a.sales_channel_id = b.sales_channel_id)
left join sku_cost c on (a.sku = c.sku)
left join sku_prod_map d on (a.sku = d.sku)
left join prod_cost e on (d.product_id = e.product_id)


debug log: [2023-11-10T13:38:03.342+0000] {parser.py:420} DEBUG - Lineage computed with SqlFluff did not perform as expected for the [clickhouse] query: [create table atlas.dbt.int_inventory_juvo as with fb_inv as (select date_id, sku, location_group, location, quantity from dbt.stg_inventory_juvo), location as (select location_group, location, location_id from dbt.stg_location), sku_cost as (select sku, avg_unit_cost from dbt.stg_sku_cost), sku_prod_map as (select sku, product_id from dbt.stg_sku_product_map), prod_cost as (select product_id, avg_unit_cost from dbt.stg_product_cost) select a.date_id as date_id, a.sku as sku, a.location_group as location_group, a.location as location, b.location_id as location_id, d.product_id as product_id, a.quantity as quantity, ifNull(c.avg_unit_cost, e.avg_unit_cost) as avg_cost_usd, 'No' as efn_flag from fb_inv a left join location b on (a.location_group = b.location_group and a.location = b.location) left join sku_cost c on (a.sku = c.sku) left join sku_prod_map d on (a.sku = d.sku) left join prod_cost e on (d.product_id = e.product_id)]

walidaziz commented 10 months ago

Affected Module: Column-Level Lineage with PostgreSQL 15.4

Describe the bug: While ingesting the following query, we get a ValueError: too many values to unpack error during handling of column-level lineage.

Here are the logs from the ingestion:

*** Found local files:
***   * /opt/airflow/logs/dag_id=Translation_lineage_xMW3W7KN/run_id=manual__2023-11-17T13:15:41+00:00/task_id=lineage_task/attempt=1.log
[2023-11-17T13:15:42.644+0000] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: Translation_lineage_xMW3W7KN.lineage_task manual__2023-11-17T13:15:41+00:00 [queued]>
[2023-11-17T13:15:42.742+0000] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: Translation_lineage_xMW3W7KN.lineage_task manual__2023-11-17T13:15:41+00:00 [queued]>
[2023-11-17T13:15:42.743+0000] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2023-11-17T13:15:42.749+0000] {taskinstance.py:1327} INFO - Executing <Task(PythonOperator): lineage_task> on 2023-11-17 13:15:41+00:00
[2023-11-17T13:15:42.752+0000] {standard_task_runner.py:57} INFO - Started process 25767 to run task
[2023-11-17T13:15:42.754+0000] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'Translation_lineage_xMW3W7KN', 'lineage_task', 'manual__2023-11-17T13:15:41+00:00', '--job-id', '34', '--raw', '--subdir', 'DAGS_FOLDER/Translation_lineage_xMW3W7KN.py', '--cfg-path', '/tmp/tmps9onqu2l']
[2023-11-17T13:15:42.755+0000] {standard_task_runner.py:85} INFO - Job 34: Subtask lineage_task
[2023-11-17T13:15:42.795+0000] {task_command.py:410} INFO - Running <TaskInstance: Translation_lineage_xMW3W7KN.lineage_task manual__2023-11-17T13:15:41+00:00 [running]> on host 0af5b390a027
[2023-11-17T13:15:43.019+0000] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='admin' AIRFLOW_CTX_DAG_ID='Translation_lineage_xMW3W7KN' AIRFLOW_CTX_TASK_ID='lineage_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-17T13:15:41+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-17T13:15:41+00:00'
[2023-11-17T13:15:43.023+0000] {server_mixin.py:63} DEBUG - Validating client and server versions
[2023-11-17T13:15:43.042+0000] {ingestion_pipeline_mixin.py:51} DEBUG - Created Pipeline Status for pipeline Translation.Translation_lineage_xMW3W7KN: {'eventType': 'entityUpdated', 'entityType': 'ingestionPipeline', 'entityId': '1390d683-b12f-4e5d-82e1-3e4fe4be4485', 'entityFullyQualifiedName': 'Translation.Translation_lineage_xMW3W7KN', 'previousVersion': 0.4, 'currentVersion': 0.4, 'userName': 'admin', 'timestamp': 1700226943041, 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'pipelineStatus', 'newValue': {'runId': 'c75fa4a6-2b6f-43d6-9e8e-7f9b595bb10a', 'pipelineState': 'running', 'startDate': 1700226943022, 'timestamp': 1700226943022}}], 'fieldsDeleted': [], 'previousVersion': 0.4}, 'entity': {'id': '1390d683-b12f-4e5d-82e1-3e4fe4be4485', 'name': 'Translation_lineage_xMW3W7KN', 'displayName': 'Translation_lineage_xMW3W7KN', 'pipelineType': 'lineage', 'fullyQualifiedName': 'Translation.Translation_lineage_xMW3W7KN', 'sourceConfig': {'config': {'type': 'DatabaseLineage', 'resultLimit': 1000, 'queryLogDuration': 1, 'tableFilterPattern': {'excludes': [], 'includes': []}, 'parsingTimeoutLimit': 300, 'schemaFilterPattern': {'excludes': [], 'includes': []}, 'databaseFilterPattern': {'excludes': [], 'includes': []}}}, 'airflowConfig': {'pausePipeline': False, 'concurrency': 1, 'startDate': 1700092800000, 'pipelineTimezone': 'UTC', 'retries': 0, 'retryDelay': 300, 'pipelineCatchup': False, 'scheduleInterval': '0 0 * * *', 'maxActiveRuns': 1, 'workflowDefaultView': 'tree', 'workflowDefaultViewOrientation': 'LR'}, 'loggerLevel': 'DEBUG', 'deployed': True, 'enabled': True, 'href': 'http://openmetadata-server:8585/api/v1/services/ingestionPipelines/1390d683-b12f-4e5d-82e1-3e4fe4be4485', 'version': 0.4, 'updatedAt': 1700226926350, 'updatedBy': 'admin', 'changeDescription': {'fieldsAdded': [], 'fieldsUpdated': [{'name': 'deployed', 'oldValue': False, 'newValue': True}], 'fieldsDeleted': [], 'previousVersion': 0.3}, 'deleted': False, 'provider': 'user'}}
[2023-11-17T13:15:43.052+0000] {metadata.py:55} DEBUG - Source type:postgres-lineage,<class 'metadata.ingestion.source.database.postgres.lineage.PostgresLineageSource'> configured
[2023-11-17T13:15:43.052+0000] {metadata.py:57} DEBUG - Source type:postgres-lineage,<class 'metadata.ingestion.source.database.postgres.lineage.PostgresLineageSource'>  prepared
[2023-11-17T13:15:43.062+0000] {metadata.py:66} DEBUG - Sink type:metadata-rest, <class 'metadata.ingestion.sink.metadata_rest.MetadataRestSink'> configured
[2023-11-17T13:15:43.767+0000] {sql_lineage.py:386} DEBUG - Running lineage with query: CREATE TABLE translation.entrees
    (
        id public.OBJECT_ID,
        legacy_id public.LEGACY_ID,
        agency_legacy_id public.LEGACY_ID,
        building_legacy_id public.LEGACY_ID,
        address_1 CHARACTER VARYING(40),
        address_2 CHARACTER VARYING(40),
        additional_address_info CHARACTER VARYING(30),
        city CHARACTER VARYING(40),
        zip_code CHARACTER VARYING(15),
        country_code TEXT,
        emitter_bip_1 CHARACTER VARYING(20),
        emitter_bip_2 CHARACTER VARYING(20),
        pinpads CHARACTER VARYING(12),
        pinpads_date TIMESTAMP WITHOUT TIME ZONE,
        pinpads_specific CHARACTER VARYING(30),
        desk_rattachment_supplier_legacy_id public.LEGACY_ID,
        perception_supplier_legacy_id public.LEGACY_ID,
        tax_center_supplier_legacy_id public.LEGACY_ID,
        sub_building_number BIGINT,
        created_at TIMESTAMP WITHOUT TIME ZONE,
        updated_at TIMESTAMP WITHOUT TIME ZONE,
        migration_legacy_id TEXT
    )
[2023-11-17T13:15:44.062+0000] {sql_lineage.py:386} DEBUG - Running lineage with query: COPY (
        SELECT
            (jsonb_build_object(
                'createdAt', to_date(now()::TIMESTAMP),
                'updatedAt', to_date(now()::TIMESTAMP)
                ) || jsonb_strip_nulls(to_jsonb(t)))::TEXT
        FROM
            ft_staging.units t) TO STDOUT
[2023-11-17T13:15:44.063+0000] {parser.py:399} DEBUG - Lineage with SqlFluff failed for the [postgres] query: [None]
[2023-11-17T13:15:44.065+0000] {status.py:65} WARNING - Ingesting lineage failed for service [Translation]: 'NoneType' object has no attribute 'strip'
[2023-11-17T13:15:44.066+0000] {status.py:66} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/sql_lineage.py", line 389, in get_lineage_by_query
ulixius9 commented 10 months ago

dialect: mssql

IF OBJECT_ID(N'REPORTING.[dbo].[AggregateHTSEntrypoint]', N'U') IS NOT NULL
    DROP TABLE REPORTING.[dbo].[AggregateHTSEntrypoint];
SELECT DISTINCT
    MFLCode,
    f.FacilityName,
    County,
    SubCounty,
    p.PartnerName,
    a.AgencyName,
    Gender,
    age.DATIMAgeGroup as AgeGroup,
    EntryPoint,
    year,
    month,
    FORMAT(cast(date as date), 'MMMM') MonthName,
    EOMONTH(d.date) as AsofDate,
    Sum(Tested) Tested,
    Sum(Positive) Positive,
    Sum(Linked) Linked,
    CAST(GETDATE() AS DATE) AS LoadDate
INTO REPORTING.[dbo].[AggregateHTSEntrypoint]
FROM NDWH.dbo.FactHTSClientTests hts
LEFT join NDWH.dbo.DimFacility f on f.FacilityKey = hts.FacilityKey
LEFT JOIN NDWH.dbo.DimAgency a on a.AgencyKey = hts.AgencyKey
LEFT JOIN NDWH.dbo.DimPatient pat on pat.PatientKey = hts.PatientKey
LEFT join NDWH.dbo.DimAgeGroup age on age.AgeGroupKey=hts.AgeGroupKey
LEFT JOIN NDWH.dbo.DimPartner p on p.PartnerKey = hts.PartnerKey
LEFT JOIN NDWH.dbo.FactHTSClientLinkages link on link.PatientKey = hts.PatientKey
LEFT JOIN NDWH.dbo.DimDate d on d.DateKey = hts.DateTestedKey
where TestType in ('Initial test','Initial')
GROUP BY
    MFLCode,
    f.FacilityName,
    County,
    SubCounty,
    p.PartnerName,
    a.AgencyName,
    Gender,
    age.DATIMAgeGroup,
    EntryPoint,
    year,
    month,
    FORMAT(cast(date as date), 'MMMM'),
    EOMONTH(d.date);
ulixius9 commented 9 months ago

dialect: mssql

[2023-12-18T17:06:40.387+0000] {status.py:65} WARNING - Error processing query [DECLARE @edition sysname;
SET @edition = cast(SERVERPROPERTY(N'EDITION') as sysname);
SELECT case when @edition = N'SQL Azure' then 2 else 1 end as 'DatabaseEngineType',
SERVERPROPERTY('EngineEdition') AS DatabaseEngineEdition,
SERVERPROPERTY('ProductVersion') AS ProductVersion,
@@MICROSOFTVERSION AS MicrosoftVersion;
select host_platform from sys.dm_os_host_info
if @edition = N'SQL Azure' 
  select 'TCP' as ConnectionProtocol
else
  exec ('select CONVERT(nvarchar(40),CONNECTIONPROPERTY(''net_transport'')) as ConnectionProtocol')
]: 'NoneType' object has no attribute 'strip'
[2023-12-18T17:06:40.387+0000] {status.py:66} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/processor/query_parser.py", line 98, in _run
    parsed_sql = parse_sql_statement(
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/processor/query_parser.py", line 53, in parse_sql_statement
    if not lineage_parser.involved_tables:
  File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 85, in involved_tables
    set(self.source_tables)
  File "/home/airflow/.local/lib/python3.9/site-packages/cached_property.py", line 36, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/lineage/parser.py", line 110, in source_tables
    return self.retrieve_tables(self.parser.source_tables)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
    self._eval()
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 178, in _eval
    self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/runner.py", line 178, in <listcomp>
    self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 62, in analyze
    self._extract_from_dml(stmt, AnalyzerContext())
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/analyzer.py", line 223, in _extract_from_dml
    next_handler.handle(sub_token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/base.py", line 37, in handle
    self._handle(token, holder)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 59, in _handle
    self._handle_column(token)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/handlers/source.py", line 106, in _handle_column
    self.columns.append(SqlParseColumn.of(token))
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/parser/sqlparse/models.py", line 83, in of
    return Column(
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/core/models.py", line 145, in __init__
    self.raw_name = escape_identifier_name(name)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqllineage/utils/helpers.py", line 9, in escape_identifier_name
    return name.strip("`").strip('"').strip("'")
AttributeError: 'NoneType' object has no attribute 'strip'
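The AttributeError comes from `escape_identifier_name` receiving `None` as a column name. A minimal defensive guard, sketched against the helper shown in the traceback above (an illustrative workaround, not the upstream sqllineage fix):

```python
from typing import Optional

def escape_identifier_name(name: Optional[str]) -> str:
    # Upstream (sqllineage/utils/helpers.py) assumes name is always a str;
    # the traceback shows it can be None for some T-SQL constructs, so we
    # return an empty string instead of crashing, letting the parser skip
    # the column.
    if name is None:
        return ""
    return name.strip("`").strip('"').strip("'")

print(escape_identifier_name(None))      # -> "" (empty string)
print(escape_identifier_name('"col1"'))  # -> col1
```

Skipping the column loses some column-level lineage but lets table-level lineage for the rest of the query go through.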
ulixius9 commented 8 months ago

dialect: oracle

insert into stats_advisor_filter_obj$ 
          (rule_id, obj#, flags, type) 
          select :rule_id, obj#, :flag_include, :type_expanded 
          from stats_advisor_filter_obj$ 
          where type = :type_priv 
          and (bitand(flags, :flag_orcl_owned) =  0 or :get_orcl_objects = 'T')
          and obj# in 
            (select obj# 
             from stats_advisor_filter_obj$ 
             where rule_id = :rule_id and type = :type_user 
               and bitand(flags, :flag_include) != 0)
insert into  WRH$_IM_SEG_STAT  (dbid, per_pdb, con_dbid, snap_id, instance_number,  ts#, dataobj#, obj#, membytes,  scans, scans_delta, db_block_changes, db_block_changes_delta,  populate_cus, populate_cus_delta, repopulate_cus,  repopulate_cus_delta )  select  :dbid, 0 per_pdb,  :srcdbid  con_dbid,  :snap_id, :instance_number,  tsn_kewrimseg, dataobj_kewrimseg,  obj_kewrimseg, membytes_kewrimseg,  scans_kewrimseg, scans_dl_kewrimseg,  db_block_changes_kewrimseg,  db_block_changes_dl_kewrimseg,  populate_kewrimseg, populate_dl_kewrimseg,  repopulate_kewrimseg,  repopulate_dl_kewrimseg  FROM   X$KEWRIMSEGSTAT   ims  WHERE (1 = 1)
ulixius9 commented 6 months ago

dialect: snowflake

Need to identify the column transformation logic for emp_value and emp_key in this query:

INSERT INTO EDW.INTEGRATION.Employee
SELECT
    emp_id,
    "'emp_name'" as EMP_NAME,
    "'emp_title'" as EMP_TITLE
FROM EDW.STAGING.Employee
PIVOT(
        MAX(emp_value)
        FOR emp_key IN ('emp_name','emp_title')
     )
ORDER BY emp_id;
ulixius9 commented 6 months ago

dialect: redshift

MERGE INTO om_test.table_consolidated11  USING om_test.table_consolidated6 ON om_test.table_consolidated11.job_id = om_test.table_consolidated6.job_id\nWHEN MATCHED THEN UPDATE SET job_id = om_test.table_consolidated6.job_id, code = om_test.table_consolidated6.code,job_status = om_test.table_consolidated6.job_status\nWHEN NOT MATCHED THEN INSERT VALUES (om_test.table_consolidated6.job_id,om_test.table_consolidated6.code,om_test.table_consolidated6.job_status)
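Note that the query text above contains literal `\n` sequences rather than real newlines, which is how Redshift system tables store query text. A hypothetical pre-processing step that unescapes them before the SQL reaches the lineage parser (illustrative only, not OpenMetadata's actual implementation):

```python
def unescape_query_text(query: str) -> str:
    # Replace the two-character sequences "\n" and "\t" with real
    # whitespace so the parser sees one well-formed statement.
    return query.replace("\\n", "\n").replace("\\t", "\t")

sql = r"MERGE INTO t1 USING t2 ON t1.id = t2.id\nWHEN MATCHED THEN UPDATE SET code = t2.code"
print(unescape_query_text(sql))
```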
ulixius9 commented 6 months ago

dialect: redshift context: https://github.com/open-metadata/OpenMetadata/issues/15427

create materialized view test_schema.sales_current2 as (
    WITH current_or_previous AS (
        SELECT
            MAX(data_release_version) AS data_release_version
        FROM metadata_schema.datamart_run
        WHERE dag_id = 'test_id'
    ) select eventid, listid, salesrow from test_schema.sales where salesrow=(select data_release_version from current_or_previous));
ulixius9 commented 6 months ago

dialect: redshift context: https://github.com/open-metadata/OpenMetadata/issues/15423

insert into test_schema.union_all_test
with initial_sub as (select eventid, listid, salesrow
from test_schema.sales),
union_sub_test as (
select eventid, listid, salesrow
from initial_sub
where salesrow='Yes'
union all
select eventid, listid, salesrow
from initial_sub
where salesrow='No')
select eventid, listid, salesrow from union_sub_test where listid in(500,501,502);
ulixius9 commented 6 months ago

dialect: snowflake

create or replace view VW_TEST(
    COL1,
    COL2
)
WITH ROW ACCESS POLICY POLICYDB.POLICYSCHEMA.POLICYNAME ON (COL2) 
as (
    with

-- Import CTEs
TEST_TABLE as (
    select * from DB.SCHEMA.PARENT_TABLE
)

, final as (
    select
        COL1,
        COL2

    from TEST_TABLE
)

select * from final
  );
harshach commented 6 months ago

[2024-03-20T11:15:37.230+0000] {sql_lineage.py:385} DEBUG - Running lineage with query: CREATE VIEW [dbo].[An_View] AS

SELECT DISTINCT
    ColumnA,
    ValueLiteral1 AS AliasIDType,
    'TypeLiteral1' AS AliasType,
    ColumnB,
    CASE
        WHEN ColumnCondition1 = 'Value1' THEN
            (SELECT SubColumn1 FROM SubTable1 WHERE IndustryGroupCondition = ColumnCondition2)
        WHEN ColumnCondition1 = 'Value2' THEN 'ValueLiteral2'
        ELSE 'ValueLiteral3'
    END AS AliasAssetClass,
    CASE
        WHEN RankCondition = 'Value3' AND CollatCondition = 'Value4' THEN
            (SELECT Description FROM SubTable2 WHERE NameCondition = RankCondition)
        ELSE
            (SELECT Description FROM SubTable2 WHERE NameCondition = CollatCondition)
    END AS AliasAssetRank,
    MainTable.AliasCountryCode
FROM MainTable
WHERE TimestampCondition = (
    SELECT MAX(InnerTimestamp) FROM MainTable AS InnerTable WHERE InnerTable.ColumnMatch = MainTable.ColumnMatch
)
AND MainTable.ColumnA NOT IN (
    SELECT ExclusionColumn FROM ExclusionTable WHERE MarketBookCondition AND CodeCondition <> 'ExclusionValue'
)
AND MainTable.ColumnA NOT IN (
    SELECT InnerColumnA FROM SubTable3 WHERE InnerColumnMatch = MainTable.ColumnMatch AND InnerTimestamp > MainTable.TimestampCondition
)
AND MainTable.ColumnA <> 'SpecificExclusion'

UNION

SELECT DISTINCT
    ColumnA,
    ValueLiteral4 AS AliasIDType,
    'TypeLiteral2' AS AliasType,
    ColumnC,
    CASE
        WHEN RankCondition = 'Value3' AND CollatCondition = 'Value6' THEN
            (SELECT Description FROM SubTable4 WHERE NameCondition = ColumnD)
        ELSE
            (SELECT Description FROM SubTable4 WHERE NameCondition = CollatCondition)
    END AS AliasAssetRank,
FROM SubTable5
WHERE RequestDateCondition = (
    SELECT MAX(InnerRequestDate) FROM SubTable5 AS InnerTable2 WHERE InnerTable2.ColumnMatch = SubTable5.ColumnMatch
)
AND SubTable5.ColumnA NOT IN (
    SELECT ExclusionColumn FROM ExclusionTable WHERE MarketBookCondition2
)

UNION

SELECT DISTINCT
    ExclusionTable.Code AS ColumnA,
    ValueLiteral7 AS AliasIDType,
    'TypeLiteral3' AS AliasType,
    ExclusionTable.BondsName,
    ExclusionTable.IssueAmount,
    ExclusionTable.CurrenciesShortname,
    ExclusionTable.CouponRate,
    Countries.AliasCountryShortName
FROM ExclusionTable
INNER JOIN BondsTable ON BondsCondition
WHERE ExclusionTable.MarketBook = LiteralValue8
AND ExclusionTable.Code <> 'ExclusionValue'
AND ExclusionTable.Code NOT IN (
    SELECT DISTINCT ColumnA FROM MainTable
    UNION
    SELECT DISTINCT ColumnA FROM SubTable5
);
11:15:37.560+0000] {parser.py:439} DEBUG - Lineage with SqlFluff failed for the [tsql] query: [CREATE   view dbo.AllCWDT as
(the same query again)

I also get the following error for some other queries:
Lineage computed with SqlFluff did not perform as expected for the [tsql] query: [CREATE view Vw_table

These errors appear multiple times in the metadata logs. Only the lineage of views is present; the lineage of regular tables is missing.

I've also observed that when a view's query employs a UNION, the lineage mapping (when it works) covers the entire query. However, if the query uses a JOIN that involves a subquery, lineage is only mapped for that subquery. For example, with this query we get lineage for all the tables mentioned in the query:

CREATE VIEW [dbo].[vw_ARSV] AS

SELECT
    Column1, 
    Column2, 
    Column3,
    SUM(Column4),
    IIF(SUM(Column5) > 0, SUM(Column5), 0) AS POS,
    IIF(SUM(Column5) < 0, SUM(Column5), 0) AS FX
FROM Table1 AS t1
GROUP BY
    Column1,
    Column2,
    Column3

UNION ALL

SELECT
    Column1_2,
    Column2_2,
    Column3_2,
    SUM(Column4_2),
    IIF(SUM(Column5_2) > 0, SUM(Column5_2), 0) AS CMV,
    IIF(SUM(Column5_2) < 0, SUM(Column5_2), 0) AS CMVN
FROM Table2 AS t2
GROUP BY
    Column1_2,
    Column2_2,
    Column3_2

UNION ALL

SELECT
    Column1_3,
    Column2_3,
    Column3_3,
    SUM(Column4_3),
    IIF(SUM(Column5_3) > 0, SUM(Column5_3), 0) AS ABC,
    IIF(SUM(Column5_3) < 0, SUM(Column5_3), 0) AS CBA
FROM Table3 AS t3
GROUP BY
    Column1_3,
    Column2_3,
    Column3_3;
But for this query, the lineage only shows Table3 and not Table1; it should show both. Is that related to the fact that we are using sub-queries?

CREATE VIEW [dbo].[AnFXRiskView] AS

SELECT 
    'S' AS Rubric,
    Column1,
    Column2,
    IIF(Column3 <= 90, 'Up to 3 months', 'More than 3 months') AS TT,
    NULL AS Placeholder1,
    'Long' AS PST,
    Column4,
    NULL AS Placeholder2
FROM [DatabaseSchema].[dbo].[Table1]

UNION ALL

SELECT 
    'T' AS Rubric,
    Column5,
    Column6,
    IIF(Column7 <= 90, 'Up to 3 months', 'More than 3 months') AS TT,
    Column8,
    Column9,
    CalculatedColumn10,
    CalculatedColumn11,
    CalculatedColumn12,
    CalculatedColumn13
FROM [DatabaseSchema].[dbo].[Table2] AS T
LEFT JOIN 
    (SELECT DISTINCT
        Column14,
        Column15,
        Column16,
        Column17,
        Column18,
        Column19,
        Column20,
        Column21,
        Column22
     FROM [DatabaseSchema].[dbo].[Table3]) AS P
ON 
    P.Column14 = T.Column14
    AND P.Column15 = T.Column15
    AND P.Column20 = T.TermType
    AND P.Column16 = T.Column16
    AND P.Column17 = T.Column17;

from @manuelateles
Prajwal214 commented 5 months ago
create view my_example_table.pg_stat_statements as
SELECT pg_stat_statements.userid,
       pg_stat_statements.dbid,
       pg_stat_statements.toplevel,
       pg_stat_statements.queryid,
       pg_stat_statements.query,
       pg_stat_statements.plans,
       pg_stat_statements.total_plan_time,
       pg_stat_statements.min_plan_time,
       pg_stat_statements.max_plan_time,
       pg_stat_statements.mean_plan_time,
       pg_stat_statements.stddev_plan_time,
       pg_stat_statements.calls,
       pg_stat_statements.total_exec_time,
       pg_stat_statements.min_exec_time,
       pg_stat_statements.max_exec_time,
       pg_stat_statements.mean_exec_time,
       pg_stat_statements.stddev_exec_time,
       pg_stat_statements.rows,
       pg_stat_statements.shared_blks_hit,
       pg_stat_statements.shared_blks_read,
       pg_stat_statements.shared_blks_dirtied,
       pg_stat_statements.shared_blks_written,
       pg_stat_statements.local_blks_hit,
       pg_stat_statements.local_blks_read,
       pg_stat_statements.local_blks_dirtied,
       pg_stat_statements.local_blks_written,
       pg_stat_statements.temp_blks_read,
       pg_stat_statements.temp_blks_written,
       pg_stat_statements.blk_read_time,
       pg_stat_statements.blk_write_time,
       pg_stat_statements.wal_records,
       pg_stat_statements.wal_fpi,
       pg_stat_statements.wal_bytes
FROM pg_stat_statements(true) pg_stat_statements(userid, dbid, toplevel, queryid, query, plans, total_plan_time, min_plan_time, max_plan_time, mean_plan_time, stddev_plan_time, calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time, blk_write_time, wal_records, wal_fpi, wal_bytes);

https://openmetadata.slack.com/archives/C02B6955S4S/p1713875131637769

dialect: Postgres

ulixius9 commented 4 months ago

dialect: oracle

CREATE MATERIALIZED VIEW OPENMETADATA.DemoView AS
SELECT * FROM OPENMETADATA.EMPLOYEES
WHERE SALARY>60000;
ulixius9 commented 4 months ago

dialect: mssql

Create view Script (1).txt

gpby commented 4 months ago

dialect: teradata

replace view SQLJ.JARS
AS
select distinct DBC.JARS.DatabaseId,
       DBC.JARS.JarName,
       DBC.JARS.JarId
from DBC.JARS, DBC.DBASE
where
DBASE.DatabaseNameI = database
and
DBC.JARS.DatabaseId = DBASE.DatabaseId
;

Looks like REPLACE VIEW isn't processed correctly. Teradata's REPLACE VIEW is the equivalent of MySQL's CREATE OR REPLACE VIEW.
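Since Teradata's REPLACE VIEW is semantically a CREATE OR REPLACE VIEW, one possible workaround is to rewrite the statement before parsing. A hypothetical sketch (not OpenMetadata's actual behavior; the function name is illustrative):

```python
import re

def rewrite_teradata_replace_view(sql: str) -> str:
    # Rewrite a leading REPLACE VIEW into CREATE OR REPLACE VIEW so a
    # generic SQL parser can handle the statement.
    return re.sub(r"^\s*replace\s+view\b", "CREATE OR REPLACE VIEW",
                  sql, count=1, flags=re.IGNORECASE)

print(rewrite_teradata_replace_view("replace view SQLJ.JARS AS select 1"))
# -> CREATE OR REPLACE VIEW SQLJ.JARS AS select 1
```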

gpby commented 4 months ago

dialect: teradata

CREATE VIEW DBC.ZonesVX
AS
SELECT Zones.ZoneName (NAMED ZoneName),
DB1.DatabaseName (NAMED RootName),
Zones.RootType (NAMED RootType),
DB2.DatabaseName (NAMED ZoneDBAName),
DB3.DatabaseName (NAMED CreatorName),
Zones.CreateTimeStamp (NAMED CreateTimeStamp)
FROM DBC.Zones
  LEFT OUTER JOIN DBC.Dbase DB1
   ON DBC.Zones.ZoneRootID = DB1.DatabaseID
  LEFT OUTER JOIN DBC.Dbase DB2
   ON DBC.Zones.ZoneDBAId = DB2.DatabaseID
  LEFT OUTER JOIN DBC.Dbase DB3
   ON DBC.Zones.CreateUID = DB3.DatabaseID
WHERE DB3.DatabaseId = TD_AUTHID
WITH CHECK OPTION;

Error: Found unparsable section: 'WITH CHECK OPTION'

gpby commented 4 months ago

dialect: teradata

CREATE VIEW dbc.test_v AS
LOCKING TABLE dbc.dbcinfo FOR ACCESS
SELECT * FROM dbc.dbcinfo;

The LOCKING TABLE ... FOR ACCESS clause is probably the unparsable section.

ulixius9 commented 4 months ago

dialect: mssql

view definition holdings denorm.txt

The (nolock) table hint is unparsable

also the column level lineage is not detected in OpenMetadata due to []

geoHeil commented 3 months ago

dialect: Postgresql (16)

 +-----------------------------------------+-------------------------------------------------+
| Query                                   | Error                                           |
+=========================================+=================================================+
| create table minidemo.default.sample as | Lineage with SqlFluff failed for the [postgres] |
+-----------------------------------------+-------------------------------------------------+
| create table minidemo.demo.sample as    | Lineage with SqlFluff failed for the [postgres] |
+-----------------------------------------+-------------------------------------------------+
Prajwal214 commented 3 months ago

dialect: Athena Athena Query.txt

https://openmetadata.slack.com/archives/C02B6955S4S/p1718362853748709

Prajwal214 commented 3 months ago

dialect: Postgres

SET client_min_messages=notice

ALTER SEQUENCE public.copro_propositions_id_seq OWNED BY public.copro_propositions.id

ALTER SEQUENCE public.admin_answers_id_seq OWNED BY public.admin_answers.id

SET standard_conforming_strings = on

show timezone

CREATE SEQUENCE public.group_logs_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1

https://openmetadata.slack.com/archives/C02B6955S4S/p1719205908885939

Prajwal214 commented 3 months ago

dialect: Snowflake LineageParsingError.txt

https://openmetadata.slack.com/archives/C02B6955S4S/p1719998015772969