reata / sqllineage

SQL Lineage Analysis Tool powered by Python
MIT License
1.35k stars 241 forks source link

The result returns less than original fields when using version 1.5.3 #659

Open 245Bluesky opened 2 weeks ago

245Bluesky commented 2 weeks ago

Describe the bug The result returns less than original fields when using version 1.5.3

SQL Paste the SQL text here. For example:

INSERT INTO table_e

    SELECT
         t1.product
       , tag_name
       , tag_description
       , tag_catagory_1
       , tag_catagory_2
       , tag_catagory_3
       , vesion_code
       , lang
       , region
       , feedback_channel
       , tag_count
       , dau
       , tag_count * 1000000.0 / dau AS million_user_feedback_rate
       , vesion_name
       , os_version
       , brand
       , t4.version_dau AS version_dau
       , t1.dt
    FROM (
        SELECT 
             f1.product
           , f1.dt
           , tag_name AS tag_name
           , vesion_code
           , vesion_name
           , lang
           , region
           , feedback_channel
           , tag_description
           , tag_catagory_1
           , tag_catagory_2
           , tag_catagory_3
           , os_version
           , brand
           , SUM(tag_count) AS tag_count
        FROM (
            SELECT
                dt
              , product
              , ticket_id
              , vesion_code
              , vesion_name
              , lang
              , region
              , feedback_channel
              , os_version
              , brand
            FROM table_a
            WHERE dt = '2024-11-08'
        ) f1

        LEFT JOIN (
            SELECT 
                 dt
               , product
               , ticket_id
               , tag_name
               , tag_description
               , tag_catagory_1
               , tag_catagory_2
               , tag_catagory_3
               , COUNT(DISTINCT ticket_id) AS tag_count
            FROM table_a
            WHERE dt = '2024-11-08'
            GROUP BY
                 dt
               , product
               , ticket_id
               , tag_name
               , tag_description
               , tag_catagory_1
               , tag_catagory_2
               , tag_catagory_3
        ) f2
        ON f1.dt = f2.dt AND f1.product = f2.product AND f1.ticket_id = f2.ticket_id

        GROUP BY
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
    ) t1

    LEFT JOIN (
        SELECT
            dt
          , 'xx' AS product
          , SUM(dau) AS dau
        FROM table_c
        WHERE dt = '2024-11-08'
        GROUP BY dt
    ) t3
    ON t1.dt = t3.dt AND t1.product = t3.product

    LEFT JOIN (
      SELECT
          dt
        , version_code
        , 'xx' AS product
        , COUNT(*) AS version_dau
      FROM table_d
      WHERE dt = '2024-11-08'
      GROUP BY
          dt
        , version_code
    ) t4
    ON t1.dt = t4.dt AND t1.product = t4.product AND CAST(t1.vesion_code AS BIGINT) = t4.version_code  

To Reproduce

sqllineage -e test_sql -l column

final result (only returns 5 fileds ) table_e.dau <- t3.dau <- table_c.dau table_e.dt <- t1.dt <- f1.dt <- table_a.dt table_e.million_user_feedback_rate <- t3.dau <- table_c.dau table_e.product <- t1.product <- f1.product <- table_a.product table_e.version_dau <- t4.version_dau <- table_d.*

except result we need to returns 18 fields in SELECT SQL

Python version (available via python --version)

SQLLineage version (available via sqllineage --version):