mjirv / dbt-datamocktool

A dbt package for unit testing your SQL analytics models
Apache License 2.0
166 stars 20 forks source link

Package creating a new table instead of using existing seed #88

Open Elsayed91 opened 1 year ago

Elsayed91 commented 1 year ago

Describe the bug

I was working on a personal project 5 months ago which had dbt, and this package, everything was working well. Now as I was trying to run the project again, it is not going as well.

Steps to reproduce

      - dbt_datamocktool.unit_test:
          input_mapping:
            source('historical_data', 'yellow_historical'): ref('test_dbt__bi__yellow_m2m')
          expected_output: ref('dbt__bi__yellow_m2m_expected')
          depends_on:
            - ref('seed_zones')
          # tags: ["unit-test"]

test_dbt__bi__yellow_m2m and dbt__bi__yellow_m2m_expected being seeds that have same configuration and are in the same directory and are also loaded to the same bigquery dataset. the 2 seeds originate from 2 csv files with the same names.

Expected results

A big query job combining the 2 seeds and testing the results of apply a model to one of the input and comparing it to the output.

Actual results

the tool uses only 1 of the two table, the expected_output table, and attempts to create the other table.

/* {"app": "dbt", "dbt_version": "1.5.2", "profile_name": "default", "target_name": "dev", "node_id": "test.nytaxi.dbt_datamocktool_unit_test_dbt__bi__yellow_m2m_ref_seed_zones___ref_dbt__bi__yellow_m2m_expected___ref_test_dbt__bi__yellow_m2m_.ac26a1c36a"} */

with a as (

select

    `vendors` 

      , 

    `year` 

      , 

    `month` 

      , 

    `monthly_fare_amount` 

      , 

    `average_passenger_count` 

      , 

    `average_trip_distance` 

      , 

    `monthly_extra_amount` 

      , 

    `monthly_mta_tax` 

      , 

    `monthly_tip_amount` 

      , 

    `monthly_tolls_amount` 

      , 

    `monthly_improvement_surcharge` 

      , 

    `monthly_total_amount` 

      , 

    `monthly_congestion_surcharge` 

      , 

    `pickup_borough` 

      , 

    `pickup_zone` 

      , 

    `dropoff_borough` 

      , 

    `dropoff_zone` 

      , 

    `avg_duration_minutes` 

      , 

    `trip_count` 

      , 

    `total_duration_hours` 

      , 

    `busiest_hour_of_day` 

      , 

    `airport_trip` 

      , 

    `payment_type` 

from `stellarismusv5`.`models_ut`.`dbt__bi__yellow_m2m_expected`

),

b as (

select

    `vendors` 

      , 

    `year` 

      , 

    `month` 

      , 

    `monthly_fare_amount` 

      , 

    `average_passenger_count` 

      , 

    `average_trip_distance` 

      , 

    `monthly_extra_amount` 

      , 

    `monthly_mta_tax` 

      , 

    `monthly_tip_amount` 

      , 

    `monthly_tolls_amount` 

      , 

    `monthly_improvement_surcharge` 

      , 

    `monthly_total_amount` 

      , 

    `monthly_congestion_surcharge` 

      , 

    `pickup_borough` 

      , 

    `pickup_zone` 

      , 

    `dropoff_borough` 

      , 

    `dropoff_zone` 

      , 

    `avg_duration_minutes` 

      , 

    `trip_count` 

      , 

    `total_duration_hours` 

      , 

    `busiest_hour_of_day` 

      , 

    `airport_trip` 

      , 

    `payment_type` 

from -- depends_on: `stellarismusv5`.`models_seeds`.`seed_zones`

    `stellarismusv5`.`models_historical`.`dbt__bi__yellow_m2m_dmt_49567973`

),

a_intersect_b as (

    select * from a

    intersect distinct

    select * from b

),

a_except_b as (

    select * from a

    except distinct

    select * from b

),

b_except_a as (

    select * from b

    except distinct

    select * from a

),

all_records as (

    select
        *,
        true as in_a,
        true as in_b
    from a_intersect_b

    union all

    select
        *,
        true as in_a,
        false as in_b
    from a_except_b

    union all

    select
        *,
        false as in_a,
        true as in_b
    from b_except_a

),

final as (

    select * from all_records
    where not (in_a and in_b)
    order by  in_a desc, in_b desc

)

select * from final

dbt__bi__yellow_m2m_dmt_49567973 is not a table that I have created. It is basically an identical copy of the table that is being tested, and it is even created in the same dataset as the original table.

while stellarismusv5.models_ut.dbt__bi__yellow_m2m_expected is the correct seed for expected output.

Screenshots and log output

image image

System information

packages:
  - package: calogica/dbt_date
    version: 0.7.2
  - package: calogica/dbt_expectations
    version: 0.8.2
  - package: elementary-data/elementary
    version: 0.8.2
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: mjirv/dbt_datamocktool
    version: 0.2.0

Which database are you using dbt with?

The output of dbt --version:

1.5.2

Additional context

I checked issues first, and it does not look like this is a common issue, so I'd wager that this is something incorrect on my end. I have spent the better part of the day fiddling around, trying things out, changing directories, names, nothing works. I cannot seem to pinpoint the cause, so any hints would be appreciated.