mjirv / dbt-datamocktool

A dbt package for unit testing your SQL analytics models
Apache License 2.0
166 stars 20 forks source link

Support for testing Bigquery arrays #68

Open pettersoderlund opened 1 year ago

pettersoderlund commented 1 year ago

Describe the feature

Support for testing columns with data type array in unit tests when working with Bigquery.

Describe alternatives you've considered

In our organisation we have used datamocktool to test arrays by specifying "compare_columns: - TO_JSON_STRING(array_col)". This has been working fine with 0.2.1 of dmt but was no longer working in >=0.2.2.

Additional context

Providing function-wrapped columns to compare columns should be useful in many databases. However I only have experience doing this in BigQuery.

The change of the handling of compare_columns was changed 0.2.2 where we went from a straight passing on of parameter compare columns to dbt_utils.test_equality to the new way of "finding excluded columns" to provide to audit_helper.compare_relations with parameter exclude_columns.

Who will this benefit?

Anyone using arrays or structs in tables wanting to unit test also these columns.

Are you interested in contributing this feature?

I feel that i need some input to be able to get started on this.

mjirv commented 1 year ago

@pettersoderlund just to confirm, this isn't solved by your PR we just merged, is it? I'm trying to understand why this is broken now. Thanks!

pettersoderlund commented 1 year ago

@mjirv no that's correct! - this problem is the same as stated in https://github.com/mjirv/dbt-datamocktool/issues/66#issuecomment-1486932968

The PR just merged just fixes the bug with compare columns.

pettersoderlund commented 1 year ago

nooo sorry -- this is not finished. Sorry if I was unclear. This feature is not fixed with the last PR

mjirv commented 1 year ago

ah ok I misread you 😅

millyleadley commented 1 year ago

Just had a quick look at this. I haven't context loaded about how this used to work and what's changed, but if we were happy with it being a good solution for array columns we might be able to retrace our steps in getting it working again. We should also call out this as being a workaround in the readme.

The other option I can think of would be to create a new parameter like actual_output_field_mapping which accepts a field mapped to a transformation eg.

models:
  - name: stg_customers
    tests:
      - dbt_datamocktool.unit_test:
          input_mapping:
            source('jaffle_shop', 'raw_customers'): ref('dmt__raw_customers_1')
          expected_output: ref('dmt__expected_stg_customers_1')
          actual_output_field_mapping:
            my_array_column: TO_JSON_STRING(my_array_column)
          depends_on:
            - ref('raw_customers')

And we'd apply that transformation when selecting fields in the actual output, just before we compare models. Have no idea how to do that elegantly though 😂

I figured we couldn't pass a parameter like array_columns=[...] because the function to transform the column to a string will be different in each data warehouse unfortunately