mozilla / bigquery-etl

Bigquery ETL
https://mozilla.github.io/bigquery-etl
Mozilla Public License 2.0
259 stars 101 forks source link

Compare Standard SQL assert to assert.* UDFs #2472

Open ANich opened 3 years ago

ANich commented 3 years ago

There is an available ASSERT expression (docs).

We also have a few UDFs defined in tests/assert eg. assert.equals that we might be able to replace with the native version.

jklukas commented 3 years ago

ASSERT was added sometime in the past two years, so didn't exist when most of this test framework was developed. It would indeed be interesting to see if we can deprecate some of our testing code by using ASSERT.

ANich commented 1 year ago

Did some investigation for replacing this for UDFs. Here's what we could do without making any changes to our pytest plugins, using normalize_search_engine as an example:

Before:

-- Test
SELECT
  assert.equals('Engine1', udf.normalize_search_engine('engine1')),
  assert.equals('Engine2', udf.normalize_search_engine('Engine2-abc')),
  assert.equals('Other', udf.normalize_search_engine('not-Engine1')),
  assert.equals('Other', udf.normalize_search_engine('engine')),
  assert.null(udf.normalize_search_engine(NULL))

After:

-- Test
ASSERT
  'Engine1' = udf.normalize_search_engine('engine1')
  AND 'Engine2' = udf.normalize_search_engine('Engine2-abc') AS 'Test normalizes search engine';

ASSERT 
  'Other' = udf.normalize_search_engine('not-Engine1')
  AND 'Other' = udf.normalize_search_engine('engine') AS 'Test substitutes "Other" when not matched';

ASSERT udf.normalize_search_engine(NULL) IS NULL AS 'Test NULL';

A few differences:

  1. ASSERT statements don't have 400 Expected "x" but got "y" error messages unfortunately. They use the alias/column name as the error message should the assertion fail. To have this be useful we could use descriptive aliases such as the above.

  2. Another difference is that each ASSERT is a separate collected item, compared to one collected item with SELECT assert.equals(...), assert.equals(...).

  3. We can run ASSERTs in SQL interfaces with relying on the deployed assert function.