mozilla / bigquery-etl

Bigquery ETL
https://mozilla.github.io/bigquery-etl
Mozilla Public License 2.0
254 stars 101 forks source link

[Data Checks Bug]: ETL check checks__fail_firefox_ios_derived__new_profile_activation__v2 failiing to execute due to incorrect check path #4524

Open kik-kik opened 11 months ago

kik-kik commented 11 months ago

What is the issue?

What was observed, any logs or relevant links?

ETL task (checks__fail_firefox_ios_derivednew_profile_activationv2) responsible for validating that the firefox_ios_derived.new_profile_activation_v2 dataset is correctly being updated currently fails silently with the following error:

[2023-11-06, 13:48:00 UTC] {pod_manager.py:235} INFO - No checks.sql file found in sql//None/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2

Example failed run: https://workflow.telemetry.mozilla.org/dags/bqetl_firefox_ios/grid?root=&dag_run_id=scheduled__2023-10-29T04%3A00%3A00%2B00%3A00&tab=logs&task_id=checks__fail_firefox_ios_derived__new_profile_activation__v2

This is unintended behavior, and only got caught by chance when manually attempting to debug an issue with a downstream dataset. This resulted in the task being marked as successful without the dataset checks being executed.

How can we reproduce the bug

Re-running the task produces the same result, please keep in mind that the process exits with status code `0` therefore the task is marked as success in Airflow (hence why this is a silent failure).

It appears other ETL checks in the same DAG are running and detecting issues (as intended), it is unclear what's causing this specific task to behave this way.

┆Issue is synchronized with this Jira Task

kik-kik commented 11 months ago

Failed to reproduce this locally. Tried running the following command:

./script/bqetl check run sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 --parameter=submission_date:DATE:2023-10-30

And correctly seeing the following error:

Error in query string: Error processing job 'moz-fx-data-shared- prod:bqjob_r42bead83008a3193_0000018ba50ab3e2_1': Less than 0 rows found (expected more than 1) 

(currently, this is expected as indeed the dataset is missing data for this specific submission_date)

This appears to mimic what should be happening in the prod deployment: https://github.com/mozilla/telemetry-airflow/blob/0e83b178dc105a897487b911898f5c94ff67c329/utils/gcp.py#L389-L397

kik-kik commented 11 months ago
./script/bqetl check run --marker fail sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 --parameter=submission_date:DATE:2023-10-30

Yielded:

/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
Error in query string: Error processing job 'moz-fx-data-shared- prod:bqjob_r354728dfd7c4e974_0000018ba514808b_1': Less than 0 rows found (expected more than 1) 
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
./script/bqetl check run --marker fail sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2 --parameter=submission_date:DATE:2023-10-29

Yielded:

/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
/Users/kik/.apps/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
No checks.sql file found in sql//None/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2

It appears the same command for different submission_dates resulted in different outcomes for the command. For submission dates:

kik-kik commented 11 months ago

It appears the opposite is true for the submission dates when running the checks__fail_firefox_ios_derived__new_profile_activation__v2 task via Airflow. For 2023-10-30 we see no checks.sql message and for 2023-10-29 we see a check running. Tried to re-run the task for execution_date 2023-10-29 and we see a BQ job running: Waiting on bqjob_r253c142020408433_0000018ba5218b78_1 ... (2s) Current status: DONE along with the no checks.sql message.

When investigating the BQ job id to see further details of the query executed, it appears the query used is completely off from what it is expected to be:

image

This is what the bq console shows as executed SQL:

#fail
WITH non_unique AS (
  SELECT
    COUNT(*) AS total_count
  FROM
    `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
  GROUP BY
    client_id
  HAVING
    total_count > 1
)
SELECT
  IF(
    (SELECT COUNT(*) FROM non_unique) > 0,
    ERROR(
      "Duplicates detected (Expected combined set of values for columns ['client_id'] to be unique.)"
    ),
    NULL
  );

This is the template we expect to be used: https://github.com/mozilla/bigquery-etl/blob/a30788cdd53d0236dcff139987f2e29db15bde6a/sql/moz-fx-data-shared-prod/firefox_ios_derived/new_profile_activation_v2/checks.sql

Which should yield the following query:

#fail
WITH non_unique AS (
  SELECT
    COUNT(*) AS total_count
  FROM
    `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
  GROUP BY
    client_id
  HAVING
    total_count > 1
)
SELECT
  IF(
    (SELECT COUNT(*) FROM non_unique) > 0,
    ERROR(
      "Duplicates detected (Expected combined set of values for columns ['client_id'] to be unique.)"
    ),
    NULL
  );

#fail
WITH min_row_count AS (
  SELECT
    COUNT(*) AS total_rows
  FROM
    `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
  WHERE
    `date` = @submission_date
)
SELECT
  IF(
    (SELECT COUNTIF(total_rows < 1) FROM min_row_count) > 0,
    ERROR(
      CONCAT(
        "Less than ",
        (SELECT total_rows FROM min_row_count),
        " rows found (expected more than 1)"
      )
    ),
    NULL
  );

#fail
SELECT
  IF(
    COUNTIF(is_new_profile) <> COUNT(*),
    ERROR("Number of is_new_profile TRUE values should be the same as the row count."),
    NULL
  )
FROM
  `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
WHERE
  `date` = @submission_date;

#fail
SELECT
  IF(
    DATE_DIFF(`date`, first_seen_date, DAY) <> 6,
    ERROR("Day difference between values inside `date` and submission_date fields should be 6."),
    NULL
  )
FROM
  `moz-fx-data-shared-prod.firefox_ios_derived.new_profile_activation_v2`
WHERE
  `date` = @submission_date;