mozilla / docker-etl

Collection of dockerized ETL jobs managed by data engineering.
Mozilla Public License 2.0

[dap collectors] fix incorrect value for timestamp field in results table #241

Closed dmueller closed 4 months ago

dmueller commented 4 months ago

https://mozilla-hub.atlassian.net/browse/AE-457

problem

I tested the error cases without cloud auth, so I never validated that the writes to BigQuery succeeded. The code didn't enforce any type constraints, so I had passed the wrong type of value to one of the method parameters without noticing.

The docker builds were updated and the dev job then ran, but it failed with the error TypeError: Object of type datetime is not JSON serializable.
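For context, this error can be reproduced with only the standard library. The sketch below is illustrative, not the collector's code: the row dict and the try_serialize helper are hypothetical, and it assumes the write path JSON-encodes each row before sending it to BigQuery.

```python
import json
from datetime import datetime, timezone

# Hypothetical row: only the field name slot_start comes from this PR.
row = {"slot_start": datetime(2024, 7, 16, 15, 15, tzinfo=timezone.utc)}


def try_serialize(payload):
    """Return the JSON string, or the error text if encoding fails."""
    try:
        return json.dumps(payload)
    except TypeError as exc:
        return f"TypeError: {exc}"


# The stdlib encoder has no default handling for datetime objects.
print(try_serialize(row))
```

Because the failure only occurs at encoding time, a test run that never reaches the write step would not surface it.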

This run was expected to write an error record to BigQuery rather than fail in Airflow, since the Airflow schedule hasn't yet been switched to pass in only midnight dates.

solution

I reproduced this locally using the same parameters that the Airflow job ran with, and found that the base report expects an int for the slot_start field, whereas I had been passing a datetime.
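A minimal sketch of this kind of fix, under stated assumptions: the helper name to_slot_start is hypothetical, and the PR does not say which integer unit slot_start uses, so epoch seconds is assumed here purely for illustration. The explicit type check is the sort of constraint that would have caught the mistake before the write.

```python
import json
from datetime import datetime


def to_slot_start(value: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch seconds.

    Epoch seconds is an assumed unit, not confirmed by the PR.
    """
    if not isinstance(value, datetime):
        raise TypeError(f"expected datetime, got {type(value).__name__}")
    if value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(value.timestamp())


# Same --date value as the dev job run described in this PR.
batch_start = datetime.fromisoformat("2024-07-16T15:15:00+00:00")
row = {"slot_start": to_slot_start(batch_start)}

# An int field encodes without error.
print(json.dumps(row))
```

Converting at the boundary keeps datetime objects available for the rest of the job, while the row handed to the writer contains only JSON-friendly types.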

testing

python3 main.py --date=2024-07-16T15:15:00+00:00 --project=moz-fx-ads-nonprod --ad-table-id=ppa_dev.measurements --report-table-id=ppa_dev.reports --task-config-url=https://storage.googleapis.com/ads-nonprod-stage-ppa-dev/tasks.json --ad-config-url=https://storage.googleapis.com/ads-nonprod-stage-ppa-dev/ads.json

I ran this before and after the code change. Before the change it failed with the "not JSON serializable" error. After the change it succeeded, and I confirmed that the report result containing the error message showed up in BigQuery.
