Closed rachellougee closed 9 months ago
Why do we have a Slack channel for alerts, but it's only QA errors?
https://mitodl.slack.com/archives/C056G5XMBL4/p1705385653025719
Both QA and production errors are routed to it; it's just that only QA is generating errors from Airbyte right now. It does not yet alert on dbt failures because those happen in Dagster. That's the part I'm working on addressing now.
I tried to clarify the title of this issue. Please fix if I'm wrong.
Description/Context
In the Dagster pipeline, there is a job, airbyte_asset_sync, that is responsible for syncing databases from various sources in Airbyte and running dbt models afterward. Currently, there is no notification when the job fails, whether because one of the source syncs failed or because of errors in the dbt models. Sometimes a job failure goes unnoticed for several days until someone logs in to https://pipelines.odl.mit.edu/locations/lakehouse-assets-graph/jobs/airbyte_asset_sync/runs to read the error logs. This affects the freshness of our data models. If the failure is due to the source sync, dbt build is not triggered at all.
We should improve the monitoring and alerting around this job. Mike suggested a Slack notification, so maybe we can send a summary to the data-platform-alerts Slack channel with the following details in case of job failure:
Plan/Design
TBD
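As a starting point for discussion, here is a minimal sketch of what the failure summary could look like. It is not the agreed design. The fields in JobFailure (run ID, failed step, error text) are my assumptions about which details are useful, and the webhook URL passed to post_to_slack is a placeholder for whatever Slack incoming webhook we configure for data-platform-alerts. The dagster-slack integration also ships a run-failure sensor helper that may turn out to be a better fit than hand-rolled posting; this sketch only illustrates the message shape.

```python
import json
import urllib.request
from dataclasses import dataclass

# Runs page taken from the issue description; per-run deep links are not assumed here.
RUNS_URL = (
    "https://pipelines.odl.mit.edu/locations/lakehouse-assets-graph"
    "/jobs/airbyte_asset_sync/runs"
)
MAX_ERROR_CHARS = 500  # assumption: keep the Slack message short


@dataclass
class JobFailure:
    """Hypothetical container for the details we might report."""
    job_name: str
    run_id: str
    failed_step: str  # e.g. the Airbyte source sync or the dbt build step
    error: str


def build_slack_payload(failure: JobFailure) -> dict:
    """Format a failure summary as a Slack incoming-webhook payload."""
    error = failure.error
    if len(error) > MAX_ERROR_CHARS:
        # Truncate long tracebacks; the full log stays in Dagster.
        error = error[:MAX_ERROR_CHARS] + "..."
    text = (
        f"Dagster job {failure.job_name} failed\n"
        f"Run ID: {failure.run_id}\n"
        f"Failed step: {failure.failed_step}\n"
        f"Error: {error}\n"
        f"Runs: {RUNS_URL}"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In practice something like build_slack_payload would be called from whatever hook Dagster fires on run failure, so that both source sync failures and dbt failures produce the same kind of message.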