pangeo-forge / dataflow-status-monitoring

GCP Resources to report the status of completed Dataflow jobs.

Remove trailing $ from failure regex #5

Closed cisaacstern closed 2 years ago

cisaacstern commented 2 years ago

As shown in https://github.com/pangeo-forge/staged-recipes/pull/138#issuecomment-1170382523, success webhooks work when deployed from main. However, I never received a failure webhook following this test deployment, even though the associated Dataflow job did fail.

[Screenshot: 2022-06-29, 1:45:57 PM]

Digging around in the GCP Console, it appears that the Pub/Sub push from the log sink never occurred. Looking at the Logs Explorer, we do have an ERROR line starting with the string `Workflow failed.`

[Screenshot: 2022-06-29, 1:47:55 PM]

This is the line anticipated by the filter in our log sink:

https://github.com/pangeo-forge/dataflow-status-monitoring/blob/5daa3b6eff8ec2090738825234341ecb83440bbb/terraform/log_sink.tf#L4

With that exact filter applied to these ERROR results, however, the `Workflow failed.` line is not captured.

[Screenshot: 2022-06-29, 1:51:05 PM]

This filter is copied exactly from the source blog post https://cloud.google.com/community/tutorials/dataflow-notification-slack but perhaps that post has gone slightly stale. 🤷

Now, with the trailing `$` dropped from the filter, that log line is captured.

[Screenshot: 2022-06-29, 1:52:42 PM]
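
For readers who want to see the anchoring effect in isolation, here is a minimal sketch in Python. Both the log message and the two patterns are hypothetical stand-ins: the real payload is not reproduced in this thread, and the actual filter lives in `terraform/log_sink.tf`. Cloud Logging filters also use their own regex engine, so Python's `re` is only an approximation here. The sketch assumes the failure message continues past `Workflow failed.`, which is one plausible reason an end-anchored pattern would miss it.

```python
import re

# Hypothetical log payload: assumed to start with "Workflow failed."
# and to carry further text after it.
message = "Workflow failed. Causes: (hypothetical detail text)"

# Hypothetical patterns, not the actual filter from log_sink.tf.
anchored = re.compile(r"^Workflow failed.$")   # trailing $ requires end of text
unanchored = re.compile(r"^Workflow failed.")  # trailing $ dropped


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(matches(anchored, message))    # False: text continues after the period
print(matches(unanchored, message))  # True: only the prefix has to match
```

If the real payload does not continue past the period, the cause of the miss is something else, and this sketch does not apply.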

cisaacstern commented 2 years ago

I'm going to update the infrastructure from this branch, and see if the failure case is fixed.

cisaacstern commented 2 years ago

https://github.com/pangeo-forge/staged-recipes/pull/138#issuecomment-1170497325 shows this working in the wild. Going to merge 🚀