nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Send error logs to Slack #408

Open joverlee521 opened 1 year ago

joverlee521 commented 1 year ago

Context

Requested by @corneliusroemer as a convenient way to inspect the build logs when the build fails on AWS Batch.

Possible solution

According to the Snakemake docs, the onerror handler has access to the log variable, which points to the Snakemake log file. This log file only contains high-level info on which rule failed; it does not include the stdout/stderr of the rule's scripts. If the rule has a log file, then the command line option --show-failed-logs will include the rule's log file in the Snakemake log file. To make sure we capture any potential error output, we would need to add a log file to every rule in the workflow.
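As a rough sketch of the onerror approach, the helper below reads the tail of the Snakemake log and posts it to Slack. It assumes a Slack incoming webhook is available; the environment variable `SLACK_WEBHOOK_URL`, the function names, and the 3000-character cutoff are all hypothetical and not part of the current workflow, which may prefer a different Slack mechanism.

```python
import json
import urllib.request
from pathlib import Path

# Assumed cutoff so the Slack message stays reasonably short.
MAX_CHARS = 3000


def tail_of_log(log_path, max_chars=MAX_CHARS):
    """Return the last `max_chars` characters of the log file."""
    text = Path(log_path).read_text(errors="replace")
    return text[-max_chars:]


def build_slack_payload(log_path, max_chars=MAX_CHARS):
    """Build the JSON payload for a Slack incoming webhook."""
    fence = "`" * 3  # Slack code-block delimiter
    excerpt = tail_of_log(log_path, max_chars)
    return {"text": f"Build failed. Tail of {log_path}:\n{fence}\n{excerpt}\n{fence}"}


def send_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical wiring in the Snakefile, where `log` is the path that
# Snakemake exposes to the onerror handler:
#
# onerror:
#     send_to_slack(os.environ["SLACK_WEBHOOK_URL"], build_slack_payload(log))
```

Only the tail of the log is sent because the failing rule is usually reported near the end. With --show-failed-logs and per-rule log files, that tail should also include the failed rule's own output.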

Another option is to do this outside of Snakemake. We could send the logs from the nextstrain build command to Slack, but this would require the GitHub Action workflow to stay attached to the AWS Batch job (as discussed recently in a separate PR). Alternatively, we could do some additional wiring in the onerror handler to trigger another GH Action workflow that re-attaches to the failed job and sends the logs.

Finally, I wanted to surface the recent discussions of implementing a monitoring system/dashboard that would probably minimize the need for sending logs to Slack.