wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

Add use of monitoring script to cromwell runs #63

Open malachig opened 2 years ago

malachig commented 2 years ago

Cromwell's Google backend supports a "monitoring_script" workflow option:

https://cromwell.readthedocs.io/en/stable/backends/Google/

{
  "monitoring_script": "gs://cromwell/monitoring/script.sh"
}

"The output of this script will be written to a monitoring.log file that will be available in the call gcs bucket when the call completes."

This sounds very helpful. But what would such a monitoring script look like? I have not been able to find any example of someone using this functionality.

tmooney commented 2 years ago

https://github.com/broadinstitute/cromwell-task-monitor-bq looks to be a drop-in monitor for workflows that pushes the data to BigQuery. (In the README they estimate this monitor will add ~2% to the cost of a run.) Digging a bit more, it looks like the internals of this could possibly also be used as the monitoring_script to skip the BigQuery part.

In typical Cromwell fashion it "assumes certain defaults for naming of dataset/tables, and measurement/reporting intervals" so it ought to be fairly straightforward to plug in as long as we want to use it the same way they do!

tmooney commented 2 years ago

I'll have to take that "skip the BigQuery part" back. I went to try it out on a tiny workflow this morning; even though there is a separate monitor directory, it has BQ stuff baked into it, too.
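
For reference, a monitoring script with no BigQuery dependency could be quite small, since per the docs quoted above Cromwell only needs something that writes to stdout. The following is a minimal sketch, not a tested drop-in. The function names, the 60-second default interval, and the choice of metrics are all made up for illustration. It assumes a Linux task container with /proc and a kernel that reports MemAvailable, and it assumes the task's working disk is mounted at /cromwell_root. It sticks to POSIX sh so it does not depend on bash being present in the task image.

```shell
#!/bin/sh
# Hypothetical resource monitor: each sample is one tab-separated line on stdout.
# All names and defaults below are illustrative assumptions, not part of Cromwell.

# Percent of memory in use, derived from MemTotal and MemAvailable (Linux only).
mem_used_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       END {if (t > 0) printf "%.1f", (t - a) * 100 / t}' /proc/meminfo
}

# Percent of disk used on the given mount point (defaults to /).
# Prints nothing if the mount point does not exist.
disk_used_pct() {
  df -P "${1:-/}" 2>/dev/null | awk 'NR==2 {gsub("%", "", $5); print $5}'
}

# 1-minute load average.
load_avg() {
  cut -d ' ' -f 1 /proc/loadavg
}

# Print one timestamped sample for the given mount point.
report() {
  printf '%s\tload=%s\tmem_used_pct=%s\tdisk_used_pct=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(load_avg)" "$(mem_used_pct)" "$(disk_used_pct "${1:-/}")"
}

# monitor_loop INTERVAL [COUNT] [MOUNT]
# Sample every INTERVAL seconds. COUNT=0 (the default) means run until killed.
# MOUNT defaults to /cromwell_root, which is an assumption about the backend.
monitor_loop() {
  interval="${1:-60}"
  max="${2:-0}"
  mount="${3:-/cromwell_root}"
  n=0
  while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
    report "$mount"
    n=$((n + 1))
    sleep "$interval"
  done
}
```

To use this as the monitoring_script, the file would end with a call such as `monitor_loop 60`, then be uploaded to a bucket and referenced from the workflow options as in the JSON above. Running `monitor_loop 1 3 /` locally prints three samples and exits, which is an easy way to check the output format before trying it on a real workflow.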

Layth17 commented 1 year ago

https://github.com/griffithlab/cloud-workflows/issues/25