wustl-oncology / cloud-workflows

Infrastructure and tooling required to get genomic workflows running in the cloud

Set up resource monitoring for tasks of cromwell runs #25

Open malachig opened 2 years ago

malachig commented 2 years ago

The Cromwell docs describe a way to enable resource monitoring for every step of your workflow. The documentation I have been able to find is limited:

https://cromwell.readthedocs.io/en/stable/wf_options/Google/, which states:

Specifies a GCS URL to a script that will be invoked prior to the user command being run. For example, if the value for monitoring_script is "gs://bucket/script.sh", it will be invoked as ./script.sh > monitoring.log &. The monitoring.log file will be automatically de-localized.

https://cromwell.readthedocs.io/en/latest/backends/Google/, which states:

In order to monitor metrics (CPU, Memory, Disk usage...) about the VM during Call Runtime, a workflow option can be used to specify the path to a script that will run in the background and write its output to a log file.

{
  "monitoring_script": "gs://cromwell/monitoring/script.sh"
}

The output of this script will be written to a monitoring.log file that will be available in the call gcs bucket when the call completes. This feature is meant to run a script in the background during long-running processes. It's possible that if the task is very short that the log file does not flush before de-localization happens and you will end up with a zero byte file.

malachig commented 2 years ago

In order to test this idea in its simplest form, I created an example monitor script and tested it on an active Google Cloud instance that was running a compute-intensive step: https://github.com/griffithlab/cloud-workflows/blob/main/scripts/monitor.sh
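
For reference, here is a minimal sketch of what a monitoring script of this kind might look like. It is illustrative only and is not the script linked above: the sampling interval, column set, and the use of free/df/top are my assumptions, and the real script also tracks peak values and percentages as shown in the output further down.

#!/bin/bash
# Illustrative sketch only -- not the actual monitor.sh linked above.
# Samples memory, disk, and CPU usage at a fixed interval and prints
# tab-separated rows; intended to be launched in the background as
# ./script.sh > monitoring.log & per the Cromwell docs quoted above.
INTERVAL=${1:-30}   # seconds between samples (assumed default)
START=$(date +%s)
echo -e "Seconds\tMemory_GB\tDisk_GB\tCPU_Percent"
while true; do
  ELAPSED=$(( $(date +%s) - START ))
  MEM_GB=$(free -m | awk '/^Mem:/ {printf "%.2f", $3/1024}')
  DISK_GB=$(df -BG /cromwell_root 2>/dev/null | awk 'NR==2 {gsub("G","",$3); print $3}')
  CPU_PCT=$(top -bn1 | awk '/%Cpu\(s\)/ {printf "%.2f", 100 - $8}')
  echo -e "${ELAPSED}\t${MEM_GB}\t${DISK_GB}\t${CPU_PCT}"
  sleep "$INTERVAL"
done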

I manually logged into the GCP instance using the Google console to test it.
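
For anyone repeating that manual test, one way to do it from a terminal would be something like the following (I actually used the web console; the instance name and zone here are placeholders, and this assumes gcloud/gsutil are available):

$ gcloud compute ssh example-instance --zone us-central1-a
$ gsutil cp gs://griffith-lab-workflow-inputs/scripts/monitor.sh .
$ chmod +x monitor.sh
$ ./monitor.sh > monitoring.log &
$ tail -f monitoring.log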

To test this on a Cromwell run I am attempting the following:

  1. I placed this script in our public google bucket: gs://griffith-lab-workflow-inputs/scripts/monitor.sh

  2. I started a Cromwell VM and edited the workflow options config file on this system: sudo vim /shared/cromwell/workflow_options.json. I added the following entry to that file (at the top level, not nested in another block; a sketch of the resulting file appears below this list):

    "monitoring_script": "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"

    According to the Cromwell docs, if you modify this conf file you do NOT need to restart Cromwell. These settings should take effect with the next workflow you run.
    https://cromwell.readthedocs.io/en/stable/wf_options/Overview/

However, if you DID need to restart Cromwell, then based on the startup script (https://github.com/griffithlab/cloud-workflows/blob/main/manual-workflows/server_startup.py) I think you could do: sudo systemctl restart cromwell
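
As a sketch of what the edited file might look like (the file on our head node may contain other options; only the monitoring_script line is the one added here, and the python3 check simply confirms the edit left valid JSON, assuming python3 is available on the VM):

$ cat /shared/cromwell/workflow_options.json
{
  "monitoring_script": "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"
}
$ python3 -m json.tool /shared/cromwell/workflow_options.json > /dev/null && echo "valid JSON"
valid JSON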

malachig commented 2 years ago

If my testing works as expected and we want to add this so it happens automatically, then I think it would be added here (a rough sketch of the kind of change is below): https://github.com/griffithlab/cloud-workflows/blob/3822d66e6a0423ade093f48f9c2535b07adfbb6a/manual-workflows/resources.sh#L135-L143
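
Purely as an illustration of the kind of change (the real resources.sh will differ; this assumes the options file already exists on the head node and that jq is installed there):

# Merge the monitoring_script option into the existing workflow options file.
tmp=$(mktemp)
jq '. + {"monitoring_script": "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"}' \
  /shared/cromwell/workflow_options.json > "$tmp" \
  && mv "$tmp" /shared/cromwell/workflow_options.json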

malachig commented 2 years ago

In my first test I looked at the gcs_localization.sh script for an individual task and I now see this:

# Localize singleton file 'gs://griffith-lab-workflow-inputs/scripts/monitor.sh' to '/cromwell_root/monitoring.sh'.
singleton_file_to_localize_573998f91cb96365bcb9696ac6baf714=(
  "griffith-lab"
  "3"
  "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"
  "/cromwell_root/monitoring.sh"
)

localize_singleton_file "${singleton_file_to_localize_573998f91cb96365bcb9696ac6baf714[@]}"

malachig commented 2 years ago

And I see output like this (saved in the bucket as: monitoring.log) in a step that completed very quickly:

Seconds Memory_Percent  Memory_Percent_Peak Memory_GB   Memory_GB_Peak  Disk_Percent    Disk_Percent_Peak   Disk_GB Disk_GB_Peak    CPU_Percent CPU_Percent_Peak
0   8.86    8.86    0.34    0.34    23.00   23.00   7.43    7.43    2.29    2.29
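
For a quick summary of a downloaded monitoring.log, the peak columns in the final row give the high-water marks (column positions per the header above):

$ tail -n 1 monitoring.log | awk '{print "Peak memory (GB): "$5"  Peak disk (GB): "$9"  Peak CPU (%): "$11}'
Peak memory (GB): 0.34  Peak disk (GB): 7.43  Peak CPU (%): 2.29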

malachig commented 2 years ago

This seems to be working as expected. To activate monitoring one can simply add this to /shared/cromwell/workflow_options.json on the head Cromwell VM:

  "monitoring_script": "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"

Results appear in the Google bucket for each task, in a file named monitoring.log.
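
To locate these files after a run, something like the following should work (the bucket name, workflow name, and workflow id here are placeholders; the layout assumes the usual <execution-root>/<workflow-name>/<workflow-id>/call-<task>/ structure under whatever GCS root the backend is configured with):

$ gsutil ls "gs://example-results-bucket/cromwell-executions/my_workflow/<workflow-id>/**/monitoring.log"
$ gsutil cat "gs://example-results-bucket/cromwell-executions/my_workflow/<workflow-id>/call-step_one/monitoring.log"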