Open tskir opened 1 month ago
Each Batch job spins up hundreds of worker VMs. While it is possible to pick a few at random and monitor them manually, it is more useful to be able to see all of them at once.
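As an illustration of what "all of them at once" means in practice, here is a minimal sketch that folds the state of every task in a run into a single progress summary. It assumes the per-task states have already been fetched (for example, parsed from the Batch API output) into a plain list of strings. The state names and the helper `summarize_progress` are hypothetical, not taken from `monitor.sh`.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Assumed terminal task states; adjust to the state names your Batch API reports.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


@dataclass
class Progress:
    total: int               # number of tasks seen in this poll
    counts: dict[str, int]   # how many tasks are in each state
    done: int                # tasks in a terminal state
    percent_done: float      # share of tasks that have finished, 0..100


def summarize_progress(task_states: list[str]) -> Progress:
    """Aggregate the states of all tasks (across all worker VMs) into one summary."""
    counts = Counter(task_states)
    total = len(task_states)
    # Counter returns 0 for states that never occurred.
    done = sum(counts[state] for state in TERMINAL_STATES)
    percent = 100.0 * done / total if total else 0.0
    return Progress(total=total, counts=dict(counts), done=done, percent_done=percent)
```

For example, four tasks with states `SUCCEEDED`, `RUNNING`, `FAILED` and `PENDING` give `done == 2` and `percent_done == 50.0`.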
I wrote a script which monitors a Batch job run and calculates:
I expect this script will be useful for other large Batch runs, not just for finemapping. I'll continue to use it and will expand it as needed. For example, we could add notifications that fire as soon as jobs start failing, so that failures can be investigated promptly.
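One way such a notification could work is sketched below, assuming the monitor polls the job periodically and knows the current number of failed tasks at each poll. The class `FailureAlerter` and its `notify` callback are hypothetical; in a real setup `notify` might post to a chat channel or send an email.

```python
from typing import Callable


class FailureAlerter:
    """Fires a notification whenever the failed-task count grows between polls."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify   # hypothetical delivery function (chat, email, ...)
        self.last_failed = 0   # failed-task count seen at the previous alert

    def check(self, failed_count: int) -> bool:
        """Return True (and notify) if new failures appeared since the last check."""
        if failed_count > self.last_failed:
            new = failed_count - self.last_failed
            self.notify(f"{new} new failed task(s); {failed_count} failed in total")
            self.last_failed = failed_count
            return True
        return False
```

Because it only reacts to an increase, the alert fires on the first failure and again whenever more tasks fail, rather than repeating on every poll.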
The script can be found here: https://github.com/tskir/batch-minimal-example/blob/16c21c4eb13f74cfe7c43f982ebf0f4127c5508b/monitor.sh
This issue is part of the https://github.com/opentargets/issues/issues/3302 epic.
The goal of this issue is to be able to track the progress of a given Batch run, get accurate estimates of costs, and monitor RAM usage.
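The cost and RAM parts of this goal can be sketched as simple aggregations over per-VM records. This assumes the monitor can collect, for each VM, its runtime so far, its peak RAM usage and its RAM capacity, and that cost scales with a flat price per VM-hour. The record type `VmSample`, the functions `estimate_cost` and `ram_report`, and the `warn_fraction` threshold are all hypothetical; real pricing depends on machine type, region and discounts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmSample:
    """Hypothetical per-VM record collected by the monitor."""
    name: str
    runtime_hours: float  # wall-clock hours the VM has been running so far
    peak_ram_gb: float    # highest RAM usage observed on this VM
    ram_total_gb: float   # RAM available on the VM's machine type


def estimate_cost(vms: list[VmSample], price_per_vm_hour: float) -> float:
    """Cost so far, assuming a flat (hypothetical) price per VM-hour."""
    return sum(vm.runtime_hours for vm in vms) * price_per_vm_hour


def ram_report(vms: list[VmSample], warn_fraction: float = 0.9) -> dict:
    """Highest RAM peak across all VMs, plus VMs whose peak is near their capacity."""
    if not vms:
        return {"max_peak_gb": 0.0, "near_limit": []}
    near = [vm.name for vm in vms if vm.peak_ram_gb >= warn_fraction * vm.ram_total_gb]
    return {"max_peak_gb": max(vm.peak_ram_gb for vm in vms), "near_limit": near}
```

For instance, two 16 GB VMs that ran for 2.0 and 1.5 hours at a price of 0.5 per VM-hour cost 1.75 in total, and only the VM that peaked at 15 GB is flagged as near its limit.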