theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Check cg-pipeline CPUs used #490

Closed andrewjpage closed 3 weeks ago

andrewjpage commented 1 month ago

:bug:

:pencil: Describe the Issue

The cg-pipeline task doesn't take CPUs as an input, but 4 are passed in. Does it use all that are available?

Pass the set number of CPUs in to the task. Run the task and measure how much RAM is normally used (/usr/bin/time -v cmd). Does it really need 8GB? Does it require a 100GB local disk? How much processing time is used? From that, calculate the utilisation of the CPUs. If it's not making use of all 4, adjust the task to use fewer.
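A rough sketch of how the utilisation could be worked out from the report that GNU `/usr/bin/time -v` writes to stderr. It assumes the standard GNU time field labels, and the helper names (`parse_elapsed`, `summarise`) are made up for illustration; nothing here is part of the repository. Utilisation is taken as (user + system CPU seconds) / (wall-clock seconds × allocated CPUs).

```python
import re


def parse_elapsed(text):
    """Convert GNU time's 'h:mm:ss' or 'm:ss.ss' elapsed string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarise(time_output, cpus):
    """Summarise a `/usr/bin/time -v` report for a task given `cpus` CPUs.

    `time_output` is the text GNU time wrote to stderr (hypothetical input;
    capture it however suits the task, e.g. by redirecting stderr to a file).
    """

    def field(label):
        # Each report line looks like "<label>: <value>", usually tab-indented.
        pattern = r"^\s*" + re.escape(label) + r":\s*(\S+)"
        match = re.search(pattern, time_output, re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found: {label}")
        return match.group(1)

    user = float(field("User time (seconds)"))
    system = float(field("System time (seconds)"))
    wall = parse_elapsed(field("Elapsed (wall clock) time (h:mm:ss or m:ss)"))
    max_rss_kb = float(field("Maximum resident set size (kbytes)"))

    if wall <= 0 or cpus <= 0:
        raise ValueError("wall time and cpus must be positive")

    cpu_seconds = user + system
    return {
        "cpu_seconds": cpu_seconds,
        "wall_seconds": wall,
        # 1.0 means every allocated CPU was busy for the whole run.
        "utilisation": cpu_seconds / (wall * cpus),
        "max_rss_gb": max_rss_kb / (1024 ** 2),
    }
```

If the utilisation comes out well below 1.0 with 4 CPUs, or the peak resident set is far under 8GB, that would support reducing the task's allocation.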

This task is very short. Make it preemptible (spot) so that we can access lower pricing. Google gives 30 seconds' notice before killing a preemptible instance, so we shouldn't notice any difference.

In the runtime section of the task set:

maxRetries: 3
preemptible: 1