joverlee521 closed this 5 months ago
Last night's run finished in under 12h 🎉 I'm going to dig into the logs a little, but at least this is an improvement, so I'm merging this ahead of today's run.
We could probably scrap those CPU limits altogether. All they do is make snakemake restrict the number of jobs run in parallel.
This has obvious downsides and only rare benefits.
The CPU scheduler figures out ways to give all jobs some share when oversubscribed. We don't really need snakemake to do pessimistic scheduling.
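(To make the scheduling point concrete, here's a hedged sketch of how a core limit constrains Snakemake's parallelism; the exact flags and values used by this repo's workflows may differ.)

```sh
# With a hard core budget, Snakemake only starts jobs whose declared `threads`
# fit under --cores, e.g. at most two 8-thread rules at a time here:
snakemake --cores 16

# Raising (or effectively removing) the budget lets more jobs start at once;
# the kernel's CPU scheduler then time-shares the cores among them:
snakemake --cores all
```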
> We could probably scrap those CPU limits altogether.
For the AWS Batch runtime, the `--cpus` option is used to override the default nextstrain-job definition, which is only 4 CPUs.
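As a concrete example of that override (an assumed invocation, not verbatim from this repo's workflows; the specific values are illustrative):

```sh
# Ask AWS Batch for more than the nextstrain-job definition's 4-CPU default;
# --memory can be raised the same way.
nextstrain build --aws-batch --cpus 16 --memory 28GiB .
```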
> The CPU scheduler figures out ways to give all jobs some share when oversubscribed.
Yes, progress will still be made with oversubscription, but the run time of the whole workflow will increase, sometimes substantially, depending on the kind of workload. It's still better not to oversubscribe when you can avoid it.
I've only been bumping the memory, not the CPUs, for the fetch-and-ingest workflows. We might as well use all the compute that we're paying for. GenBank should be using c5.9xlarge and GISAID should be using c5.12xlarge, so I'm bumping CPUs to match the instances (sketched below).¹
Maybe this will magically help https://github.com/nextstrain/ncov-ingest/issues/446?
¹ https://aws.amazon.com/ec2/instance-types/c5/
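Roughly, that bump amounts to the following (a sketch only; the real invocations live in this repo's workflow files and carry additional options):

```sh
# Match --cpus to the vCPU count of the target instance type:
#   c5.9xlarge  = 36 vCPUs  (GenBank)
#   c5.12xlarge = 48 vCPUs  (GISAID)
nextstrain build --aws-batch --cpus 36 .   # GenBank fetch-and-ingest
nextstrain build --aws-batch --cpus 48 .   # GISAID fetch-and-ingest
```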
Checklist