Closed by tskir 1 week ago
As a reminder, the current worker VM type is n2d-highmem-4, with 4 cores and 32 GB of RAM. This is how those resources were used:
There is also a family of “ultramem” VMs, which provide a large amount of RAM per CPU core. I will also take a brief look at them to see whether they could be a good, cost-effective option.
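For a quick first pass, comparing RAM per vCPU across machine families is useful. A minimal sketch follows; the n2d-highmem-4 figures come from this issue, while the other specs are illustrative assumptions that should be verified against the current GCP machine-type documentation:

```python
# Rough RAM-per-vCPU comparison across a few GCP machine types.
# Only n2d-highmem-4 (4 vCPUs, 32 GB) is stated in this issue;
# the other specs are assumptions for illustration.
machine_types = {
    "n2d-highmem-4": {"vcpus": 4, "ram_gb": 32},     # current worker type
    "n2d-highmem-8": {"vcpus": 8, "ram_gb": 64},     # assumed spec
    "m1-ultramem-40": {"vcpus": 40, "ram_gb": 961},  # assumed spec
}

for name, spec in machine_types.items():
    ratio = spec["ram_gb"] / spec["vcpus"]
    print(f"{name}: {ratio:.1f} GB RAM per vCPU")
```

If jobs are memory-bound rather than CPU-bound, a higher GB-per-vCPU ratio means fewer idle cores being paid for.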
In the v5 run, the runtime limit for a single job was 3600s. I initially suspected that some jobs failed due to the time limit (it was difficult to tell, because RAM and time failures don't produce any specific log entries).
In the v6 run, the runtime limit was raised to 7200s, and no jobs failed due to the time limit. Moreover, the benchmarking logs show that the longest job in the v6 run took 1911s in total, so the time limit doesn't appear to be an issue.
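The headroom implied by those numbers can be sanity-checked directly; this uses only the figures stated above:

```python
# Time-limit headroom check using the v6 figures from this issue.
time_limit_s = 7200    # v6 runtime limit
longest_job_s = 1911   # longest observed job in the v6 run

headroom = time_limit_s / longest_job_s
fraction_used = longest_job_s / time_limit_s
print(f"Longest job used {fraction_used:.0%} of the limit "
      f"({headroom:.1f}x headroom)")
```

Roughly 3.8x headroom, which supports the conclusion that the time limit is not the failure mode.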
Note to self: see also Storage Class A operations, currently around 350 per row in each run; this can be optimised.
This issue is part of the https://github.com/opentargets/issues/issues/3302 epic.
The goal of this issue is to configure VM types and task submission parameters so that tasks don't fail due to RAM constraints, while at the same time not wasting resources.
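One way to frame that trade-off is to derive the per-task RAM request from observed peak usage plus a safety margin, then pick the smallest machine type that fits. This is a minimal sketch under assumed names, candidate specs, and margin values, not the actual submission code:

```python
# Sketch: pick the smallest machine type whose RAM covers per-task peak
# usage plus a safety margin. Candidate specs and the margin are assumptions.
CANDIDATES = [  # (machine type, vCPUs, RAM in GB), smallest first
    ("n2d-highmem-2", 2, 16),
    ("n2d-highmem-4", 4, 32),
    ("n2d-highmem-8", 8, 64),
]

def pick_machine_type(peak_ram_gb: float, margin: float = 1.2) -> str:
    """Return the smallest candidate whose RAM covers peak usage * margin."""
    required = peak_ram_gb * margin
    for name, _vcpus, ram_gb in CANDIDATES:
        if ram_gb >= required:
            return name
    raise ValueError(f"No candidate has {required:.1f} GB of RAM")

print(pick_machine_type(24.0))  # 24 GB peak * 1.2 = 28.8 GB -> n2d-highmem-4
```

The margin absorbs run-to-run variance in peak RAM; too small a margin reproduces the OOM failures, too large a margin wastes money, so it should be tuned against the benchmarking data from the v5/v6 runs.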