Closed tsibley closed 2 years ago
Discussion during triage concluded that, barring a robust way to detect limitations on CPU time, there's likely no way to address this inside Augur itself. Closing as can't fix for now. We can use the issue to collect references/examples of the problem cropping up (e.g. the monkeypox workflow fix above), and re-open it if we ever have a way to fix it.
Current Behavior
On the host:
In a CPU-limited container:
One would naively expect from the --cpus 4 option the output to be 4 4 4.
This discrepancy was spotted in actual usage by @corneliusroemer in the context of an AWS Batch job with --cpus 8 detecting 96 cores for tree building. My diagnosis was:
Expected behavior
Not sure. I think that due to the way container CPU time is limited (instead of CPU cores), the numbers aren't technically wrong. But they're certainly wrong in the sense that they will likely lead to oversubscribed processes and CPU contention.
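To make the "CPU time, not CPU cores" distinction concrete, here is a minimal Python sketch of reading the CFS quota that a container runtime typically writes into the cgroup filesystem. It is a best-effort illustration, not a robust detector. The function names are hypothetical and not part of Augur, and it assumes the cgroup files are mounted at the usual /sys/fs/cgroup location and visible from inside the container, which is not guaranteed in every environment.

```python
from pathlib import Path
from typing import Optional


def parse_cgroup_v2_cpu_max(text: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max file: "<quota> <period>" or "max <period>"."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU time limit configured
    return int(quota) / int(period)


def parse_cgroup_v1_quota(quota_text: str, period_text: str) -> Optional[float]:
    """Parse cgroup v1 cpu.cfs_quota_us and cpu.cfs_period_us contents."""
    quota = int(quota_text)
    period = int(period_text)
    if quota <= 0 or period <= 0:
        return None  # a quota of -1 means unlimited
    return quota / period


def cgroup_cpu_limit(root: Path = Path("/sys/fs/cgroup")) -> Optional[float]:
    """Best-effort CPU time limit, in units of cores, or None if unknown."""
    try:
        # cgroup v2 layout (assumed mount point)
        return parse_cgroup_v2_cpu_max((root / "cpu.max").read_text())
    except (OSError, ValueError):
        pass
    try:
        # cgroup v1 layout (assumed mount point)
        return parse_cgroup_v1_quota(
            (root / "cpu" / "cpu.cfs_quota_us").read_text(),
            (root / "cpu" / "cpu.cfs_period_us").read_text(),
        )
    except (OSError, ValueError):
        return None
```

A quota of 400000 with a period of 100000 corresponds to 4 cores' worth of CPU time, even though every host core remains visible to the process. Note the limit can be fractional and can be absent entirely, which is part of why this is hard to rely on in general.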
Possible solution
Not sure about general solutions.
Workflows should, whenever possible, explicitly pass around CPU counts instead of relying on auto-detection (and if auto-detection is used, only use it at the very outermost workflow layer).
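As a rough sketch of that recommendation in Python: resolve the CPU count once at the outermost layer, preferring an explicit value, and hand a fixed share down to each inner step. The helper names (resolve_cpus, cpus_per_job) are hypothetical and only illustrate the pattern; they are not existing Augur or workflow functions.

```python
import os
from typing import Optional


def resolve_cpus(explicit: Optional[int] = None) -> int:
    """Return the CPU count to use, preferring an explicitly passed value.

    Auto-detection is only a fallback and may overcount inside a
    CPU-time-limited container.
    """
    if explicit is not None:
        if explicit < 1:
            raise ValueError("CPU count must be at least 1")
        return explicit
    # os.cpu_count() can return None; fall back to a single CPU.
    return os.cpu_count() or 1


def cpus_per_job(total: int, n_jobs: int) -> int:
    """Split a fixed CPU budget evenly across concurrent jobs (minimum 1 each)."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return max(1, total // n_jobs)
```

For example, an outer workflow given 8 CPUs and running 2 steps concurrently would call cpus_per_job(8, 2) and pass the result of 4 to each step, so inner steps never auto-detect on their own.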
Related issues