nextstrain / cli

The Nextstrain command-line interface (CLI)—a program called nextstrain—aims to provide a consistent way to run and visualize pathogen builds and access Nextstrain components like Augur and Auspice across computing environments such as Docker, Conda, and AWS Batch.
https://docs.nextstrain.org/projects/cli/
MIT License

[build] Snakemake will auto-detect an incorrect number of CPUs on AWS Batch when `--cpus` is not used #175

Open tsibley opened 2 years ago

tsibley commented 2 years ago

Current Behavior

In the absence of a --cpus option provided to nextstrain build, the AWS Batch runner uses the job definition's default CPU allocation (vcpus), which Snakemake cannot detect. Instead, Snakemake detects and uses the number of CPUs on the entire Batch node.

I expect this will cause oversubscription for the actual CPU time available to the Batch job, but it won't always be so because Snakemake is dealing in the number of processes active at one time within the container and the Batch job's CPU limitation is dealing in cumulative CPU time spent in a period (expressed relative to a single pegged CPU in that same period) across the whole container process tree. So I think oversubscription will mostly be an issue when all the Snakemake processes peg their CPUs at the same time (e.g. alignment, tree building).
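To make the mismatch concrete, here's a small illustrative sketch (not part of Snakemake or the Nextstrain CLI): the standard CPU-detection calls see the whole node, while the Batch CPU limit lives in the cgroup CPU controller, e.g. the cgroup v2 `cpu.max` file.

```python
# Sketch: why standard CPU detection overshoots under a cgroup CPU quota.
# Illustrative helper only; the file path assumes cgroup v2.
import os
from typing import Optional

def cpus_from_cgroup_v2(cpu_max: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max line ("<quota> <period>" or "max <period>")
    into an effective CPU count, or None if no quota is set."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # unthrottled: the whole node's CPUs are usable
    return int(quota) / int(period)

if __name__ == "__main__":
    # On a 16-CPU Batch node running a 4-vCPU job, os.cpu_count() and
    # os.sched_getaffinity() still report 16, but cpu.max caps usage at 4.
    print("kernel view:", os.cpu_count())
    try:
        with open("/sys/fs/cgroup/cpu.max") as f:
            print("cgroup quota:", cpus_from_cgroup_v2(f.read()))
    except OSError:
        pass  # not on cgroup v2 (or no cgroup mounted)
```

So a 4-vCPU job on a 16-CPU node may run up to 16 concurrent Snakemake jobs that collectively get throttled to 4 CPUs' worth of time.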

Expected behavior

Snakemake is limited to the same number of CPUs that are nominally provisioned by the Batch job definition when --cpus isn't provided.

How to reproduce

Run this Snakefile under various conditions on AWS Batch and observe that workflow.cores reports the number of CPUs of the underlying Batch node if you pass --cores all.

```python
import os

rule:
    shell: f"""
        echo workflow.cores {workflow.cores}
        echo nproc "$(nproc)"
        echo os.sched_getaffinity {len(os.sched_getaffinity(0))}
        echo os.cpu_count {os.cpu_count()}
        env | grep -i threads || true
    """
```

Possible solution

  1. Explicitly inform Snakemake of the Batch job definition's CPU allocation (vcpus), since that's what will apply in the absence of --cpus. Downside is this breaks boundaries between the build command and the Batch runner: the former is what informs Snakemake but the latter is what knows about Batch job definitions.
  2. Require --cpus for the AWS Batch runner. Downside is this is a breaking change and erodes the runtime portability properties of nextstrain build's interface.
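A rough sketch of what option 1 might look like. The boto3 call and response shape are real (`describe_job_definitions` returns `containerProperties` with either a legacy `vcpus` field or `resourceRequirements`); the wiring into the runner is hypothetical.

```python
# Sketch of option 1: read the job definition's default vCPU allocation and
# hand it to Snakemake as --cores when --cpus is absent. Hypothetical wiring.
from typing import Optional

def vcpus_from_definition(definition: dict) -> Optional[int]:
    """Extract the default vCPU allocation from one describe_job_definitions
    entry, checking the legacy field and then resourceRequirements."""
    props = definition.get("containerProperties", {})
    if "vcpus" in props:
        return props["vcpus"]
    for req in props.get("resourceRequirements", []):
        if req["type"] == "VCPU":
            return int(float(req["value"]))  # value is a string, e.g. "4"
    return None

def default_cores(job_definition_name: str) -> Optional[int]:
    import boto3  # imported here so the pure helper above has no AWS dependency
    batch = boto3.client("batch")
    resp = batch.describe_job_definitions(
        jobDefinitionName=job_definition_name, status="ACTIVE")
    defs = resp["jobDefinitions"]
    if not defs:
        return None
    # Use the latest active revision, matching what Batch runs by default.
    return vcpus_from_definition(max(defs, key=lambda d: d["revision"]))
```

The build command could then append `["--cores", str(n)]` to the Snakemake invocation when --cpus isn't given, which is exactly the boundary-crossing the downside describes.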

Workaround

Always specify --cpus for nextstrain build when using AWS Batch, as this both reserves CPU time for the Batch job and informs Snakemake of the correct --cores number.

Additional context

Realized this was happening when implementing #179.

Solution possibly related to https://github.com/nextstrain/cli/issues/144. It at least will touch the same code.

tsibley commented 1 year ago

I expect this will cause oversubscription for the actual CPU time available to the Batch job, but it won't always be so because Snakemake is dealing in the number of processes active at one time within the container and the Batch job's CPU limitation is dealing in cumulative CPU time spent in a period…

Relevant for thinking about this mismatch: https://danluu.com/cgroup-throttling/
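Following the article's framing, throttling is directly observable: the cgroup v2 `cpu.stat` file counts scheduler periods and how many of them hit the quota. A small illustrative parser (not part of any existing tool):

```python
# Sketch: quantify how often the cgroup CPU quota actually bites, per the
# throttling mechanics described in the linked article. Illustrative only.
def throttled_fraction(cpu_stat: str) -> float:
    """Fraction of scheduler periods in which the cgroup was throttled,
    parsed from cgroup v2 cpu.stat text."""
    stats = dict(line.split() for line in cpu_stat.strip().splitlines())
    periods = int(stats.get("nr_periods", 0))
    throttled = int(stats.get("nr_throttled", 0))
    return throttled / periods if periods else 0.0

# Example cpu.stat contents (values made up for illustration):
sample = """\
usage_usec 1000000
user_usec 800000
system_usec 200000
nr_periods 200
nr_throttled 50
throttled_usec 750000
"""
print(throttled_fraction(sample))  # 50 of 200 periods throttled
```

A high fraction during alignment or tree building would confirm the oversubscription scenario described above.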

(Dan Luu's analyses/write-ups are generally great, and I'd seen and read this one before, but it came back up again for me over the weekend, and I recalled that it applied to this issue.)

jameshadfield commented 1 year ago

Require --cpus for the AWS Batch runner. Downside is this is a breaking change and erodes the runtime portability properties of nextstrain build's interface.

This is my preference

tsibley commented 1 year ago

Hmm. What if the other solution:

Explicitly inform Snakemake of the Batch job definition's CPU allocation (vcpus), since that's what will apply in the absence of --cpus. Downside is this breaks boundaries between the build command and the Batch runner: the former is what informs Snakemake but the latter is what knows about Batch job definitions.

is possible without breaking boundaries (i.e. by redrawing boundaries)?

I'd be inclined to redraw the boundaries so that we can cleanly use the default from Batch for --cpus.
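One shape the redrawn boundary could take (every name here is hypothetical; nothing like this exists in the CLI today): each runner advertises its default CPU allocation, and the build command consults it before choosing Snakemake's --cores.

```python
# Hypothetical design sketch of the redrawn boundary. The build command asks
# the runner for a default instead of reaching into Batch job definitions.
from typing import Optional, Protocol

class Runner(Protocol):
    def default_cpus(self) -> Optional[int]: ...

class LocalRunner:
    def default_cpus(self) -> Optional[int]:
        return None  # local runs: let Snakemake detect the machine itself

class AWSBatchRunner:
    def __init__(self, job_definition_vcpus: int):
        # In a real implementation this would come from the job definition.
        self.job_definition_vcpus = job_definition_vcpus

    def default_cpus(self) -> Optional[int]:
        return self.job_definition_vcpus

def snakemake_cores(runner: Runner, cpus_opt: Optional[int]) -> str:
    """An explicit --cpus wins; else the runner's default; else "all"."""
    n = cpus_opt if cpus_opt is not None else runner.default_cpus()
    return str(n) if n is not None else "all"
```

Under this split, the Batch runner stays the only component that knows about job definitions, while the build command stays the only component that talks to Snakemake.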