nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io

Reconcile cpus/memory with clusterOptions for pbspro executor #2264

Closed bentsherman closed 1 year ago

bentsherman commented 3 years ago

I have a feature request and some discussion around the pbspro executor and how resource settings are determined from cpus, memory, and clusterOptions.

So we have a pipeline with the standard sort of resource settings:

// base.config
process {
  cpus = 1
  memory = 6.GB
  time = 24.h
}

And in the spirit of nf-core we are developing an institutional config file with settings for our PBS Pro scheduler. Now a wonky thing about our scheduler is that we must specify the interconnect, even if we don't care and just say "any". Our admins have been pretty adamant about this rule. So we put that here:

// palmetto.config
process {
    executor = "pbspro"
    clusterOptions = "-l select=1:interconnect=any"
}

In combination, these settings produce the following kind of PBS headers:

#PBS -N nf-kallisto_DRX
#PBS -o [...]
#PBS -j oe
#PBS -l select=1:ncpus=4:mem=6144mb
#PBS -l walltime=24:00:00
#PBS -l select=1:interconnect=any

In this situation, PBS seems to ignore the second select line: the job gets the right cpus, memory, and walltime, but the interconnect setting never makes it in, so the job is rejected. (The job is actually still accepted if ncpus=1, but I think that is also an artifact of our particular cluster.)

So now I have to null the cpus and memory directives and instead specify them in clusterOptions:

process {
    executor = "pbspro"
    cpus = null
    memory = null
    clusterOptions = "-l select=1:ncpus=1:mem=6gb:interconnect=any"
}

But at this point things get really hairy, because I end up having to do the same kind of thing for every withLabel or withName rule in the pipeline config. The pipeline in question is working towards nf-core compatibility, so they aren't going to maintain platform-specific profiles, and I don't think the sys admins are going to budge on this weird interconnect rule.
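
To make that concrete, here is a rough sketch of what our institutional config starts to look like; the label names and resource values are hypothetical, just mirroring the usual nf-core-style labels:

// palmetto.config (sketch only; label names and values are hypothetical)
process {
    executor = "pbspro"
    cpus = null
    memory = null
    clusterOptions = "-l select=1:ncpus=1:mem=6gb:interconnect=any"

    withLabel:process_medium {
        cpus = null
        memory = null
        clusterOptions = "-l select=1:ncpus=6:mem=36gb:interconnect=any"
    }
    withLabel:process_high {
        cpus = null
        memory = null
        clusterOptions = "-l select=1:ncpus=12:mem=72gb:interconnect=any"
    }
}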

So I'm wondering whether we can solve this problem in Nextflow. It seems to me that if cpus and/or memory are defined in addition to clusterOptions, the resulting select lines should be merged like so:

#PBS -N nf-kallisto_DRX
#PBS -o [...]
#PBS -j oe
#PBS -l select=1:ncpus=4:mem=6144mb:interconnect=any
#PBS -l walltime=24:00:00

These headers would work for me. The time directive can be left alone, since the walltime setting is not tied to the select statement. This is all just for the pbspro executor, although I imagine we could have the same discussion for pbs and really any other executor that supports clusterOptions.
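
For what it's worth, here is a rough Groovy sketch of the merging I have in mind; it is purely illustrative (the method name and the parsing are made up, and it assumes the user-supplied clusterOptions contains at most one select statement), not actual executor code:

// Illustrative sketch: fold the executor's computed ncpus/mem into a
// user-supplied "-l select=..." string instead of emitting a second select line.
String mergeSelect(String clusterOptions, int cpus, String mem) {
    def resources = "ncpus=${cpus}" + (mem ? ":mem=${mem}" : "")
    def matcher = clusterOptions =~ /-l\s+select=(\S+)/
    if( !matcher.find() )
        // no user select statement -> fall back to today's behaviour
        // (the executor emits its own "-l select" line)
        return clusterOptions
    // splice ncpus/mem in right after the chunk count of the user's select
    def userSelect = matcher.group(1)        // e.g. "1:interconnect=any"
    def parts = userSelect.split(':', 2)     // -> ["1", "interconnect=any"]
    def merged = parts[0] + ':' + resources + (parts.size() > 1 ? ':' + parts[1] : '')
    return clusterOptions.replace("select=${userSelect}", "select=${merged}")
}

// mergeSelect("-l select=1:interconnect=any", 4, "6144mb")
//   => "-l select=1:ncpus=4:mem=6144mb:interconnect=any"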

bentsherman commented 3 years ago

Another downside of this situation is that when you specify cpus through clusterOptions instead of the cpus directive, task.cpus no longer reflects the actual number of cpus allocated to the job, so you can't do multiprocessing when the task script needs to pass task.cpus to the tool as a command-line argument.
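
For example, in a hypothetical process like the one below (the tool invocation and file names are just placeholders), task.cpus falls back to the default of 1 once the cpus directive is nulled, so the tool only gets 1 thread even though the select statement in clusterOptions asked PBS for more:

// Hypothetical process, for illustration only: with cpus nulled out,
// task.cpus defaults to 1, so the tool is told to use 1 thread even
// though clusterOptions may have requested more ncpus from PBS.
process KALLISTO_QUANT {
    input:
    path index
    path reads

    script:
    """
    kallisto quant -t ${task.cpus} -i ${index} -o quant ${reads}
    """
}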

bentsherman commented 3 years ago

To my surprise, my sys admins actually removed the interconnect requirement at my request, since "any" is a sensible default, so this issue is no longer urgent for me. That being said, I think it's still an interesting issue to consider so I'll leave it open.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

pditommaso commented 1 year ago

Bump