nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

LSF executor does not respect LSF_UNIT_FOR_LIMITS in lsf.conf #5182

Open d-callan opened 3 months ago

d-callan commented 3 months ago

Bug report

Expected behavior and actual behavior

Jobs submitted to an LSF cluster should respect the value of LSF_UNIT_FOR_LIMITS in lsf.conf, per #1124. However, running on a cluster where this unit is set to MB, a task requesting 80 MB gets a header in its .command.run file like the following:

#BSUB -M 81920
#BSUB -R "select[mem>=81920] rusage[mem=80]"
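One plausible reading of those numbers (an assumption on my part, not stated explicitly in the thread): Nextflow emitted the limit in KB, LSF's historical default unit, while the cluster interprets bare numbers as MB because of LSF_UNIT_FOR_LIMITS=MB, inflating the request by a factor of 1024:

```python
# Sketch of the suspected unit mismatch (assumption: Nextflow converted
# the 80 MB request to KB, LSF's default unit, while the cluster reads
# bare numbers as MB because LSF_UNIT_FOR_LIMITS=MB in lsf.conf).
requested_mb = 80
emitted = requested_mb * 1024        # 81920, intended as KB
misread_gb = emitted / 1024          # cluster reads 81920 MB = 80 GB
print(f"-M {emitted} misread as {misread_gb:.0f} GB")
```

That would explain the queue rejecting the job: an 80 MB request is seen as an 80 GB one.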

Steps to reproduce the problem

On an LSF cluster with a non-default setting for LSF_UNIT_FOR_LIMITS, I attempted to run an nf-core pipeline:

nextflow run nf-core/metatdenovo -profile singularity,test --outdir out

Program output

The cluster fails to start jobs, saying I've requested more resources than the queue allows.

Environment

d-callan commented 3 months ago

Possibly a crazy question, but is there a way I can work around this in the meantime, while a fix is pending? I'm kind of stuck as things are.

d-callan commented 3 months ago

As I investigate further, it seems this is due to some odd configuration on my cluster. I can't run Nextflow directly on the head node, where the correct lsf.conf exists, and for whatever reason the lsf.conf file on the worker nodes is not consistent with the head node. I've tried to ask the admins about it, and they are... something less than helpful. I think I'd like to amend this ticket to a feature request:

the ability to explicitly override this unit

bentsherman commented 3 months ago

This LSF config setting is read here: https://github.com/nextflow-io/nextflow/blob/2fb5bc07f2ad1309c9743b8675bb8003892e3eb7/modules/nextflow/src/main/groovy/nextflow/executor/LsfExecutor.groovy#L315-L320

And the memory options are defined here: https://github.com/nextflow-io/nextflow/blob/2fb5bc07f2ad1309c9743b8675bb8003892e3eb7/modules/nextflow/src/main/groovy/nextflow/executor/LsfExecutor.groovy#L92-L103

So you can see how the various config options affect the final submit options. Maybe you can use the executor.perJobMemLimit or executor.perTaskReserve options to get what you need.
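A minimal sketch of that workaround in nextflow.config. Both options are real executor settings, but whether they actually resolve a unit mismatch depends on the cluster's lsf.conf, so treat this as a starting point rather than a known fix:

```groovy
// nextflow.config -- sketch of the workaround suggested above
executor {
    perJobMemLimit = true    // treat the memory limit (-M) as per-job rather than per-slot
    perTaskReserve = false   // keep the rusage[] reservation at the job level (the default)
}
```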

d-callan commented 2 months ago

Thanks @bentsherman for the info. I had another thought recently: what do you think of explicitly adding units to the submission string, so that Nextflow produces something like bsub -M 50000KB rather than bsub -M 50000? If doable, that seems like it would make this more robust, make my problem go away, and add clarity without changing existing behavior or features.
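Under that proposal, the header from the original report might look something like this (a sketch of the idea, not actual Nextflow output; the unit suffixes are the hypothetical addition):

```shell
# .command.run header with explicit units, so the request means the
# same thing regardless of LSF_UNIT_FOR_LIMITS on any given node
#BSUB -M 81920MB
#BSUB -R "select[mem>=81920MB] rusage[mem=80MB]"
```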

bentsherman commented 2 months ago

I didn't realize that was an option. It would make things much simpler. Can a unit be specified for all of those memory settings?

d-callan commented 2 months ago

Hmm, good question. I've just now tried to request an interactive node on my cluster with bsub -M 4GB -R "select[mem>=8GB] rusage[mem=8GB]" -Is bash and nothing screamed at me or caught fire, so that seems promising.

bentsherman commented 2 months ago

Okay, I see it is documented here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=requirements-resource-requirement-strings#vnmbvn__title__3

Assuming this syntax has been supported for a while, it should be fine for Nextflow to use it. I will draft a PR.