nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Default resources documentation #4998

Closed: jansen-ista closed this issue 1 month ago

jansen-ista commented 1 month ago

Hello, I work as a bioinformatician at a big institute and recently had to run a nextflow pipeline on our HPC cluster.

While I was easily able to use your slurm executor to distribute the pipeline over the cluster nodes, I kept running into jobs exceeding their resources. This was caused by our default slurm resources being too low for several smaller jobs that had no resources specified in the pipeline in question. Reading the Nextflow documentation on configuration, I found the hierarchy in which configuration sources are read. From this it seemed to me that if I added default resources such as process.memory and process.time to my config, they would override all other resources specified in the main pipeline script(s). It was only after reading this article that I realised this was not the case.

I would therefore suggest clarifying the documentation in this regard and/or adding a paragraph on default resources.

Best regards and keep up the awesome work!

bentsherman commented 1 month ago

Could you be more specific about which part of the config docs are not clear?

This part does say that the process definition will override the generic process configuration: https://nextflow.io/docs/latest/config.html#selector-priority
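For illustration, here is a minimal config sketch of how that priority plays out, assuming a config file passed with `-c`. The process name `ALIGN_READS` and the label `process_low` are hypothetical placeholders, not taken from any specific pipeline. Settings without a selector act only as fallbacks for processes that do not declare the directive themselves, while `withLabel` and `withName` selectors take precedence over directives written in the process definition.

```groovy
// Hypothetical custom.config, passed with: nextflow run <pipeline> -c custom.config
// ALIGN_READS and process_low are placeholder names for illustration only.
process {
    executor = 'slurm'

    // Generic settings (lowest priority): used only by processes that
    // do NOT set memory/time in their own definition.
    memory = '4 GB'
    time   = '2h'

    // withLabel selectors override directives in the process definition.
    withLabel: 'process_low' {
        memory = '8 GB'
    }

    // withName selectors have the highest priority.
    withName: 'ALIGN_READS' {
        memory = '32 GB'
        time   = '12h'
    }
}
```

In other words, to change a value that the pipeline already sets inside a process, a selector is needed; a plain `process.memory` will not do it.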

jansen-ista commented 1 month ago

You are right, I had overlooked that part. I was looking at the Configuration file paragraph. Since it says that parameters from main.nf (and thereby any included .nf scripts) are applied first and are then overridden by the other config files, including "5. Config file specified using the -c <config-file> option", I assumed this would also cover definitions in the process scope. Given that the correct behaviour is documented for the process scope, my only suggestion would be a reference or disclaimer in the Configuration file paragraph noting that the process (and possibly other) scopes deviate from this. That said, I would also understand skipping it for the sake of simplicity and readability.

Anyway, thanks a lot for the clarification and for pointing me to the correct paragraph.

bentsherman commented 1 month ago

I think we are going to move the enumeration of all the config settings to a separate page so that the more general information like the priority resolution isn't buried under tons of reference docs.

Also, I think we will simplify params by allowing them to be defined in fewer places (likely a separate schema file and config profiles), so that params resolution is less complex.