Open mashehu opened 2 weeks ago
So the flow of information in the scdownstream pipeline looks as follows:

- Users provide the `gpu` profile
- The `gpu` profile prepares the containerization tools for GPU mounting and sets the hidden pipeline parameter `use_gpu` to `true` -> can be used for if-clauses in workflows
- Processes with GPU support get the `process_gpu` label -> can be used to handle all GPU-enabled processes in a certain way. Different executors need different tweaking to handle tasks from these processes correctly. I added a bit of documentation here.
- This section makes sure that processes have an `ext` variable which reflects if they should use GPU or not. This can be useful if the same module supports usage both with and without GPU. Example: cellbender

This implementation is the best I could come up with so far, but not many really "senior" people have had a look at it AFAIK. Maybe we can discuss which elements of the suggested approach should be part of the template and where we can perform some improvements.
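Put together, the pieces described above could look roughly like this in a pipeline's configuration (a simplified sketch for illustration, not the actual scdownstream config):

```groovy
profiles {
    gpu {
        // prepare the container engines for GPU mounting
        docker.runOptions      = '-u $(id -u):$(id -g) --gpus all'
        apptainer.runOptions   = '--nv'
        singularity.runOptions = '--nv'
        // hidden pipeline parameter, usable in workflow if-clauses
        params.use_gpu = true
    }
}

process {
    // every GPU-enabled process carries this label;
    // executor-specific tweaks (queues, accelerator directives, ...)
    // can be attached here in one place
    withLabel: process_gpu {
        ext.use_gpu = { params.use_gpu }
    }
}
```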
Another thought: Should this not be an optional part of the template?
it's teeny-tiny enough that we might keep it in, imo, similar to `arm`
... but also not 100% sure
But is the goal to only add the profile to the template?
I think it would at least be nice to have a common structure for all GPU-enabled modules, even if it only means adding the `process_gpu` label
Thank you @nictru
> - Users provide the `gpu` profile

The `gpu` profile being added to the template does not have the `use_gpu` configuration variable. I can see that you have added it to the profile in the pipeline: https://github.com/nf-core/scdownstream/blob/3231971f309d1ac025e7180b69852c3637c975dd/nextflow.config#L182

I don't think we need a separate configuration variable.
> - The `gpu` profile prepares the containerization tools for GPU mounting and sets the hidden pipeline parameter `use_gpu` to true -> can be used for if-clauses in workflows

In the scdownstream pipeline, `use_gpu` is defined as a pipeline parameter (https://github.com/nf-core/scdownstream/blob/3231971f309d1ac025e7180b69852c3637c975dd/nextflow.config#L40) and a configuration variable (https://github.com/nf-core/scdownstream/blob/3231971f309d1ac025e7180b69852c3637c975dd/nextflow.config#L182). These two are different, so setting gpu in the `-profile` will not set the `use_gpu` pipeline parameter. It will only set the configuration variable.
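To illustrate the distinction (a minimal sketch, not the actual scdownstream config):

```groovy
// Pipeline parameter: visible to workflow code as params.use_gpu,
// settable from the command line with --use_gpu
params {
    use_gpu = false
}

// Configuration variable: a plain variable scoped to the config files;
// it is NOT visible to workflow code and is NOT set by --use_gpu
use_gpu = false

profiles {
    gpu {
        // this assignment only changes the configuration variable above;
        // params.use_gpu keeps its default
        use_gpu = true
    }
}
```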
I think we only need the `use_gpu` pipeline parameter and should get rid of the `use_gpu` configuration variable. The `use_gpu` label in the `base.config` file can be changed to,
```groovy
process {
    withLabel: 'use_gpu' {
        ext.use_gpu = { params.use_gpu }
    }
}
```
At pipeline execution, the user must set the `gpu` profile and the `use_gpu` pipeline parameter to utilise the GPU-based tools.
> - Processes with GPU support get the `process_gpu` label -> can be used to handle all GPU-enabled processes in a certain way. Different executors need different tweaking to handle tasks from these processes correctly. I added a bit of documentation here.
Thank you. This is very helpful.
> - This section makes sure that processes have an `ext` variable which reflects if they should use GPU or not. This can be useful if the same module supports usage both with and without GPU. Example: cellbender
This is quite clever. Nice!
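For illustration, a module could consume that `ext` value roughly like this (a hypothetical sketch, not the actual cellbender module; the `--cuda` flag is CellBender's GPU switch):

```groovy
process CELLBENDER {
    label 'process_gpu'

    script:
    // task.ext.use_gpu is supplied by the withLabel block in the config;
    // fall back to false when it is not defined
    def use_gpu = task.ext.use_gpu ?: false
    """
    cellbender remove-background \\
        ${use_gpu ? '--cuda' : ''} \\
        --input input.h5 \\
        --output output.h5
    """
}
```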
Hey @GallVp, thanks for the input.
Some notes:
It might sound stupid, but I did not know there was a difference between pipeline parameters and configuration variables. I agree that we should only have one.
I am not fully aware of what you can and can't do with configuration variables. I was also not able to find any documentation about them at all. I only used the configuration variable in scdownstream. Not sure if one can use configuration variables for workflow `if` clauses?
- The pipeline parameter should be preferred, but it should still be set to `true` by the `gpu` profile and hidden from the parameter documentation
- Or the configuration variable is better, because it prevents users from using one without the other

> At pipeline execution, the user must set the `gpu` profile and the `use_gpu` pipeline parameter to utilise the GPU-based tools.
I would prefer if setting the `gpu` profile would be sufficient. Keeping them separate will probably lead to a lot of users using one without the other, which almost certainly will lead to errors. I am not aware of any use cases where this would be an advantage, but I'm eager to hear.
> Not sure if one can use configuration variables for workflow `if` clauses?
No, configuration variables are not available in the workflows. Maybe there is a backdoor that I am not aware of.
How about we get rid of both the variable and the parameter? Instead, we modify the profile as,
```groovy
gpu {
    process {
        withLabel: process_gpu {
            ext.use_gpu = true
        }
    }
    docker.runOptions      = '-u $(id -u):$(id -g) --gpus all'
    apptainer.runOptions   = '--nv'
    singularity.runOptions = '--nv'
}
```
The workflows can use `workflow.profile.contains('gpu')` in place of the `use_gpu` parameter.
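As a sketch of how that check might appear in a workflow if-clause (the process names are hypothetical):

```groovy
workflow SCDOWNSTREAM {
    take:
    ch_input

    main:
    // Branch on the active profile instead of a params.use_gpu flag
    if (workflow.profile.contains('gpu')) {
        TOOL_GPU(ch_input)    // hypothetical GPU-enabled process
    }
    else {
        TOOL_CPU(ch_input)    // hypothetical CPU fallback
    }
}
```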
The above solution, however, does not take care of the `arm` profile. So, we need to have a `gpu_arm` profile as well.
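Since a later profile's assignment to an option like `docker.runOptions` overwrites an earlier one rather than appending to it, a combined profile would have to restate both sets of flags. A hedged sketch, with the ARM-specific part left as a comment because it depends on what the template's `arm` profile actually sets:

```groovy
gpu_arm {
    process {
        withLabel: process_gpu {
            ext.use_gpu = true
        }
    }
    // The GPU flags (as in the gpu profile) would need to be merged here
    // with whatever platform-related runOptions the arm profile uses
    apptainer.runOptions   = '--nv'
    singularity.runOptions = '--nv'
}
```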
Hey, I just tested the suggested approach and it seems to work as expected.
I think we could do it like this, but I would prefer having the `withLabel` block in the `base.config` - just to make sure that all the `withLabel` configs are collected in one place.
I did this using the following:
```groovy
withLabel: process_gpu {
    ext.use_gpu = { workflow.profile.contains('gpu') }
}
```
So that we don't need a config variable. But this is more a cosmetic topic and not a hill-to-die-on.
feel free to add the changes, @nictru
> I did this using the following:
> `withLabel: process_gpu { ext.use_gpu = { workflow.profile.contains('gpu') } }`
I agree. This is more in line with the existing infrastructure. Thank you!
Needs opt-in functionality in `nf-core pipelines create`. See also #3261
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark: