Reading more about Grid Engine, I realize that virtual_free and h_vmem do different things: virtual_free is advice to the scheduler (which node to place the job on), while h_vmem sets a hard limit on memory usage. So I can understand that Nextflow doesn't need to support h_vmem... the workaround (above) seems sufficient.
Close this issue if you agree... thanks.
One more observation. While this:

process my_process {
    memory 16.GB
    clusterOptions = "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}"
    ....
}

results in -l h_vmem=16G in my .command.run file, it breaks when I try to set memory dynamically, like this:

process my_process {
    memory {16.GB * task.attempt}
    clusterOptions = "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}"
    ...
}

This results in -l h_vmem=_nf_script_033fec75$_run_closure1$_closure8@7e3060d8 in my .command.run file. It seems that by multiplying the memory by task.attempt, the memory value has somehow become a closure object.
You need to reference it as task.memory.
Keep in mind that consumables like h_vmem are per-slot, and that slots are used as the mechanism to allocate cores. A 4-core job that wants 40GB should set h_vmem to 10GB. virtual_free, on the other hand, simply checks how much free virtual memory exists on a machine at placement time; that same job would ask for a virtual_free of 40GB. You can't use the same value for both characteristics unless you're asking for a single slot.
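To make that per-slot arithmetic concrete, here is a minimal sketch of such a job; the process name and command are just placeholders, and it assumes the cluster defines the h_vmem consumable as described above:

process demo_per_slot {
    cpus 4
    memory 40.GB                      // 40 GB total; the sge executor turns this into -l virtual_free=40G
    clusterOptions '-l h_vmem=10G'    // per-slot hard limit: 40 GB / 4 slots = 10 GB per slot

    """
    echo hello
    """
}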
@pditommaso -- Hmmm, I get ERROR ~ No such variable: task when I try

clusterOptions = "-l h_vmem=${task.memory.toString().replaceAll(/[\sB]/,'')}"

@hartzell -- yes, I was planning on dividing the number by the number of requested slots (accessed via the cpus Nextflow directive)... but trying to get this working first... baby steps...
There's a glitch in the syntax: = must be used only in the config file, not in the Nextflow script. It should be like the following:

process my_process {
    memory {16.GB * task.attempt}
    clusterOptions "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}"
    ...
}
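For reference, the = assignment form belongs in nextflow.config. A minimal sketch of the equivalent setting there, using the closure and task.memory form that is confirmed later in this thread:

process {
    memory         = { 16.GB * task.attempt }
    // a closure, so it is re-evaluated for each task against its actual memory value
    clusterOptions = { "-l h_vmem=${task.memory.toString().replaceAll(/[\sB]/,'')}" }
}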
Oops, thanks for this. I will add this info to the Google group as a reference for others.
At one point I noticed the = sign, but since it was working (without ${variables}) I just assumed it was optional.
All working now! Thanks!
Hi, I'd like to try this, but I don't really use Nextflow apart from running one pipeline (distiller) that was made by other people. Can someone please show how to add the division by the number of cores to this syntax?
Please use the discussion forum: https://groups.google.com/forum/#!forum/nextflow
Hi, when I use @pditommaso's approach, removing the '=' sign:

process my_process {
    memory {16.GB * task.attempt}
    clusterOptions "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}"
    ...
}

I still get a similar error to this:

-l h_vmem=_nf_script_033fec75$_run_closure1$_closure8@7e3060d8

in .command.run. Any clue?
Thanks!
Please provide a replicable test case in a separate issue.
As @pditommaso mentioned, it should be task.memory. So in the end:

process my_process {
    memory {16.GB * task.attempt}
    clusterOptions "-l h_vmem=${task.memory.toString().replaceAll(/[\sB]/,'')}"
    ...
}

solved the issue.
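For clusters where h_vmem is a per-slot consumable (as @hartzell points out above), a variant that also divides the request by the slot count could look like the sketch below; toGiga() and intdiv() are assumed here only to keep the value a whole number of gigabytes, and note that the integer division rounds down:

process my_process {
    cpus 4
    memory { 16.GB * task.attempt }
    // h_vmem is per slot, so split the total request across the requested cpus
    clusterOptions "-l h_vmem=${task.memory.toGiga().intdiv(task.cpus)}G"

    """
    echo hello
    """
}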
Hello @pditommaso, I'm working on a cluster with the same situation described above (h_vmem instead of virtual_free). Using the above fix worked with my old configuration files for a DSL 1 workflow. However, since I've moved the workflow to DSL2, the fix stopped working. For instance, this configuration file:
executor {
    name = "uge"
    queueSize = 500
    cpu = 1
    memory = 8.GB
    time = 23.h
}

process {
    beforeScript = """
    . /etc/profile.d/modules.sh
    module load anaconda/5.3.1
    sleep 2;
    """
    penv = "sharedmem"
    cpus = 1
    memory = 8.GB
    time = 6.h
    clusterOptions = "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}"
    errorStrategy = { task.exitStatus in [143,137,104,134,139,140] ? 'retry' : 'terminate' }
    maxRetries = 5
    maxErrors = '-1'

    withLabel: small {
        cpus = 1
        memory = { 4.GB * task.attempt }
        time = { 6.h * task.attempt }
    }
    withLabel: medium {
        cpus = 1
        memory = { 16.GB * task.attempt }
        time = { 12.h * task.attempt }
    }
    withLabel: large {
        cpus = 1
        memory = { 32.GB * task.attempt }
        time = { 23.h * task.attempt }
    }
    withLabel: long {
        cpus = 1
        memory = { 128.GB * task.attempt }
        time = { 96.h * task.attempt }
    }
    withLabel: small_multi {
        cpus = { 2 * task.attempt }
        memory = { 8.GB * task.attempt }
        time = { 4.h * task.attempt }
    }
}
should give 128 GB of memory per core when using the long configuration. However, when I look at the .command.run file header, I see something like this:
#!/bin/bash
#$ -wd /PATH/TO/work/a1/ee5455196b900f37fe94721da68994
#$ -N nf-ROH_roh_(ROH)
#$ -o /PATH/TO/work/a1/ee5455196b900f37fe94721da68994/.command.log
#$ -j y
#$ -terse
#$ -notify
#$ -pe sharedmem 1
#$ -l h_rt=96:00:00
#$ -l h_rss=131072M,mem_free=131072M
#$ -l h_vmem=8G
The workflow uses the correct memory specification for the "normal" configuration (mem_free), but it falls back to the generic configuration when it comes to h_vmem, preventing my jobs from finishing correctly. I could hard-code resources into the individual processes, but that would limit the flexibility of the workflow itself. Is there some solution? Am I placing the configuration in the wrong place? I've tried to place clusterOptions into each label configuration, but it didn't work, complaining that it couldn't find process.memory. Not sure how to proceed to fix it.
Thank you in advance for your help, Andrea
You should make the cluster option be evaluated dynamically against the actual task memory value, therefore you should use the { } closure syntax, e.g.

clusterOptions = { "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}" }
@pditommaso just to get it right, something like this:

withLabel: small {
    cpus = 1
    memory = { 4.GB * task.attempt }
    time = { 6.h * task.attempt }
    clusterOptions = { "-l h_vmem=${memory.toString().replaceAll(/[\sB]/,'')}" }
}

Am I correct?
Sorry, task.memory, not memory:

withLabel: small {
    cpus = 1
    memory = { 4.GB * task.attempt }
    time = { 6.h * task.attempt }
    clusterOptions = { "-l h_vmem=${task.memory.toString().replaceAll(/[\sB]/,'')}" }
}
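Since the closure is evaluated per task, it can also be set once at the top level of the process scope, where it will pick up whatever memory value the matching label provides. A sketch of that variant (untested here):

process {
    // evaluated per task, so it reflects the memory set by whichever label applies
    clusterOptions = { "-l h_vmem=${task.memory.toString().replaceAll(/[\sB]/,'')}" }

    withLabel: small {
        cpus   = 1
        memory = { 4.GB * task.attempt }
        time   = { 6.h * task.attempt }
    }
    withLabel: medium {
        cpus   = 1
        memory = { 16.GB * task.attempt }
        time   = { 12.h * task.attempt }
    }
}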
Great, I'll try that later today! Thank you!
@pditommaso it worked perfectly, thank you!
When using the sge executor, setting

memory 16.GB

in a process results in

-l virtual_free=16G

appearing in the header of the .command.run file. Some SGE clusters don't pay attention to this, and instead use h_vmem (I'm not sure how common/uncommon this is!). Of course, one can use

clusterOptions '-l h_vmem=16G'

but then one can't take advantage of the retry mechanism afforded by dynamic computing resources. Could the way that sge interprets the memory directive be made configurable?

PS... I can use this as a workaround, but it's ugly: