Closed by arnikz 6 years ago
Must be implemented in Xenon first, see https://github.com/NLeSC/Xenon/issues/562
One of my jobs keeps failing on SGE due to memory requirements. From the log it seems that Xenon passes mem_free instead of the h_vmem parameter (as suggested above). The job used about 10G (maxvmem) but was cancelled after ~3h on a node with 32G free memory. Why?
...
qsub_time Mon Mar 5 16:27:04 2018
start_time Mon Mar 5 17:43:35 2018
end_time Mon Mar 5 20:28:45 2018
granted_pe threaded
slots 1
failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status 137 (Killed)
ru_wallclock 9910s
ru_utime 0.081s
ru_stime 0.121s
ru_maxrss 2.246KB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 52580
ru_majflt 0
ru_nswap 0
ru_inblock 8
ru_oublock 176
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 680
ru_nivcsw 88
cpu 15831.750s
mem 32.009KGBs
io 875.730GB
iow 0.000s
maxvmem 10.009GB
arid undefined
ar_sub_time undefined
category -l h_rt=0,mem_free=32768M -pe threaded 1 -P compgen
It's weird that another call to the accounting system shows a different category line for the same job: -l h_rt=172800...
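To see quickly which resources a job actually requested, the category line from the accounting output above can be split into its -l entries. This is only a minimal sketch: parse_category is a hypothetical helper written for this thread, not part of Xenon or SGE, and it assumes the category string has the same shape as the one shown in the log.

```python
def parse_category(category: str) -> dict:
    """Collect the name=value pairs that follow each -l in an SGE category string.

    Hypothetical helper for illustration only; not part of Xenon or SGE.
    """
    tokens = category.split()
    resources = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == "-l" and i + 1 < len(tokens):
            # The resource list is comma separated, e.g. h_rt=0,mem_free=32768M
            for item in tokens[i + 1].split(","):
                name, _, value = item.partition("=")
                resources[name] = value
            i += 2
        else:
            i += 1
    return resources


# Category line copied from the accounting output in this thread.
category = "-l h_rt=0,mem_free=32768M -pe threaded 1 -P compgen"
resources = parse_category(category)

print(resources)              # {'h_rt': '0', 'mem_free': '32768M'}
print("h_vmem" in resources)  # False
```

For this job the request contains mem_free but no h_vmem, which matches the observation above.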
After discussion with @jmaassen we found that both mem_free and h_vmem must be set to the same value.
Fixed now in Xenon v2.6
In addition to --max-run-time, one would like to set memory requirements (MB):
SGE: -l h_vmem
SLURM: --mem
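As a rough illustration of the requested mapping, a single memory value in MB could be translated into scheduler options along these lines. The function name memory_options is hypothetical and this is not how Xenon implements it. For SGE the sketch sets h_vmem and mem_free to the same value, following the finding reported in this thread, and for SLURM it uses --mem with an explicit M suffix.

```python
from typing import List


def memory_options(scheduler: str, memory_mb: int) -> List[str]:
    """Translate a memory requirement in MB into scheduler command-line options.

    Hypothetical sketch for illustration; not Xenon's implementation.
    """
    if memory_mb <= 0:
        raise ValueError("memory_mb must be a positive number of megabytes")

    name = scheduler.lower()
    if name == "sge":
        # Both resources get the same value, as found in the discussion in this thread.
        return ["-l", f"h_vmem={memory_mb}M,mem_free={memory_mb}M"]
    if name == "slurm":
        return [f"--mem={memory_mb}M"]
    raise ValueError(f"unsupported scheduler: {scheduler}")


print(memory_options("sge", 32768))    # ['-l', 'h_vmem=32768M,mem_free=32768M']
print(memory_options("slurm", 32768))  # ['--mem=32768M']
```

The returned list would be appended to the qsub or sbatch argument list next to the run-time limit.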