Open nick-youngblut opened 3 months ago
Also, it appears that since cellranger count is called from within the cellranger_count.py job, a 137 exit status (lack of memory) for the cellranger count job will be "reported" by the cellranger_count.py job as just an exit status of 1.
This is important for retrying processes with escalated resources:
errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
...since exit values of 1 will not trigger a retry.
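To make the retry condition concrete, here is a small Python mirror of the Groovy expression `(130..145) + 104` from the config above. The names `RETRY_EXIT_CODES` and `would_retry` are made up for this illustration and are not part of Nextflow or nf-core.

```python
# Python mirror of the Groovy expression ((130..145) + 104).
# Groovy ranges are inclusive, Python's range() excludes its end, hence 146.
RETRY_EXIT_CODES = set(range(130, 146)) | {104}


def would_retry(exit_status):
    """Return True if the errorStrategy above would pick 'retry' for this status."""
    return exit_status in RETRY_EXIT_CODES
```

With this set, `would_retry(137)` is true while `would_retry(1)` is false, which is why a wrapper that flattens every failure to 1 never gets a second attempt.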
Thanks for raising the issue. I agree that stderr/stdout and the exit code should be forwarded.
This should be fixed at the nf-core/modules level and likely also affects the spaceranger and cellranger multi modules that share the Python script.
I will follow up on this eventually, but I have only very limited time I can put into nf-core at the moment -- so if you want to speed it up, a PR to modules would be appreciated :)
@grst do you know if cellranger count actually returns a 137 exit status if there is a lack of memory for the job?
I am using -process.scratch ram-disk, which requires more memory for the job, but the current release of the cellranger-count nf-core module just returns an exit status of 1, so the job will never retry with more memory:
withLabel:process_high {
cpus = { check_max( 12 * task.attempt, 'cpus' ) }
memory = { check_max( 72.GB * task.attempt, 'memory' ) }
time = { check_max( 16.h * task.attempt, 'time' ) }
}
I tried updating the cellranger_count.py template:
...but the cellranger count process in scrnaseq still returns an exit status of 1 when there is insufficient memory.
No idea. But it shouldn't be hard to capture the exit code from the subprocess call and then do sys.exit(exitcode) in Python.
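A minimal sketch of that idea, assuming the wrapper launches the tool with `subprocess.run`. The helper names `shell_exit_code` and `run_and_forward` are hypothetical and are not taken from the nf-core template. One detail worth handling: when a child is killed by a signal, `subprocess` reports a negative return code (`-9` for SIGKILL), whereas shells and schedulers report `128 + signal` (137), so the sketch translates between the two.

```python
import subprocess
import sys


def shell_exit_code(returncode):
    """Translate a subprocess return code into a shell-style exit status."""
    # subprocess reports death-by-signal as -N for signal N;
    # shells report the same event as 128 + N, e.g. SIGKILL (9) -> 137.
    return 128 - returncode if returncode < 0 else returncode


def run_and_forward(cmd):
    """Run cmd (a list of arguments) and return the status the wrapper should exit with."""
    result = subprocess.run(cmd, check=False)
    return shell_exit_code(result.returncode)


# Hypothetical use at the end of a wrapper script:
#     sys.exit(run_and_forward(["cellranger", "count", ...]))
```

This only helps if the launched program itself exits with, or is killed by, the signal in question. Whether cellranger count does so when it runs out of memory is the open question raised above.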
Description of feature
cellranger_count.py currently just uses subprocess.run for running cellranger count, but it does not capture and write out the subprocess stdout and stderr, so all that is returned to the user during a failed job is:
It would be helpful if stderr and stdout were captured and returned. For example:
An alternative:
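Independently of the snippets posted in this issue, one way to capture and re-emit both streams with the standard library might look like the sketch below. The function name `run_captured` is an assumption for illustration only; `capture_output` and `text` are standard `subprocess.run` parameters (Python 3.7+).

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd, re-emit its captured stdout/stderr, and return the CompletedProcess."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    # Write both streams back out so they land in the calling job's own logs.
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result
```

Note that `capture_output=True` buffers all output in memory until the command finishes, so for a long-running, chatty tool it may be preferable to stream output instead.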