Closed. gri11 closed this issue 1 year ago.
You can try to set a lower memory limit for Java in conf/params.config around line 137:
JAVA_Xmx = "-Xmx32G"
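For example, to lower it further you could change that line to something like the following (just an illustrative value; adjust it to the RAM you have available):
JAVA_Xmx = "-Xmx16G"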
Hi @riederd, I ran the pipeline on a larger dataset (Normal ~12G, Tumor ~15G), used the same settings as @gri11, and set JAVA_Xmx = "-Xmx32G" as you suggested, but ran into the same problem: 247 (command exit status). (Computer spec: 128 GB RAM.)
bash .command.run
runs successfully in the work dir where the error occurred, but when I run nextflow run nextNEOpi-master/nextNEOpi.nf ... -resume
again, it does not get past the make_uBAM step and still runs into the same problem.
Do you have any advice for this? Thanks in advance!
All the best, He
@fredsamhaak Can you post the contents of the work dir as tar.gz?
Are you running the pipeline on slurm? If yes, are there any memory limits set?
Hi, I am not running the pipeline on slurm. By the way, I want to use nohup *.sh &
to run the pipeline, but it stops soon after starting (that's another question, though; maybe we can discuss it later), so for now I simply run it directly to check whether everything is OK with a larger dataset.
Here is the work dir: work.tar.gz
The *unaligned.bam file is too large, so I excluded it:
--
Update:
I tried the pipeline again, this time with nextflow run nextNEOpi-master/nextNEOpi.nf ... -profile singularity -resume (instead of -profile singularity,cluster).
While it was running, I used top to check the memory used by this step and found that more than ~125G was in use; the pipeline then showed:
I checked work/88/32d24ebff687352f3bb6f61377b315, and the process ran successfully:
These are the contents of a work directory in which the process ran successfully. I'd need the ones where it failed with 247.
Can you post the output of top so that I can see which process is using 125GB of memory? A single process should not need that much.
Hi, here is the work dir: work.247.tar.gz
I also attach the work dir for the 137 error, since it seems that error also occurred when the memory reached ~125GB:
--
And here are the screenshots of the output of top (screenshots 1 and 2):
Thanks for the info.
Can you try to lower the memory setting for sambamba in conf/params.config around line 146:
SB_sort_mem = "16G"
Thanks @riederd, it works (the make_uBAM step finished successfully). But it is a little bit "dangerous", because the memory used seems to have nearly reached 125G (checked with top). Does the parameter JAVA_Xmx matter? And is it OK to set JAVA_Xmx = "-Xmx16G"?
I attach the nextNEOpi_trace.txt (I think you may need this file to check the memory used by each step?). Would you also please explain the last four columns regarding memory? Thank you~
nextNEOpi_trace.txt
All the best, He
Hi, I'm glad it worked.
JAVA_Xmx sets the maximum amount of memory the Java virtual machine (basically the Java process) can consume. You can try to lower it; I guess 16G might work, but it might increase the compute time for some processes.
As for the last four columns of the trace file:
peak_rss: peak of real memory (Resident Set Size), i.e. the physical memory actually used by the process
peak_vmem: peak of virtual memory (vmem), the total amount of memory used by the process; vmem covers all memory areas, whether they are in physical memory (RAM), in swap space, on disk, or shared with other processes
rchar: number of bytes the process read
wchar: number of bytes the process wrote
see also: https://www.nextflow.io/docs/latest/metrics.html#memory-usage https://nextflow.io/docs/edge/tracing.html#trace-report
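These values are taken from Nextflow's trace report. In case you ever need to enable or customize that report yourself, a minimal sketch in the Nextflow config could look like this (the file name and the selected fields are just examples, not the exact nextNEOpi settings):
trace {
    enabled = true
    file    = 'my_trace.txt'
    fields  = 'task_id,name,status,exit,peak_rss,peak_vmem,rchar,wchar'
}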
You can also limit the maximum number of processes the pipeline runs in parallel, so that the different processes are not consuming all of your memory. Use the maxForks setting of Nextflow to do so, e.g. in conf/process.config around line 10 add:
maxForks = 1
or whatever number is good for you.
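As a rough sketch, assuming conf/process.config wraps its settings in a process scope, a global limit could look like this (the placement and comment are illustrative):
process {
    // run at most one task instance per process at a time
    maxForks = 1
}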
see also: https://www.nextflow.io/docs/latest/process.html#maxforks
If this answers your questions, I may close this issue, right?
Thanks for your detailed explanation.
So, from the nextNEOpi_trace.txt I attached before, is it because the memory used by the four processes framed in the red box exceeded ~128G (138.7 = 40.4 + 40.4 + 28.9 + 29) that the pipeline failed with 247?
I will set maxForks = 1 to run the pipeline sequentially and check whether it works (and also the running time).
Yeah, it looks like that, but you need to sum up the peak_rss values, and I guess there are also other processes running on the system that need some memory.
I think setting maxForks = 1 globally will prevent running out of memory on your system; however, it won't be needed for all the processes in the pipeline. You can fine-tune this in conf/process.config and limit only the processes that consume a large amount of memory, e.g.:
....
withName:Bwa {
    maxForks = 1
}
withName:Neofuse {
    maxForks = 1
}
....
and, as discussed above, lowering the memory settings for Java and sambamba in conf/params.config should help as well, and is probably sufficient, e.g.:
JAVA_Xmx = "-Xmx16G"
SB_sort_mem = "16G"
That's great! I will try setting maxForks = 1 just for Bwa and make_uBAM. And maybe I will also try lowering the memory settings for Java and sambamba.
Thanks for your patience and detailed explanation.
Have a nice day! He
Hi @riederd, sorry for the late response; I just came back from vacation.
Setting maxForks = 1 for Bwa and make_uBAM works for me, but I ran into another problem: the Neofuse step has been running for more than 8 days and is still not finished. Here is the log file:
nextflow.log
Thanks again!
At first I could run the pipeline with the test data provided in this repo, but when I used my own data (Normal ~12 GB, Tumor ~60 GB) I got this error in the make_uBAM process. Is there any way to fix this error?
Computer spec:
Config: