AndreaGuarracino closed this issue 2 years ago.
Those types of errors are typically caused by the application requesting memory and the operating system refusing the request. This can happen either because the machine is out of memory (both physical RAM and swap) or because the process hit a policy limit. The `ulimit -a` command will show policy limits. Other than watching `top` as it runs, I don't know of any out-of-the-box way of monitoring memory usage. On the bright side, it looks like it ran in 40 minutes.
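For ad-hoc monitoring, one option is a small polling loop over the kernel's peak-RSS counter. This is only a minimal sketch, Linux-specific (it reads `/proc`), and the PID argument is whatever process you want to watch:

```shell
#!/bin/sh
# Poll a process's peak resident set size (VmHWM, the high-water mark)
# once per second until it exits. Linux-specific; reads /proc.
pid="$1"
while kill -0 "$pid" 2>/dev/null; do
    grep VmHWM "/proc/$pid/status" 2>/dev/null
    sleep 1
done
```

The last `VmHWM` line printed before exit is the peak memory the process reached.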
I sadly don't know this step intimately enough to offer any parameter tweaks (and everyone else is still warm and cozy in bed).
Is this public data? Can you point to it, if only to give us something else to run on?
Hi @brianwalenz, thank you for your quick reply.
Unfortunately, the data is not yet public. The `ulimit -a` output seems fine:
```
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1031389
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1031389
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
I am not sure if the overcommit handling mode could be the culprit here:

```
$ cat /proc/sys/vm/overcommit_memory
2
```
Indeed (from here):

> 2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
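With `vm.overcommit_memory=2`, an allocation fails once the system's total committed address space would exceed `CommitLimit` (swap plus `overcommit_ratio` percent of physical RAM). A quick, Linux-specific way to check how close the system is to that ceiling:

```shell
# CommitLimit is the ceiling; Committed_AS is the address space currently
# committed. Under overcommit mode 2, an allocation that would push
# Committed_AS past CommitLimit is refused even if free RAM remains.
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

If `Committed_AS` is already near `CommitLimit`, allocations can fail well before physical memory is exhausted, which would match the error seen here.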
However, as soon as possible, I will retry the same command line (all defaults) with the same data on a machine with roughly 2-4X more RAM, and I will let you know the result.
As for monitoring the memory usage, besides `[h]top`-ing, I would suggest updating `verkko` so that it runs its commands/scripts with `\time -v` (or `/usr/bin/time -v`, or similar) prepended to the command lines. That way, you would get the `Maximum resident set size (kbytes)` information even for failed executions. Perhaps `snakemake` has something like this too.
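A sketch of the idea, assuming GNU time is installed at `/usr/bin/time` (here `sleep 1` is just a stand-in for whatever pipeline step is being wrapped):

```shell
# Prepend GNU time so peak memory is logged even if the step later fails;
# GNU time writes its report to stderr, redirected to a log file here.
/usr/bin/time -v sleep 1 2> step.time.log
grep 'Maximum resident set size' step.time.log
```

Because the report is written when the child exits, the peak-RSS line survives even when the wrapped command is killed or errors out.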
Were you able to test this on a larger node?
Hi @skoren,
I haven't been able to run the same test on a larger node yet (the other cluster doesn't like `snakemake`).
Since the node where I did the first test now has more RAM available (~210GB), I updated `verkko` to commit 1633e6e5a07202e64ff48c3866ab0fc05c308d7d, deleted the `1-buildGraph/` folder, and re-ran the same command line, that is `verkko -d shr_verkko --hifi m64247_210428_035639.ccs.fq.gz --nano fastq.tar.gz --threads 48`.
The creation of the graph was successful, and it is now still graph-aligning ONT reads against the `2-processGraph/unitig-unrolled-hifi-resolved.gfa` graph. By `htop`-ing during graph building, I didn't see large memory consumption, so I am not sure why the process died the first time. The problem solved itself.
OK, I'll close this then but feel free to open a new issue if you encounter other errors or errors on other samples.
Hi, I am trying your promising pipeline with *Rattus rattus* HiFi and ONT data. I've got the following error during graph building:
Is it a memory problem? The machine has ~160GB of free RAM available.
This is the directory content:
I am also attaching the logs, in case they can help: 2022-01-24T030309.584137.snakemake.log buildGraph.err.txt