Closed. vrenaul1 closed this issue 2 months ago.
@adamnovak would know better, but I'm pretty sure the error is related to the warning on the second line. A bad_alloc error typically indicates that you ran out of memory, which is exactly what that warning says might happen. As I understand it, the memory allocator that is built into vg really can't work well if /proc/sys/vm/overcommit_memory is set to 2. I would recommend talking to the admin of your server about changing the vm.overcommit_memory setting.
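Before contacting the admin, you can check which policy the machine is using by reading the sysctl directly. This is a sketch for standard Linux; actually changing the value requires root:

```shell
# Check the current overcommit policy.
# 0 = heuristic overcommit (the kernel default), 1 = always overcommit,
# 2 = strict accounting, which is the setting that breaks vg's bundled
#     jemalloc allocator.
cat /proc/sys/vm/overcommit_memory

# As root, an admin could switch back to the default heuristic with:
#   sysctl -w vm.overcommit_memory=0
# and persist it across reboots in /etc/sysctl.conf or /etc/sysctl.d/.
```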
If that's not possible, you won't be able to use the prebuilt releases, but it's possible to compile vg manually with a different memory allocator. You can follow the standard build instructions, but when it's time to run make, add an extra argument like this:

```shell
make -j [nthreads] jemalloc=off
```

You should only do this if you really can't change the vm.overcommit_memory setting, though: switching out the memory allocator will slow down vg substantially.
I think @jeizenga is right here.
We have Docker images of the snapshotted build result available that you can use to re-link vg without jemalloc as part of a container build process.
With Singularity 3.11, I think you should be able to get a usable SIF file with:

```shell
cat >vg-nojemalloc.def <<'EOF'
Bootstrap: docker
From: quay.io/vgteam/vg:cache-v1.55.0-build

%post
    cd /vg && make jemalloc=off
EOF

singularity build vg-nojemalloc.sif vg-nojemalloc.def
```
On older versions of Singularity, you need root access, or the administrator needs to have set up --fakeroot, to be able to build containers.
Or, if you can build Dockerfiles, I think you could build with Docker like this:

```shell
cat >vg-nojemalloc.dockerfile <<'EOF'
FROM quay.io/vgteam/vg:cache-v1.55.0-build
RUN make jemalloc=off
EOF

docker build . -f vg-nojemalloc.dockerfile -t quay.io/vgteam/vg:v1.55.0-nojemalloc
```
@vrenaul1 I pushed up the container built from this as quay.io/adamnovak/vg:v1.55.0-nojemalloc. Maybe try:

```shell
singularity run docker://quay.io/adamnovak/vg:v1.55.0-nojemalloc vg call \
    -r ~/gg/hprc-v1.1-mc-grch38.snarls -s A -S GRCh38#0#chr1 \
    -k ~/NA12878_gg/A.pack /mnt/beegfs/home/vrenaul1/gg/hprc-v1.1-mc-grch38.gbz \
    > /mnt/beegfs/home/vrenaul1/NA12878_gg/A.vcf
```
Thanks @adamnovak @jeizenga for your input! Unfortunately, the value in /proc/sys/vm/overcommit_memory cannot be changed on the compute cluster I am using. I was, however, able to build a Singularity image with jemalloc turned off by following your recommendations, @adamnovak.

I am running the vg call command on the compute cluster with the new Singularity image. In parallel, I am running the same command on a local machine where I can change the value of /proc/sys/vm/overcommit_memory (set to 0). In both cases, the running time is abnormally long, i.e. it has been running for over 12 hours so far even though the input file is pretty small. The VCF file is still empty, and the log file does not show any progress. I am not sure where my command below is potentially wrong:

```shell
singularity run ~/graphGenomesTools_1.0.sif vg call -t 32 \
    -r ~/graphGenomes/hprc-v1.1-mc-grch38.snarls -s A \
    -k ~/NA12878.chr22/A.pack -S GRCh38 \
    ~/graphGenomes/hprc-v1.1-mc-grch38.gbz > ~/NA12878.chr22/A.vcf
```

Any ideas on how I can improve the running time? Thanks for your help!
Usually adding -z to vg call, in order to restrict the search to haplotypes in the GBZ, will speed it up considerably.
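As a sketch, the -z flag would slot into the command from the earlier comment like this (guarded so it only runs where vg is actually installed; confirm the flag with vg call --help on your version):

```shell
# Hypothetical sketch: the earlier vg call command with -z added to
# restrict genotyping to the haplotypes already embedded in the GBZ,
# which is much faster. Only runs if vg is on PATH.
if command -v vg >/dev/null 2>&1; then
    vg call -t 32 -z \
        -r ~/graphGenomes/hprc-v1.1-mc-grch38.snarls \
        -s A -k ~/NA12878.chr22/A.pack -S GRCh38 \
        ~/graphGenomes/hprc-v1.1-mc-grch38.gbz > ~/NA12878.chr22/A.vcf
fi
```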
Thanks @glennhickey for your feedback. My main purpose in using graph mapping/calling tools is to detect delins that are difficult to detect using conventional NGS tools. If I understand correctly, adding -z will indeed speed things up a lot (indeed, the process finished in minutes), but I may miss those delins events if they do not belong to any haplotype in the GBZ? I am working on cancer data, so there are probably a lot of such events.
If you don't want to restrict to the haplotypes, you can try -C to limit the maximum allele size. Note: this option was fixed in the latest vg release (v1.56.0), so make sure you are using that one.
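A sketch of that alternative: drop -z and instead cap allele length with -C. The value 10000 below is a made-up placeholder, and whether -C takes a length argument in this exact form should be confirmed with vg call --help on v1.56.0 or later:

```shell
# Hypothetical sketch (vg >= v1.56.0): keep the full allele search but
# bound the runtime by capping the maximum allele size with -C.
# The length 10000 is a placeholder, not a recommendation.
# Only runs if vg is on PATH.
if command -v vg >/dev/null 2>&1; then
    vg call -t 32 -C 10000 \
        -r ~/graphGenomes/hprc-v1.1-mc-grch38.snarls \
        -s A -k ~/NA12878.chr22/A.pack -S GRCh38 \
        ~/graphGenomes/hprc-v1.1-mc-grch38.gbz > ~/NA12878.chr22/A.vcf
fi
```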
Thanks @glennhickey for the information! It does indeed speed up the analysis significantly.
1. What were you trying to do?
Simple variant calling on a small fastq file
2. What did you want to happen?
Have a vcf file generated
3. What actually happened?
vg call returned an error message as shown below
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
5. What data and command can the vg dev team use to make the problem happen?
The following command was run:
The fastq file used is NA12878.chr22.sample.fq.gz extracted from https://zenodo.org/records/10794014/files/data.tar.gz?download=1.
6. What does running vg version say?