smehringer / SViper

Swipe your Structural Variants called on long (ONT/PacBio) reads with short exact (Illumina) reads.
BSD 3-Clause "New" or "Revised" License

encounter the memory allocation problem #21

Open shuliliu opened 3 years ago

shuliliu commented 3 years ago

Hi,

I am currently trying to run SViper to polish SVs called by cuteSV (after adapting the VCF format) on both CLR PacBio reads and CCS PacBio reads. The run on SVs from CLR PacBio reads completed successfully. The run on SVs from CCS PacBio reads produced several polished SVs, but then failed after a few minutes (see below).

~/SViper/extern/seqan/include/seqan/basic/basic_exception.h:363 FAILED! (Uncaught exception of type std::bad_alloc: std::bad_alloc)

stack trace:
0 [0x41891e] ~/bin/SViper/build/sviper()
1 [0x407782] ~/bin/SViper/build/sviper()
2 [0x7ff21669e006] /soft/compiler/gcc/gcc-9.2.0/lib64/libstdc++.so.6(+0xad006)
3 [0x7ff21669d109] /soft/compiler/gcc/gcc-9.2.0/lib64/libstdc++.so.6(+0xac109)
4 [0x7ff21669da34] __gxx_personality_v0 + 0x264
5 [0x7ff215e31463] /soft/compiler/gcc/gcc-9.2.0/lib64/libgcc_s.so.1(+0x10463)
6 [0x7ff215e31cc6] _Unwind_Resume + 0x126
7 [0x40895c] ~/bin/SViper/build/sviper()
8 [0x413e44] ~/bin/SViper/build/sviper()
9 [0x7ff21604b4a2] GOMP_parallel + 0x42
10 [0x4091da] ~/bin/SViper/build/sviper()
11 [0x7ff2158626a3] __libc_start_main + 0xf3
12 [0x40976e] ~/bin/SViper/build/sviper()

/var/spool/slurmd/job634082/slurm_script: line 39: 2250216 Aborted (core dumped) sviper -s ${bam_sdir}/${bam_short} -l ${bam_ldir}/${bam_long} -r ${ref_dir}/hs37d5.fa -c ${wk_dir}/${out_dir}/reformat.${out_dir}_cuteSV.vcf -o ${wk_dir}/${out_dir}/${out_dir}.polished_variants -t 1 -x ${depth[index]} --output-polished-bam

Do you have any idea where the issue is?

Thanks,

Shuli

smehringer commented 3 years ago

Hi @shuliliu,

thanks for your interest in SViper! At first glance this looks like a memory issue, as you already mentioned. Multi-threading increases memory consumption, but I see that you are already using only one thread.

How much main memory (RAM) do you have available? What is your short read coverage?

If possible, try running your script on a machine with more main memory and see if it also crashes. If the error persists, it might also be a bug, but then I would need to debug SViper with your data. Is there a chance you can make it available to me?

Best, Svenja

shuliliu commented 3 years ago

Hi Svenja,

Thank you for your reply. I used short reads at 4x coverage (for testing) and 400G of memory, and it still didn't work. Yesterday, I tried a long read BAM file generated by pbmm2 instead of minimap2, and it worked. So the issue seems to lie in the long read BAM file.

1) I checked these two BAM files in detail and only saw two big differences:
CIGAR string: the pbmm2 BAM file uses X/= operations instead of M; the minimap2 BAM file uses M instead of X/=.
MD string: I added an MD string when running minimap2.

2) I also compared the BAM files generated with minimap2 from the CLR reads and from the CCS reads, and saw one difference:
BAM file from CLR reads: base qualities are missing (they were already missing in the FASTQ file).
BAM file from CCS reads: base qualities are present.

I don't know whether any of the above is the cause of the issue. I'll send you the data later if you think it still needs more checking.
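For anyone wanting to repeat this comparison, the checks above can be sketched in a few lines of Python. This is only a minimal sketch that reads SAM text (for example the output of `samtools view`); the function name `summarize_sam_lines` is made up for illustration and is not part of SViper. It counts which CIGAR operations occur, how many records carry an MD tag, and how many have no base qualities (`*` in the QUAL column).

```python
import re
from collections import Counter

# CIGAR operations defined by the SAM format: M, I, D, N, S, H, P, = and X.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_sam_lines(lines):
    """Summarize CIGAR operations, MD tags and missing qualities in SAM text.

    Hypothetical helper for illustration only, not part of SViper.
    """
    summary = {
        "records": 0,
        "cigar_ops": Counter(),  # number of CIGAR elements per operation
        "with_md": 0,            # records carrying an MD:Z: tag
        "missing_qual": 0,       # records whose QUAL column is "*"
    }
    for line in lines:
        # Skip empty lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        summary["records"] += 1
        cigar, qual = fields[5], fields[10]
        if cigar != "*":
            for _length, op in CIGAR_OP.findall(cigar):
                summary["cigar_ops"][op] += 1
        # Note: for a read of length 1, "*" could also be a real quality
        # character; this sketch treats it as missing.
        if qual == "*":
            summary["missing_qual"] += 1
        if any(tag.startswith("MD:Z:") for tag in fields[11:]):
            summary["with_md"] += 1
    return summary
```

One way to use it would be to pipe the first few thousand records of each BAM file through `samtools view`, pass the lines to the function, and compare the two summaries side by side.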

Best,

Shuli

smehringer commented 3 years ago

Hi Shuli,

thanks for looking into it more deeply. Yes, your memory is more than sufficient, so this looks like a bug.

To 1. I'm pretty sure that I handle X/= the same way as M. And if anything, I would have expected the error when handling X/=, which is the case that works for you.

To 2. This seems more promising. I took a quick look at the code and I did not explicitly handle missing base qualities, so I need to dig deeper to check whether there is a problem.
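To illustrate what explicit handling of missing base qualities could look like, here is a minimal Python sketch. It is not SViper's code (SViper is written in C++), and both the function name `phred_scores` and the default value are assumptions made for illustration. It relies only on the SAM convention that a QUAL field of `*` means no qualities are stored, and that qualities are otherwise encoded as Phred+33 characters.

```python
def phred_scores(seq, qual, default_phred=0):
    """Return one Phred score per base, tolerating a missing QUAL field.

    Hypothetical helper for illustration only; default_phred is an
    assumed placeholder, not a value taken from SViper.
    """
    # In SAM text, "*" means the qualities were not stored.
    if qual == "*":
        return [default_phred] * len(seq)
    # Otherwise SEQ and QUAL must have the same length.
    if len(qual) != len(seq):
        raise ValueError(
            f"QUAL length {len(qual)} does not match SEQ length {len(seq)}"
        )
    # Qualities are encoded as ASCII characters with an offset of 33.
    return [ord(char) - 33 for char in qual]
```

The point of the length check is that code which assumes one quality value per base can otherwise end up reading or allocating based on a mismatched length. Whether something like that is what happens here is only a guess until the data has been debugged.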

It has been some time since I last touched the code though, so I would appreciate getting the data, because debugging is easier that way. If you know the region where the error happens, I am happy to take only a subregion of the data, too.

Best, Svenja