isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

segfault while indexing largish 16S database #6

Closed andreas-wilm closed 8 years ago

andreas-wilm commented 8 years ago

The reference file of interest is gg_13_5_otus/rep_set/99_otus.fasta, which comes with ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz. It might be considered unusual insofar as it contains only short sequences (16S rRNA; shortest 1254 bp, longest 2368 bp) and all sequence IDs are numeric (but unique).

Here's how to reproduce the segfault:

$ graphmap  -I -r 99_otus.fasta
[Index 22:42:34] Running in fast and sensitive mode. Two indexes will be used (double memory consumption).
[Index 22:42:34] Generating index.
[Index 22:44:51] Generating secondary index.
Segmentation fault (core dumped)

$ ll
total 6213492
lrwxrwxrwx 1 wilma csb5         96 Aug 19 22:42 99_otus.fasta -> /mnt/genomeDB/misc/greengenes.secondgenome.com/downloads/13_5/gg_13_5_otus/rep_set/99_otus.fasta
-rw-r--r-- 1 wilma csb5 5338573735 Aug 19 22:44 99_otus.fasta.gmidx

Here is a backtrace:

$ gdb /mnt/software/bin/graphmap
(gdb) set args  -I -r 99_otus.fasta
(gdb) r
Starting program: /mnt/software/bin/graphmap -I -r 99_otus.fasta
[Thread debugging using libthread_db enabled]
[Index 22:58:01] Running in fast and sensitive mode. Two indexes will be used (double memory consumption).
[Index 22:58:01] Generating index.
[Index 23:00:13] Generating secondary index.

Program received signal SIGSEGV, Segmentation fault.
0x000000000047a449 in IndexSpacedHash::CreateIndex_(signed char*, unsigned long) ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64 libgomp-4.4.7-4.el6.x86_64
(gdb) bt
#0  0x000000000047a449 in IndexSpacedHash::CreateIndex_(signed char*, unsigned long) ()
#1  0x000000000047465b in Index::GenerateFromSequenceFile(SequenceFile const&) ()
#2  0x00000000004735c1 in Index::GenerateFromFile(std::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#3  0x000000000045937c in GraphMap::BuildIndex(ProgramParameters&) ()
#4  0x000000000045d7db in GraphMap::Run(ProgramParameters&) ()
#5  0x0000000000496a85 in main ()

This happens with release v0.21 and also commit 95b9dca

isovic commented 8 years ago

Hmm, interesting. I cannot reproduce the bug. My output:

$ graphmap -I -r 99_otus.fasta
[Index 18:06:09] Running in fast and sensitive mode. Two indexes will be used (double memory consumption).
[Index 18:06:09] Generating index.
[Index 18:07:35] Generating secondary index.
[Index 18:09:00] Index generated in 170.48 sec.
[Index 18:09:00] Memory consumption: [currentRSS = 10764 MB, peakRSS = 10897 MB]
[Index 18:09:00] Finished generating index. Note: only index was generated due to selected program arguments.

Do you have the latest version pulled and compiled?

andreas-wilm commented 8 years ago

My fault! I was working on the cluster and had only requested 8 GB. Sorry.

andreas-wilm commented 8 years ago

Then again, if these are out-of-memory problems, they should be reported as such and not just result in a segfault... :)

isovic commented 8 years ago

Agreed. And they are handled almost everywhere. I guess you found one of the rare places where I failed to check the allocation output :D Will re-check.

isovic commented 8 years ago

I added some missing checks for memory allocation when generating the index. Would you mind re-running your original test to verify that it works?
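For readers following along, here is a minimal sketch of the kind of check being discussed: test the result of an allocation and report a readable error instead of dereferencing a null pointer. This is not GraphMap's actual code. The names `CheckedAlloc` and `AllocationFailureMessage` are hypothetical, and the message wording is only loosely modelled on a typical out-of-memory report.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of GraphMap): formats a readable
// out-of-memory message naming the buffer and the requested size.
std::string AllocationFailureMessage(const std::string& name, std::size_t bytes) {
  std::ostringstream oss;
  oss << "Memory assertion failure. Not enough memory when allocating "
      << name << ". Requested size: " << bytes << " bytes.";
  return oss.str();
}

// Hypothetical helper (not part of GraphMap): allocates `count` elements
// of type T and throws std::runtime_error if the request cannot be met,
// so the caller never receives a null pointer.
template <typename T>
T* CheckedAlloc(std::size_t count, const std::string& name) {
  // Reject requests whose byte size would overflow std::size_t.
  if (count > SIZE_MAX / sizeof(T)) {
    throw std::runtime_error(
        "Requested element count overflows size_t when allocating " + name);
  }
  // The nothrow form returns nullptr on failure instead of throwing
  // std::bad_alloc, which lets us attach our own message.
  T* ptr = new (std::nothrow) T[count];
  if (ptr == nullptr) {
    throw std::runtime_error(AllocationFailureMessage(name, count * sizeof(T)));
  }
  return ptr;
}
```

A caller would wrap the index construction in a `try` block, print the exception's `what()` text, and exit with a non-zero status. Note that on Linux systems with memory overcommit enabled, an allocation can appear to succeed and only fail later when the pages are first touched, so a check like this catches many but not all out-of-memory situations.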

andreas-wilm commented 8 years ago

Fixed in 3c64651:

[Fri, 21 Aug 15 02:03:30 +0000 FATAL] #1: Memory assertion failure. Possible cause - not enough memory or memory not allocated. When allocating all_kmers_. Requested size: 4612041688 bytes.
[Fri, 21 Aug 15 02:03:30 +0000 FATAL] Exiting.

Thanks!