isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

segfault when building index only #50

Closed. svm-zhang closed this issue 7 years ago.

svm-zhang commented 7 years ago

Hello Ivan,

I was trying to build the index for my simulated GRCh38 genome using the command line:

graphmap align -I -r grch38.simu.fasta

Indexing ran for about 7 minutes before it failed with a segfault (see the attached file below).

Attachment: graphmap.index.err.txt

I also tried using .fa as the reference extension, both with and without a reads file, and the error persisted.

I am using graphmap v0.3.2.

Any insights on how I could fix this?

Thanks, Simo

isovic commented 7 years ago

Hi Simo, Thanks for the report! How much RAM does your machine have? And is the file you're trying to index the size of the entire hg (~3.1Gbp)? Best regards, Ivan.

svm-zhang commented 7 years ago

Hello Ivan,

I was running it on an EC2 instance which has 16 CPUs and 30GB RAM. The segfault showed up when only 40% of the memory was used (according to htop).

The simulated reference contains only the autosomal part of the GRCh38 genome (sex chromosomes, MT, and decoy sequences were not included).

Thanks for the help!

Simo
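Ivan's question about the reference size can be answered directly from the FASTA file. The sketch below is a hypothetical helper (not part of GraphMap) that counts sequences, total bases, and N bases; the N count is relevant to the crash discussed later in the thread. It reads the input line by line in pure Python, so on a ~3 Gbp reference expect it to take a few minutes.

```python
import io
from typing import Dict, Iterable


def fasta_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count sequences, total bases, and N/n bases in FASTA-formatted lines."""
    stats = {"sequences": 0, "bases": 0, "n_bases": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            stats["sequences"] += 1  # a header line starts a new sequence
        else:
            stats["bases"] += len(line)
            stats["n_bases"] += line.count("N") + line.count("n")
    return stats


# Small in-memory example; pass an open file handle for a real reference.
demo = io.StringIO(">chr1\nACGTNN\nnnAC\n>chr2\nGGGG\n")
print(fasta_stats(demo))  # {'sequences': 2, 'bases': 14, 'n_bases': 4}
```

For the reference in this thread, the call would be `with open("grch38.simu.fasta") as fh: print(fasta_stats(fh))`.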

isovic commented 7 years ago

Hi Simo, could you by any chance re-run your test on the newest version (v0.4.1) and report whether it works now? It's hard to debug without the concrete dataset :-) If it still does not work, could you by any chance upload your reference somewhere so I could take a look myself?

Best regards, Ivan.

svm-zhang commented 7 years ago

Hello Ivan,

Thanks for looking into this. I pulled the patch and the error persists.

What would be the best way to share my reference with you?

Thanks, Simo

svm-zhang commented 7 years ago

Hello @isovic,

I just sent a Google Drive link to ivan.sovic@irb.hr. Please look out for that email and let me know if you get it.

Thanks, Simo

isovic commented 7 years ago

Hi Simo, thanks for the link! I got the email and will inspect today! Best regards, Ivan.

isovic commented 7 years ago

For some reason I don't have access to the file, so I have now requested access via Google Drive. I will take a look as soon as you approve it!

Thank you! Ivan

svm-zhang commented 7 years ago

Hello @isovic,

I just approved the access. Please check and let me know if it works.

Thanks, Simo

isovic commented 7 years ago

Ok, got it now, thanks! Will have a look. Ivan

mjoppich commented 7 years ago

The seg-fault also occurs when running graphmap align and it needs to build an index:

[16:03:06 Index] Running in normal (parsimonious) mode. Only one index will be used.
[16:03:06 Index] Index is not prebuilt. Generating index.
[16:03:06 LoadOrGenerate] Started generating new index from file 'ref/hg38.fa'...
Segmentation fault

I just ran:

make debug
gdb --args ./bin/graphmap-debug align -r ref/hg38.fa -d all_2d.fastq -o aligned/all_2d.sam

and got the seg-fault here:

Using host libthread_db library "/lib64/libthread_db.so.1".
[16:25:28 Index] Running in normal (parsimonious) mode. Only one index will be used.
[16:25:28 Index] Index is not prebuilt. Generating index.
[16:25:28 LoadOrGenerate] Started generating new index from file 'ref/hg38.fa'...

Program received signal SIGSEGV, Segmentation fault.
0x00000000004d2429 in IndexSpacedHashFast::CreateIndex_ (this=0x890940, data=0x7ffdbdb02010 'N' <repeats 200 times>..., data_length=6176539712) at src/index/index_spaced_hash_fast.cc:520
520     kmer_hash_array_[hash_key][kmer_countdown[hash_key]] = coded_position;
Missing separate debuginfos, use: zypper install libgomp1-debuginfo-6.2.1+r239768-2.4.x86_64 libz1-debuginfo-1.2.8-6.3.1.x86_64
(gdb) where
#0  0x00000000004d2429 in IndexSpacedHashFast::CreateIndex_ (this=0x890940, data=0x7ffdbdb02010 'N' <repeats 200 times>..., data_length=6176539712) at src/index/index_spaced_hash_fast.cc:520
#1  0x00000000004ecb7c in Index::GenerateFromSequenceFile (this=0x890940, sequence_file=...) at src/index/index.cc:81
#2  0x00000000004ec73f in Index::GenerateFromFile (this=0x890940, sequence_file_path=...) at src/index/index.cc:47
#3  0x00000000004d5887 in IndexSpacedHashFast::LoadOrGenerate (this=0x890940, reference_path=..., out_index_path=..., verbose=true) at src/index/index_spaced_hash_fast.cc:1086
#4  0x0000000000540b1b in GraphMap::BuildIndex (this=0x7fffffffbf00, parameters=...) at src/graphmap/graphmap.cc:204
#5  0x000000000053e023 in GraphMap::Run (this=0x7fffffffbf00, parameters=...) at src/graphmap/graphmap.cc:39
#6  0x0000000000578e6b in main (argc=8, argv=0x7fffffffc138) at src/main.cc:70

Hope that helps :)

So it looks like you don't like Ns, right?

Markus
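One number in the traceback stands out: `data_length=6176539712`, roughly twice the ~3.1 Gbp human genome (presumably forward plus reverse-complement strands, which is an assumption here). The check below only confirms the arithmetic: that value does not fit in a signed or unsigned 32-bit integer, so any code path that kept positions or lengths in 32 bits would silently wrap. Ivan later notes that the exact cause was never pinpointed, so treat this as a hypothesis, not a diagnosis.

```python
# data_length as reported by gdb in the traceback above.
DATA_LENGTH = 6_176_539_712

INT32_MAX = 2**31 - 1    # largest signed 32-bit value
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value


def truncate_to_uint32(value: int) -> int:
    """Return what value becomes if stored in an unsigned 32-bit integer."""
    return value & UINT32_MAX


print(DATA_LENGTH > INT32_MAX)          # True
print(DATA_LENGTH > UINT32_MAX)         # True
print(truncate_to_uint32(DATA_LENGTH))  # 1881572416
```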

isovic commented 7 years ago

It helps a lot actually, thanks for the traceback! I was just running gdb on Simo's reference to get the same.

> So it looks like you don't like Ns, right?

Haha, no I don't :-) I'm skipping those. But I never had trouble with hg before, so I'm curious what's going on now. I need to refactor this one, as well as some other pieces of code.
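The crashing line (`kmer_hash_array_[hash_key][kmer_countdown[hash_key]] = coded_position;`) looks like the fill phase of a two-pass, counting-sort style index: first count how often each key occurs, allocate exactly that many slots per bucket, then fill each bucket using a per-key countdown. The sketch below is a simplified, hypothetical Python version of that pattern that skips k-mers containing N, as Ivan describes. It uses contiguous 2-bit-encoded k-mers rather than GraphMap's spaced seeds, and all names other than those quoted from the traceback are illustrative. It also shows one way such code can fail: if the fill pass writes more positions into a bucket than the counting pass allocated, the countdown drops below zero, which in C++ would be an out-of-bounds write.

```python
from typing import Dict, List

# 2-bit codes for unambiguous bases; anything else (e.g. N) is skipped.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_key(seq: str, start: int, k: int) -> int:
    """2-bit encode seq[start:start + k]; return -1 if it contains a non-ACGT base."""
    key = 0
    for ch in seq[start:start + k]:
        code = BASE_CODE.get(ch.upper())
        if code is None:
            return -1
        key = (key << 2) | code
    return key


def build_index(seq: str, k: int) -> Dict[int, List[int]]:
    """Hypothetical two-pass index: count keys, allocate buckets, fill via countdown."""
    n_kmers = max(len(seq) - k + 1, 0)

    # Pass 1: count occurrences of every valid (N-free) k-mer.
    counts: Dict[int, int] = {}
    for i in range(n_kmers):
        key = kmer_key(seq, i, k)
        if key >= 0:
            counts[key] = counts.get(key, 0) + 1

    # Allocate exactly counts[key] slots per bucket.
    buckets = {key: [0] * n for key, n in counts.items()}
    countdown = dict(counts)

    # Pass 2: fill each bucket from the back. It must skip exactly the same
    # k-mers as pass 1, otherwise countdown[key] drops below zero.
    for i in range(n_kmers):
        key = kmer_key(seq, i, k)
        if key < 0:
            continue
        countdown[key] -= 1
        if countdown[key] < 0:
            # Python would silently wrap a negative index; C++ would write out of bounds.
            raise IndexError(f"bucket {key} overflow at position {i}")
        buckets[key][countdown[key]] = i
    return buckets


index = build_index("ACGTNACGT", k=3)
print(index)  # {6: [5, 0], 27: [6, 1]}
```

For "ACGTNACGT" with k=3, the three k-mers overlapping the N are skipped, leaving two buckets: ACG (key 6) at positions 5 and 0, and CGT (key 27) at positions 6 and 1. Because the countdown fills each bucket from the back, positions come out in descending order.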

Best regards, Ivan.

xyl012 commented 7 years ago

I am also getting the same exact problem with hg19, any updates would be greatly appreciated!

isovic commented 7 years ago

Hi all, thank you for reporting this and for your patience! I've re-implemented the entire index, and this issue should no longer occur in the future, but the fix will be included in the next release which is coming soon (within a week, hopefully), together with some new goodies such as speed improvements on larger references. Best regards, Ivan.

svm-zhang commented 7 years ago

Hello Ivan,

This is great news! Looking forward to this new release.

cheers, Simo

xyl012 commented 7 years ago

As excited as Simo for the update. Thank you, Ivan!

isovic commented 7 years ago

Hi everyone,

there have been many updates and changes, and in the latest version I (hopefully) addressed all of the above issues. Would you mind giving it a spin to verify if everything is well now?

Best regards, Ivan.

svm-zhang commented 7 years ago

Hello Ivan,

I am testing it now and will report shortly. Thanks very much for all the updates!

Simo

svm-zhang commented 7 years ago

The indexing went very smoothly. All problems solved :)

As a side note, I was running the indexing on the EC2 instance that gave the original error. But yesterday I ran it on Google Cloud (n1-standard-32) using v0.4.1, and no segfault was issued. Weird!

Any ideas?

Simo

isovic commented 7 years ago

Hi Simo, Thanks for the report! It was hard to pinpoint exactly what was going on, and it manifested mostly on larger references. This made it difficult to debug, and after a while, I simply decided to implement a new index with all these great new features. I would advise using the newest version of GraphMap instead of 0.4.x.

Best regards, Ivan.

svm-zhang commented 7 years ago

I'd say great decision on implementing the new index! Thanks!

Simo

svm-zhang commented 7 years ago

Hello @isovic, I am closing this thread. In case @windybasket finds a new problem, he/she can reopen it.

Thanks!