Open sagrudd opened 5 years ago
Hi Stephen,
Thanks for trying lordFAST and reporting this issue.
Are reads and reference files publicly available?
Could you please recompile the code using make log1 and then run:
./lordfast/lordfast --search Homo_sapiens.GRCh37.75.dna.chromosome.20.fa --seq clive.chr20.fastq 1> map.sam 2> map.err
Then, by looking at map.err, you can find the read that is crashing lordFAST.
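Once the failing read ID shows up in map.err, the matching record can be pulled out of the fastq with standard tools. A minimal sketch (the read IDs and fastq content below are made-up placeholders; substitute the real clive.chr20.fastq and the ID printed in map.err):

```shell
# Placeholder fastq with two records; substitute the real input file.
cat > example.fastq <<'EOF'
@read-aaa
ACGTACGT
+
IIIIIIII
@read-bbb
TTTTACGT
+
IIIIIIII
EOF

# Pull out the 4-line record whose header matches the failing read ID.
awk -v id="read-bbb" '
  $1 == "@" id { keep = 4 }    # header matched: keep it and the next 3 lines
  keep > 0     { print; keep-- }
' example.fastq > failing_read.fastq

cat failing_read.fastq
```

Feeding just failing_read.fastq back to lordfast should then reproduce the crash in isolation.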
Thanks @haghshenas -
lordFAST was recompiled with the logging switch and it fails on the first read ...
mapping... >54714c40-7e59-4346-ba4e-3cd865b33f1a len: 110450
Segmentation fault (core dumped)
I have tried slicing and dicing the input file to get a read that works (below) - no joy.
The reference genome is the publicly available reference human genome (http://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.chromosome.20.fa.gz) and the sequence reads are all in the public domain (https://github.com/nanoporetech/ONT-HG1). I have included the fastq entry for a single read at the bottom of the message ...
Thanks for the assistance - I am sure there is something obvious that I am missing.
Reading input... loaded 10 reads in 0.00 seconds (0.00 CPU seconds)
mapping... >6d1f16df-9bba-4ae7-bcc9-e1719ddc4ccc len: 2071
candidate 1: 63088873 63093014 + 680.000000
candidate 2: 94373399 94377540 + 6.000000
candidate 3: 7532227 7536368 + 6.000000
candidate 4: 10015356 10019497 + 5.000000
candidate 5: 105285498 105289639 + 5.000000
candidate 6: 68175249 68179390 + 5.000000
candidate 7: 58077053 58081194 + 5.000000
candidate 8: 36963208 36967349 + 4.000000
candidate 9: 41330947 41335088 + 4.000000
candidate 10: 32365588 32369729 + 4.000000
Segmentation fault (core dumped)
Reading input... loaded 100 reads in 0.01 seconds (0.01 CPU seconds)
mapping... >976284a7-9277-4372-80c5-5b12ded26f7a len: 27313
candidate 1: 63065717 63120342 + 1572.000000
candidate 2: 82621825 82676450 + 33.000000
candidate 3: 119958696 120013321 + 24.000000
candidate 4: 112556873 112611498 + 24.000000
candidate 5: 112174491 112229116 + 23.000000
candidate 6: 3987698 4042323 + 22.000000
candidate 7: 61836632 61891257 + 22.000000
candidate 8: 30208178 30262803 + 22.000000
candidate 9: 83250024 83304649 + 22.000000
candidate 10: 75465819 75520444 + 22.000000
Segmentation fault (core dumped)
Weird stuff! I was able to map the single read that you provided without a problem. Also I downloaded this fastq file and mapped all reads in it without a segfault.
Question for you... Did you generate the index using lordFAST?
Yes, the index was prepared using lordFAST as documented. Are any assumptions being made about the amount of memory, processor extensions, etc.? I have been able to replicate this segfault on a 2018 Mac Mini i7 running macOS (32 GB RAM), and on CentOS 7 / Fedora 29 / Ubuntu 19.04 on an older Xeon server with 196 GB RAM. Any recommendations as to the hardware that you are using?
Sorry, I cannot reproduce this segfault. As far as I know there should be no special hardware requirements. I have tested it on both Linux and Mac systems. What are the versions of your GCC and zlib? Also, are you familiar with gdb? I would appreciate it if you could help me pinpoint the segmentation fault.
Sure - I'll get some more info for your review. Could you tell me which Linux distribution you are using, and the versions of GCC, zlib, and anything else relevant to your development setup? Thanks.
To answer your question, I was testing it on a Linux server with GCC 4.8.5 and zlib 1.2.7.
But actually, I got the segmentation fault on a Mac laptop with zlib 1.2.11 and GCC 4.8.5 (installed by miniconda). Is your GCC installed by conda?
Unfortunately, lldb does not give me the line number where it crashes (although I'm using -g during compilation). I will work on debugging with lldb more and get back to you.
I got the same error with ONT data from GIAB on hg38:
[NOTE] number of threads: 1
[NOTE] (bwt_load) loading the index...
[NOTE] (bwt_load) index was loaded in 49.73 seconds (4.13 CPU seconds)
Reading input... loaded 4393 reads in 4.07 seconds (0.23 CPU seconds)
mapping... Segmentation fault (core dumped)
Thanks @Coaxecva for reporting. Could you also send me a small set of reads that cause this? Also, please let me know which reference you are using.
Hi @haghshenas,
I used GCA_000001405.15_GRCh38_no_alt_analysis_set.fna as a reference. And ONT reads from GIAB: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/Ultralong_OxfordNanopore/final/ultra-long-ont.fastq.gz
You can just start running the fastq; the error will appear right away. Thanks, Coax
Hi @haghshenas,
Same error for me as well:
[NOTE] number of threads: 6
[NOTE] (bwt_load) loading the index...
[NOTE] (bwt_load) index was loaded in 0.31 seconds (0.30 CPU seconds)
Reading input... loaded 2 reads in 0.00 seconds (0.00 CPU seconds)
mapping... Segmentation fault: 11
Hi, I also get the same error with a large genome (larger than 4.3 Gb). Is lordfast able to work with large genomes? I have terabytes of reads to map. The index is already created, but the mapping step crashes with a segfault.
Thanks
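One thing worth checking (a guess on my part, since I don't know whether lordFAST's index stores 32-bit offsets): 2^32 bases is about 4.29 Gb, so a 4.3 Gb genome would just overflow a 32-bit coordinate. Total reference length is easy to check; the FASTA below is a tiny placeholder for the real reference:

```shell
# Tiny placeholder FASTA; substitute the real reference.
cat > ref.fa <<'EOF'
>chrA
ACGTACGTAC
>chrB
GGGG
EOF

# Sum sequence lengths and compare against the 32-bit limit (2^32 bases).
awk '!/^>/ { total += length($0) }
     END {
         print "total bases: " total
         print (total > 4294967296 ? "exceeds 32-bit offset range" : "fits in 32-bit offsets")
     }' ref.fa
```

If the real reference exceeds that range, a crash in the mapping step would be consistent with an offset overflow rather than a hardware or OS problem.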
Hi folks - I am unable to get lordfast to work; compiling from source (git master) or installing from bioconda leads to a quick segfault. This is seen on both CentOS and macOS. Any thoughts as to what I may be doing wrong? TIA