guanchangge / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner

MosaikSort segmentation faults during "serialize alignments" phase #5

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. download ftp://149.155.100.41/pub/brassica/Brassica_95k_EST_assembly.fasta
2. download ftp://149.155.100.41/pub/brassica/TN_Solexa/N.fq.gz
3. run these commands
> MosaikBuild -fr Brassica_95k_EST_assembly.fasta -oa brassica_95k.dat -sn Brassica -uri "http://brassica.bbsrc.ac.uk/cgi-bin/gbrowse/jic_brassica/?name=" -assignQual 40 -st sanger
> MosaikBuild -q N.fq.gz -out n_solexa.dat -st illumina -p N_
> MosaikAligner -ia brassica_95k.dat -in n_solexa.dat -out n_aligned.dat -m unique -hs 15 -mm 4 -act 20 -bw 13 -p 8 -rur n_unaligned.fastq
> MosaikSort -in n_aligned.dat -out n_sorted.dat

What is the expected output? What do you see instead?

MosaikSort produces the following output and then crashes:

------------------------------------------------------------------------------
MosaikSort 1.0.1307                                                 2009-10-14
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- phase 1 of 2: serialize alignments:
 0% [                                                        ]                                  |Segmentation fault

What version of the product are you using?

Mosaik 1.0 release

On what operating system?

Red Hat Linux 2.6.18-164.2.1.el5 x86_64

Please provide any additional information below.

Original issue reported on code.google.com by tou...@gmail.com on 22 Oct 2009 at 12:21

GoogleCodeExporter commented 8 years ago
Thank you for the detailed bug report. It really makes it easy for me to reproduce the error.

I'm currently downloading the reference and reads from your links. When everything has been downloaded, I'll take a look at what might be causing the segmentation fault.

My gut instinct is that some of the reads might be longer than 65,535 bp (I noticed you were aligning Sanger capillary reads). This should never happen with normal sequencing technologies, but it can happen if a FASTA/FASTQ file was put together from BACs or assembly contigs. I won't know the true cause until the files have finished downloading.

// Michael

Original comment by snowneb...@gmail.com on 22 Oct 2009 at 12:56

GoogleCodeExporter commented 8 years ago
Thanks for taking the time to reproduce my problem.

Some other people seem to have had the same error; see the following thread on SEQanswers:
http://seqanswers.com/forums/archive/index.php/t-1347.html
You might want to post a comment there.

I checked the Sanger sequences and the longest is less than 4000 bp.
In fact these are EST contig consensus sequences, which I'd like to use as reference sequences.
There are 95000 of them. Maybe that is too many for Mosaik?
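
The check described above (how many sequences the reference holds, and how long the longest one is) can be sketched in a few lines of Python. This is only an illustration and not part of MOSAIK: the function names `fasta_stats` and `report` are hypothetical, and the 65,535 threshold is simply the unsigned 16-bit limit mentioned earlier in this thread.

```python
import io

USHORT_MAX = 65_535  # largest value an unsigned 16-bit integer can hold


def fasta_stats(handle):
    """Return (number_of_records, longest_sequence_length) for a FASTA stream."""
    records = 0
    longest = 0
    current = 0  # length of the record currently being read
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            records += 1
            longest = max(longest, current)
            current = 0
        elif line:
            current += len(line)
    # Account for the final record, which has no header after it.
    longest = max(longest, current)
    return records, longest


def report(handle):
    """Print the counts and flag anything above the 16-bit limit."""
    records, longest = fasta_stats(handle)
    print(f"{records} sequences, longest is {longest} bp")
    if records > USHORT_MAX:
        print("more sequences than an unsigned short can count")
    if longest > USHORT_MAX:
        print("longest sequence exceeds 65,535 bp")


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use: report(open(path))
    report(io.StringIO(">contig1\nACGT\nAC\n>contig2\nA\n"))
```

For the reference in this report one would pass `open("Brassica_95k_EST_assembly.fasta")` instead of the in-memory example.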

Hope you'll find the problem,

jorge.

Original comment by tou...@gmail.com on 22 Oct 2009 at 3:17

GoogleCodeExporter commented 8 years ago
Hi Jorge,

Still downloading the reads, but now I have the reference. MOSAIK is supposed to handle up to 4 billion reference sequences (unsigned int), but there may still be some old code that limits the number of references to 65,535 (unsigned short). I'll know more when everything has been downloaded.

// Michael

Original comment by snowneb...@gmail.com on 22 Oct 2009 at 3:35

GoogleCodeExporter commented 8 years ago
Hi Jorge,

I just ran through the steps and encountered a floating point exception in MosaikSort about 20% into the alignment archive. I'll take a look at what's causing that.

I also have some tips for you:

1. When creating a reference archive in MosaikBuild, the -st parameter isn't needed.

2. In MosaikAligner, you can get a serious boost in alignment speed by adding -mhp 100 to the command line (100 works well when using a hash size of 15). I was using 12 processors on our test machine and aligned all 21 million reads in 6 minutes 7 seconds (59,734.5 reads/s).

Cheers,

// Michael

Original comment by snowneb...@gmail.com on 22 Oct 2009 at 6:39

GoogleCodeExporter commented 8 years ago
Hi Jorge,

The problem is now fixed. The variable that tracks the number of reference sequences in the AlignmentReader class was accidentally declared as an unsigned short (max of 65,535 references). It is now an unsigned int, so you can add as many references as you want (max of 4 billion).

An update of MOSAIK will be up on the site in a couple of days. Until then you can get the fix through subversion, or you can get the 64-bit Linux binary from the following link:
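
To illustrate why roughly 95,000 references do not fit in that counter, here is a small Python sketch that mimics the two C integer widths with `ctypes`. It is not MOSAIK code (MOSAIK is written in C++ and the AlignmentReader source is not shown in this thread); the helper names are hypothetical, and the thread does not say exactly how the wrapped count led to the crash.

```python
import ctypes

USHORT_MAX = 65_535         # unsigned short, 16 bits
UINT_MAX = 4_294_967_295    # unsigned int, 32 bits on common platforms


def stored_as_ushort(count):
    """Value that survives when count is squeezed into an unsigned short."""
    # ctypes performs no overflow checking, so the value wraps modulo 65,536,
    # just as an unchecked assignment would in C or C++.
    return ctypes.c_ushort(count).value


def stored_as_uint(count):
    """Value that survives when count is stored in an unsigned int."""
    return ctypes.c_uint(count).value


if __name__ == "__main__":
    references = 95_000  # approximate number of EST contigs in the report
    print("unsigned short sees:", stored_as_ushort(references))
    print("unsigned int sees:  ", stored_as_uint(references))
```

With a 16-bit counter the reader would believe the archive holds far fewer references than it really does, so any alignment that points at a higher reference index has nothing valid to refer to.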

http://bioinformatics.bc.edu/~mikaels/Mosaik/Mosaik-1.0-Linux-x64.tar.bz2

Let me know if the fix works for you.

------------------------------------------------------------------------------
MosaikSort 1.0.1325                                                 2009-10-22
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- phase 1 of 2: serialize alignments:
100%[========================================================] 268,980.4 reads/s in 54 s

- phase 2 of 2: restitch serialized alignments:
100%[===================================================] 354,356.0 alignments/s in 24 s

Single-end read statistics:
======================================================
                     reads              alignments
------------------------------------------------------
# non-unique:   6084904 (41.7 %)    12169808 (58.9 %)
# unique:       8508088 (58.3 %)     8508088 (41.1 %)
------------------------------------------------------
total:         14592992             20677896

Cheers,

// Michael

Original comment by snowneb...@gmail.com on 23 Oct 2009 at 12:25

GoogleCodeExporter commented 8 years ago
Issue 21 has been merged into this issue.

Original comment by snowneb...@gmail.com on 16 Jan 2010 at 2:49

GoogleCodeExporter commented 8 years ago
Hi,

I was running MOSAIK (454 data) on Linux with 2 GB of RAM, but the alignment step failed with the following output:

MosaikAligner 2.1.33                                                2011-11-08
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

Floating point exception

Is this due to low memory or to something else? Please suggest a solution to this problem.

Thank you,
amit

Original comment by gupta.am...@gmail.com on 23 Mar 2012 at 7:17