guanchangge / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner

MosaikAligner has segmentation fault for solid pair-end reads #84

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Data: SOLiD paired-end data
e.g.,
>2_45_92_F3
T20..20..0302.220002..31..3.0..2.2..3.1...23...03..
>2_45_148_F3
T30..12..3030.200112..21..2.3..1.3..2.1...13...23..

and 
>2_45_92_R3
T12.03...320.3.01.11002.3113.31103.0
>2_45_148_R3
T02.03...110.0.30.01001.1003.01013.0

2. I built the reads, the reference, and the jump database, and then ran the 
alignment with MosaikAligner. 
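
The example reads in step 1 contain many '.' characters, which in csfasta files mark color calls the instrument could not make. The thread does not establish that these are the cause of the crash. The following is only a minimal Python sketch for measuring how common they are in an input file; the helper names `parse_csfasta` and `missing_call_stats` are hypothetical and not part of MOSAIK.

```python
def parse_csfasta(lines):
    """Yield (name, sequence) pairs from csfasta-style lines."""
    name = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and csfasta comment lines
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def missing_call_stats(seq):
    """Return (missing, total) color calls, skipping a leading primer base."""
    colors = seq[1:] if seq and seq[0] in "ACGT" else seq
    return colors.count("."), len(colors)


# Two of the reads quoted in the report above.
example = """\
>2_45_92_F3
T20..20..0302.220002..31..3.0..2.2..3.1...23...03..
>2_45_92_R3
T12.03...320.3.01.11002.3113.31103.0
""".splitlines()

for name, seq in parse_csfasta(example):
    missing, total = missing_call_stats(seq)
    print(f"{name}: {missing}/{total} missing calls")
```

Running the same loop over the full F3 and R3 files would show whether reads like these are typical of the library or only a small fraction of it.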

What is the expected output? What do you see instead?
I saw the following:

------------------------------------------------------------------------------
MosaikAligner 1.1.0021                                              2010-11-10
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions 
- Using the following alignment mode: aligning reads to all possible locations 
- Using a maximum mismatch threshold of 10 
- Using a hash size of 15 
- Aligning in colorspace (SOLiD) 
- Using 8 processors 
- Using an alignment candidate threshold of 20bp. 
- Setting hash position threshold to 100 
- Using a jump database for hashing. Storing keys & positions in memory. 
- loading basespace reference sequences... finished. 
- loading reference sequence... finished. 
- loading jump key database into memory... finished. 
- loading jump positions database into memory... finished. 

Aligning read library (999983):
  9% [=========================>                          ]   1,507.1 reads/s      ETA 09:58 -Segmentation fault

What version of the product are you using? On what operating system?
Mosaik 1.1.0021 on Linux (tried both cluster and a 16 core computer) 

Please provide any additional information below.

The error occurs at different points in the run depending on the parameter settings. For example, with the following command:

/share/data/program/mosaik-aligner/bin/MosaikAligner -in Pla0000325047_1_PE_HS26611_1.dat -out Pla0000325047_1_PE_HS26611_1_aligned_34.dat -ia human_all_color.dat -ibs human_all_base.dat -j human_all_base_15 -p 8 -hs 15 -mm 10 -act 20 -mhp 100
the error occurs after 9% of the reads have been aligned.

If I change -act to 25, the error occurs after 52% of the reads have been aligned.

However, if I change -mm to 4, the error occurs at the very beginning, before 
any reads have been aligned (0%).
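
The parameter variations described above could be scripted so that each run's outcome is recorded consistently. The following Python sketch is an illustration only: it reuses the binary path, file names, and flags quoted in this report, while the output file naming and the helpers `build_command`, `describe_exit`, and `sweep` are hypothetical. It relies on the POSIX behaviour of `subprocess`, which reports a process killed by a signal as a negative return code.

```python
import itertools
import signal
import subprocess

# Binary path and input files exactly as quoted in the report.
ALIGNER = "/share/data/program/mosaik-aligner/bin/MosaikAligner"


def build_command(mm, act, out_path):
    """Build the reported MosaikAligner command, varying only -mm and -act."""
    return [
        ALIGNER,
        "-in", "Pla0000325047_1_PE_HS26611_1.dat",
        "-out", out_path,
        "-ia", "human_all_color.dat",
        "-ibs", "human_all_base.dat",
        "-j", "human_all_base_15",
        "-p", "8", "-hs", "15",
        "-mm", str(mm), "-act", str(act), "-mhp", "100",
    ]


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode < 0:  # POSIX: the child was terminated by a signal
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return f"killed by signal {-returncode}"
    return "ok" if returncode == 0 else f"exit status {returncode}"


def sweep(mm_values=(4, 10), act_values=(20, 25)):
    """Run every -mm/-act combination and return {(mm, act): outcome}."""
    results = {}
    for mm, act in itertools.product(mm_values, act_values):
        out_path = f"aligned_mm{mm}_act{act}.dat"  # hypothetical output name
        proc = subprocess.run(build_command(mm, act, out_path))
        results[(mm, act)] = describe_exit(proc.returncode)
    return results
```

Calling `sweep()` on a machine where the aligner and input files are present would return a dictionary in which a segmentation fault appears as "killed by SIGSEGV". Running with `-p 1` instead of `-p 8` might also help show whether the crash depends on threading, although nothing in the report confirms that it does.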

The following command completes without errors, but the results are very poor:

/share/data/program/mosaik-aligner/bin/MosaikAligner -in Pla0000325047_1_PE_HS26611_1.dat -out Pla0000325047_1_PE_HS26611_1_aligned.dat -ia human_all_color.dat -ibs human_all_mosaik.dat -j human_all_mosaik_15 -p 8 -hs 15 -mm 4 -act 30 -mhp 100
------------------------------------------------------------------------------
MosaikAligner 1.1.0021                                              2010-11-10
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch threshold of 4
- Using a hash size of 15
- Aligning in colorspace (SOLiD)
- Using 8 processors
- Using an alignment candidate threshold of 30bp.
- Setting hash position threshold to 100
- Using a jump database for hashing. Storing keys & positions in memory.
- loading basespace reference sequences... finished.
- loading reference sequence... finished.
- loading jump key database into memory... finished.
- loading jump positions database into memory... finished.

Aligning read library (999983):
100%[====================================================]     516.8 reads/s       in 32:14

Alignment statistics (mates):
===================================
# failed hash:         80 (  0.0 %)
# filtered out:   1996947 ( 99.8 %)
# unique:            1560 (  0.1 %)
# non-unique:        1379 (  0.1 %)
-----------------------------------
total:            1999966
total aligned:       2939 (  0.1 %)

Alignment statistics (reads):
============================================
# unaligned:                997045 ( 99.7 %)
# orphaned:                   2937 (  0.3 %)
# both mates unique:             0 (  0.0 %)
# one mate non-unique:           1 (  0.0 %)
# both mates non-unique:         0 (  0.0 %)
--------------------------------------------
total reads:                999983
total reads aligned:          2938 (  0.3 %)

Original issue reported on code.google.com by Jiarui.D...@gmail.com on 1 Dec 2010 at 6:14

GoogleCodeExporter commented 8 years ago
Hi Jiarui,

Thank you so much for such detailed tests.

I suspect the problem is in the colorspace-to-basespace conversion. MOSAIK 
automatically converts colorspace alignments to basespace alignments.

Sorry, I may not be able to fix this bug immediately, but I will do so soon.

Original comment by WanPing....@gmail.com on 13 Dec 2010 at 2:53
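
For readers unfamiliar with the conversion mentioned in the reply above, the following is a naive Python sketch of decoding a SOLiD colorspace read into bases. It is not MOSAIK's implementation, and the thread does not say how MOSAIK treats '.' calls. The function name `decode_colorspace` and the choice to emit 'N' after a missing call are assumptions made for illustration.

```python
BASES = "ACGT"


def decode_colorspace(read):
    """Decode a SOLiD read (primer base followed by colors) into bases.

    Each color is the XOR of the 2-bit codes (A=0, C=1, G=2, T=3) of two
    adjacent bases, so every base depends on the one before it. A '.' is a
    missing call: the chain is broken there, so that base and all later
    ones are emitted as 'N' (an assumption of this sketch).
    """
    if not read or read[0] not in BASES:
        raise ValueError("read must start with a primer base (A, C, G or T)")
    prev = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        if prev is None or color not in "0123":
            prev = None  # chain broken: nothing after this point is known
            decoded.append("N")
            continue
        prev ^= int(color)
        decoded.append(BASES[prev])
    return "".join(decoded)


print(decode_colorspace("T0123"))
print(decode_colorspace("T20..2"))
```

Because each base is derived from the previous one, a single missing call leaves the rest of the read undetermined under this naive scheme. That makes reads with many '.' characters, like the ones in this report, a useful edge case when testing any conversion code.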