guanchangge / mosaik-aligner

Automatically exported from code.google.com/p/mosaik-aligner

MosaikSort - reference index is out the range when saving alignments. #90

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. MosaikAligner -ia ref.dat -in reads.dat -out map.dat -hs 15 -mm 2 -act 25 -m all -p 4 -bw 17
2. MosaikDupSnoop -in map.dat -od DupPE
3. MosaikSort -in map.dat -out map.sort -dup DupPE/

- phase 2 of 3: resolve read pairs:
100%[==============================] 139,171.9 reads/s       in 01:20  
- phase 3 of 3: sort resolved read pairs:
ERROR: The reference index is out the range when saving alignments.

What is the expected output? What do you see instead?
I can run MosaikSort on a small data set (5000 reads), but it always 
fails on a whole lane of Illumina paired-end data.

What version of the product are you using? On what operating system?
MosaikSort 1.1.0021
Linux 2.6.18-164.15.1.el5.028stab068.9, on the local hard disk

Please provide any additional information below.

Original issue reported on code.google.com by yalin...@gmail.com on 20 Jan 2011 at 2:45

GoogleCodeExporter commented 8 years ago
Sorry, it's my mistake. 
Although the map.dat file was on the local hard disk, the MOSAIK_TMP 
directory still pointed to the NAS.
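
For anyone hitting the same error, here is a minimal sketch of the workaround described above: point MOSAIK_TMP at a local directory before rerunning the sort. The path /tmp/mosaik_tmp is only a placeholder and may not be a local disk on every node, so check the df output before relying on it.

```shell
# Placeholder scratch path (an assumption); replace it with a directory on a
# disk that is physically attached to the node.
LOCAL_TMP="${LOCAL_TMP:-/tmp/mosaik_tmp}"
mkdir -p "$LOCAL_TMP"
export MOSAIK_TMP="$LOCAL_TMP"

# Print the filesystem backing the directory. The first column is the device;
# a network mount typically appears there as host:/path.
df -P "$MOSAIK_TMP"

# Then rerun the failing step from the report:
# MosaikSort -in map.dat -out map.sort -dup DupPE/
```

The MosaikSort line is left commented out so the snippet can be run on its own to verify the directory first.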

Cheers,
Yao-Cheng

Original comment by yalin...@gmail.com on 21 Jan 2011 at 1:28

GoogleCodeExporter commented 8 years ago
Hi Yao-Cheng,

I was wondering what you're using MosaikDupSnoop for, since we may no longer 
maintain it. In our next release, we'll let MosaikAligner output BAMs 
directly and keep only MosaikBuild, MosaikAligner, and MosaikJump.

Best,
Wan-Ping

Original comment by WanPing....@gmail.com on 21 Jan 2011 at 9:28

GoogleCodeExporter commented 8 years ago
Dear Wan-Ping,

I am trying to remove the duplicate reads from our poorly prepared mate-pair 
library. For more than a year I have used cdhit-454 or the FASTX package to 
remove duplicate reads before feeding them to MosaikAligner. I just realized 
that MosaikDupSnoop can do the trick, and the idea of checking alignment 
position (especially for paired-end reads) is much cleverer than blindly 
collapsing identical reads on either side. However, disk I/O is quite a 
problem, especially since most cluster nodes are attached to NAS or even 
virtualized.
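
To illustrate the position-based idea, here is a rough sketch that keeps one read pair per alignment position instead of comparing sequences. It is not how MosaikDupSnoop works internally; the tab-separated input layout (pair name, reference, mate-1 start, mate-2 start) and the function name dedup_pairs are made up for the example.

```shell
# Keep the first read pair seen for each (reference, mate-1 start, mate-2 start) key.
# Hypothetical input columns, tab-separated:
#   pair_name  reference  mate1_start  mate2_start
dedup_pairs() {
    awk -F'\t' '!seen[$2 FS $3 FS $4]++' "$@"
}

# Toy input: p2 maps to the same positions as p1; p3 shares only the mate-1 start.
printf 'p1\tchr1\t100\t350\np2\tchr1\t100\t350\np3\tchr1\t100\t420\n' | dedup_pairs
```

On the toy input, p2 is dropped as a duplicate of p1, while p3 is kept because its second mate aligns elsewhere. Collapsing on one mate alone would have discarded p3 as well.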

A bit off topic: I am wondering how people remove duplicate reads now (if 
one takes the paired-end distance into account)?

Best regards,
Yao-Cheng

P.S. I hope this isn't cross-posting.

Original comment by yalin...@gmail.com on 22 Jan 2011 at 8:38