sequencing / isaac_aligner

Isaac Genome Alignment Software

How to stop isaac_aligner from renaming reads #14

Closed grendon closed 10 years ago

grendon commented 10 years ago

I ran isaac_aligner on a synthetic WGS dataset at 50X coverage with a 0.01 error rate.

When I examined the sorted.bam file produced by the aligner to check where the reads were mapped, I found that all the reads had been renamed.

Is there a parameter I can specify so that the aligner does not rename the reads? Or is there a utility or script that I can run on the BAM file to restore the original read names?

Thanks in advance

This is what my raw reads look like:

@chrM-11881-12069-6012/1
ACCTACTGGGAGAACTCTCTGTGCTAGTAACCACGTCCTCCTGATCAAATATCACTCTCCTACTTACAGGACTCAACATACTAGTCACAGCCCTATACTC
+
;?AAAA:C5>97=;;7=>7)EA9=;;<>769:>$@?::8=;479=A>>63=>;9:>;>9=;==??=<9;4:>9???=?????>004>;>;?=>7;=
@chrM-11902-12079-6014/1
TGCTAGTAACCACGTTCTCCTGATCAAATATCACTCTCCTACTTACAGGACTCAACATACTAGTCACAGCCCTATACTCCCTCTACATATTTACCACAAC
+
;>068AA53=9?A::?=>;>6=;@;<>9;54<:=;<?:?88==C:54>;>9=A:@?=A?AA?A???9:?@;?A?604<<@???>63??@:???7;;>:>
@chrM-11944-12169-6016/1
TTACAGGACTCAACATACTAGTCACAGCCCTATACTCCCTCTACATATTTACCACAACACAATGGGGCTCACTCACCCACCACATTAACAACATAAAACC
+
;>AAAAA.??A<=72<>3><.E>-EB>47:;:<>7;783:>:AA=8<;:>108<=9:>;>;>>6;?>666=??A<>=????@5>7;005>903>7=?

These are the top 30 lines of the sorted.bam file produced by isaac_aligner

@HD VN:1.0 SO:coordinate
@PG ID:iSAAC PN:iSAAC CL:/ui/mayo/rendong/bin/isaac_aligner/bin/isaac-align -r /scratch/users/rendong/humangenome/isaac_index/sorted-reference.xml -m 90 -b /ui/mayo/rendong/scratch-global/humanReads/WholeGenome_50X_100nt_0.01ErrorRate --base-calls-format fastq -j 20 --variable-read-length yes VN:iSAAC-01.14.04.17
@RG ID:0 PL:ILLUMINA SM:default PU:unknown-flowcell:1:none
@SQ SN:chrM LN:16571 UR:/scratch/users/rendong/humangenome/genome.fa M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ SN:chr1 LN:249250621 UR:/scratch/users/rendong/humangenome/genome.fa M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:/scratch/users/rendong/humangenome/genome.fa M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:chr3 LN:198022430 UR:/scratch/users/rendong/humangenome/genome.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ SN:chr4 LN:191154276 UR:/scratch/users/rendong/humangenome/genome.fa M5:23dccd106897542ad87d2765d28a19a1
@SQ SN:chr5 LN:180915260 UR:/scratch/users/rendong/humangenome/genome.fa M5:0740173db9ffd264d728f32784845cd7
@SQ SN:chr6 LN:171115067 UR:/scratch/users/rendong/humangenome/genome.fa M5:1d3a93a248d92a729ee764823acbbc6b
@SQ SN:chr7 LN:159138663 UR:/scratch/users/rendong/humangenome/genome.fa M5:618366e953d6aaad97dbe4777c29375e
@SQ SN:chr8 LN:146364022 UR:/scratch/users/rendong/humangenome/genome.fa M5:96f514a9929e410c6651697bded59aec
@SQ SN:chr9 LN:141213431 UR:/scratch/users/rendong/humangenome/genome.fa M5:3e273117f15e0a400f01055d9f393768
@SQ SN:chr10 LN:135534747 UR:/scratch/users/rendong/humangenome/genome.fa M5:988c28e000e84c26d552359af1ea2e1d
@SQ SN:chr11 LN:135006516 UR:/scratch/users/rendong/humangenome/genome.fa M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ SN:chr12 LN:133851895 UR:/scratch/users/rendong/humangenome/genome.fa M5:51851ac0e1a115847ad36449b0015864
@SQ SN:chr13 LN:115169878 UR:/scratch/users/rendong/humangenome/genome.fa M5:283f8d7892baa81b510a015719ca7b0b
@SQ SN:chr14 LN:107349540 UR:/scratch/users/rendong/humangenome/genome.fa M5:98f3cae32b2a2e9524bc19813927542e
@SQ SN:chr15 LN:102531392 UR:/scratch/users/rendong/humangenome/genome.fa M5:e5645a794a8238215b2cd77acb95a078
@SQ SN:chr16 LN:90354753 UR:/scratch/users/rendong/humangenome/genome.fa M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ SN:chr17 LN:81195210 UR:/scratch/users/rendong/humangenome/genome.fa M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ SN:chr18 LN:78077248 UR:/scratch/users/rendong/humangenome/genome.fa M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ SN:chr19 LN:59128983 UR:/scratch/users/rendong/humangenome/genome.fa M5:1aacd71f30db8e561810913e0b72636d
@SQ SN:chr20 LN:63025520 UR:/scratch/users/rendong/humangenome/genome.fa M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ SN:chr21 LN:48129895 UR:/scratch/users/rendong/humangenome/genome.fa M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ SN:chr22 LN:51304566 UR:/scratch/users/rendong/humangenome/genome.fa M5:a718acaa6135fdca8357d5bfe94211dd
@SQ SN:chrX LN:155270560 UR:/scratch/users/rendong/humangenome/genome.fa M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ SN:chrY LN:59373566 UR:/scratch/users/rendong/humangenome/genome.fa M5:1e86411d73e6f00a10590f976be01623
unknown-flowcell:1:54:1183413:0 99 chrM 1 0 99M = 192 287 GATCACAGGTCTATCACCCTATTAACCACTCACGGGCGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCT >>;?AAAA:C5>97=;;7=>7)EA9=;;<>769:>$@?::8=;479=A>>63=>;9:>;>9=;==??=<9;4:>9???=?????>004>;>;?=>7; RG:Z:0 NM:i:1 BC:Z:none
unknown-flowcell:1:54:1183434:0 99 chrM 8 0 99M = 207 297 GGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGAACGCGATAGCATTGCGAGACGCTGGAGCCG >;>:AAAAA?;<A<9=::;<4>:,,/35D=:@AA;8>7;>:>9;7:69;A>668:@;6(EA?;;;:>;><>9?;=>9:>;>:>95>9:>79=>:?;> RG:Z:0 NM:i:1 BC:Z:none

rpetrovski commented 10 years ago

iSAAC-01 does not preserve the FASTQ read names. The reads in the output BAM are numbered in the same order in which they appear in the input FASTQ. With small data sets, it is possible to sort the BAM reads by name and reapply the names from the corresponding lines of the FASTQ. Unfortunately, this approach does not seem realistic for 50x human data.
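For a small data set, the sort-and-reapply idea could be sketched roughly as below. This is an illustration under assumptions, not an iSAAC feature: it assumes four-line FASTQ records, SAM text with tab-separated fields, one distinct BAM read name per FASTQ record, and that sorting the generated names numerically by their second to fourth colon-separated fields (the `1:54:1183413` part of `unknown-flowcell:1:54:1183413:0`) reproduces the FASTQ input order. That last point is a guess about the naming scheme and should be checked on your own data. All function names are hypothetical.

```python
def fastq_names(lines):
    """Yield read names from 4-line FASTQ records, in file order."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line.strip()[1:]          # drop the leading '@'
            if name.endswith(("/1", "/2")):  # drop the mate suffix, as SAM does
                name = name[:-2]
            yield name


def isaac_order_key(qname):
    """Numeric sort key from a generated name.

    ASSUMPTION: fields 2-4 of 'flowcell:lane:tile:cluster:...' increase
    in the same order as the records of the input FASTQ.
    """
    fields = qname.split(":")
    return tuple(int(f) for f in fields[1:4])


def build_name_map(sam_lines, fastq_lines):
    """Map each generated read name to the original FASTQ name.

    sam_lines must be a list (it is reused later); everything is held
    in memory, so this only suits small inputs.
    """
    qnames = {l.split("\t", 1)[0] for l in sam_lines if not l.startswith("@")}
    ordered = sorted(qnames, key=isaac_order_key)
    originals = list(fastq_names(fastq_lines))
    if len(ordered) != len(originals):
        raise ValueError(
            f"{len(ordered)} distinct BAM names vs {len(originals)} FASTQ records"
        )
    return dict(zip(ordered, originals))


def rename_sam(sam_lines, name_map):
    """Yield SAM lines with the first field replaced; headers pass through."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
        else:
            qname, rest = line.split("\t", 1)
            yield name_map[qname] + "\t" + rest
```

The SAM text could come from `samtools view -h sorted.bam`, and the renamed output could be converted back to BAM afterwards. Because the whole name map lives in memory and the count check fails if any read is missing from the BAM, this fits the small-data case described above rather than a 50x human genome.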