VBHerrenC closed this issue 3 months ago
Hi @VBHerrenC,
My first guess would be that the aligner is generating a lot of secondary alignments. You could try filtering these out and see whether the read length distribution is closer to what you expect. When you convert to FASTQ, add the filter flag (0x900 excludes both secondary, 0x100, and supplementary, 0x800, records):
samtools bam2fq -F 0x900 <file>.bam > <file>.fastq
Hi @malton-ont,
Thanks for the response! This did clear up the issue. It looks like the reference we were using was indexed to where we would expect the middle of the reads to be and this caused the issue. Just for future knowledge, does dorado aligner "create" new reads when they are secondary alignments? I always thought it just added a flag to existing reads but I must be wrong. Thanks for the help!
Hi @VBHerrenC,
The output from dorado aligner is consistent with output from minimap2: secondary and supplementary alignments are stored as entirely separate entries in the BAM file, each with its own alignment information and with the flag value set to indicate the type of alignment.
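To illustrate how the flag value marks each entry, here is a minimal Python sketch. The bit values 0x100 (secondary) and 0x800 (supplementary) come from the SAM specification; the record tuples and the helper names `alignment_type` and `keep_primary` are hypothetical and only stand in for real BAM entries.

```python
# SAM flag bits defined in the SAM specification.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
EXCLUDE_MASK = SECONDARY | SUPPLEMENTARY  # 0x900, the value passed to -F


def alignment_type(flag: int) -> str:
    """Classify a record by its flag value.

    Records with neither bit set are reported as "primary"; note that
    unmapped reads (bit 0x4) also fall into that group here.
    """
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def keep_primary(records):
    """Mimic `-F 0x900`: drop any record that has an excluded bit set."""
    return [r for r in records if not (r[1] & EXCLUDE_MASK)]


# Hypothetical records: (read name, flag). read_1 appears as three entries.
records = [
    ("read_1", 0),     # primary, forward strand
    ("read_1", 256),   # secondary alignment of the same read
    ("read_1", 2048),  # supplementary alignment of the same read
    ("read_2", 16),    # primary, reverse strand
]

for name, flag in records:
    print(name, flag, alignment_type(flag))

print(len(records), "entries ->", len(keep_primary(records)), "after filtering")
```

In this toy input, one read contributes three entries, so converting the unfiltered file to FASTQ would emit more records than there are reads.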
Hi @malton-ont,
Understood, thank you. I'll mark this as closed. Appreciate the clarification!
Issue Report
Please describe the issue:
When we compare the read length distribution before and after running dorado aligner, the distribution changes dramatically: a major new peak appears around 2,000 nt.
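The analysis script used for the graphs is not included in this report. As a hypothetical stand-in, a read length distribution can be tallied from a FASTQ file roughly like this, assuming standard 4-line records (the function names and the 1,000 nt bin size are illustrative choices, not taken from the original script):

```python
from collections import Counter
import io


def read_lengths(handle):
    """Yield the sequence length of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the bases
            yield len(line.strip())


def length_histogram(handle, bin_size=1000):
    """Count reads per length bin, keyed by the lower edge of the bin."""
    return Counter((n // bin_size) * bin_size for n in read_lengths(handle))


# Tiny in-memory FASTQ used only for illustration.
example = io.StringIO(
    "@r1\n" + "A" * 2100 + "\n+\n" + "I" * 2100 + "\n"
    "@r2\n" + "C" * 5400 + "\n+\n" + "I" * 5400 + "\n"
    "@r3\n" + "G" * 2900 + "\n+\n" + "I" * 2900 + "\n"
)
hist = length_histogram(example)
for lower in sorted(hist):
    print(f"{lower}-{lower + 999} nt: {hist[lower]}")
```

Running the same tally on the FASTQ made before and after alignment makes it easy to see which length bins gain records.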
Steps to reproduce the issue:
Each dorado command was run. For the second dorado command, we converted the BAM to FASTQ using:
samtools bam2fq dorado_fast_qFilterLow_noTrim.bam > dorado_fast_qFilterLow_noTrim_convert.fastq
The converted FASTQ was run through the same read length analysis script and produced the exact same graph as the first command. We then used dorado aligner to align the BAM and ran the result through a similar read length analysis script to produce the second graph with the additional peak.
Run environment:
dorado aligner -o dorado_aligner_testing refFasta.fasta dorado_fast_qFilterLow_noTrim.bam
Logs
dorado aligner -o dorado_aligner_testing refFasta.fasta dorado_fast_qFilterLow_noTrim.bam
[2024-06-20 10:40:10.296] [info] Running: "aligner" "-o" "dorado_aligner_testing" "refFasta.fasta" "dorado_fast_qFilterLow_noTrim.bam"
[2024-06-20 10:40:10.296] [info] num input files: 1
[2024-06-20 10:40:10.296] [info] > loading index refFasta.fasta
[2024-06-20 10:40:10.303] [info] processing dorado_fast_qFilterLow_noTrim.bam -> dorado_aligner_testing/dorado_fast_qFilterLow_noTrim.bam
[2024-06-20 10:40:10.976] [info] > starting alignment
[2024-06-20 10:40:43.611] [info] > finished alignment
[2024-06-20 10:40:43.611] [info] > merging temporary BAM files
[2024-06-20 10:41:05.271] [info] > Simplex reads basecalled: 324220
[2024-06-20 10:41:05.271] [info] > total/primary/unmapped 511732/324513/1269
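The final summary line already hints at the extra records. A small Python sketch, assuming that "total" counts every record written to the output BAM and that any record not counted as "primary" is a secondary or supplementary alignment (this reading of the fields is an assumption, not something the log states):

```python
import re

# Summary line copied from the log above.
log_line = (
    "[2024-06-20 10:41:05.271] [info] > "
    "total/primary/unmapped 511732/324513/1269"
)

match = re.search(r"total/primary/unmapped (\d+)/(\d+)/(\d+)", log_line)
total, primary, unmapped = (int(g) for g in match.groups())

# Assumption: every record that is not primary is secondary or supplementary.
non_primary = total - primary
print(f"non-primary records: {non_primary}")
print(f"share of all records: {non_primary / total:.1%}")
```

Under that assumption, 187,219 of the 511,732 records (over a third) are not primary alignments, which would be consistent with the unfiltered FASTQ containing many more entries than the 324,220 basecalled reads.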