lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

Number of characters per line for a reference sequence #372

Status: Open. olu2016 opened this issue 2 years ago.

olu2016 commented 2 years ago

Hi,

I have a short reference sequence (specifically a short template of 198 bp) for bwa-mem alignment. The paired-end reads I am aligning to this template are ~150 bp long. I ran the alignment, and the samtools flagstat results are below:

Unfiltered:
12763625 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
5931323 + 0 supplementary
0 + 0 duplicates
12721231 + 0 mapped (99.67% : N/A)
6832302 + 0 paired in sequencing
3416151 + 0 read1
3416151 + 0 read2
0 + 0 properly paired (0.00% : N/A)
6780850 + 0 with itself and mate mapped
9058 + 0 singletons (0.13% : N/A)
5513370 + 0 with mate mapped to a different chr
60259 + 0 with mate mapped to a different chr (mapQ>=5)

Filtered for aligned reads:
12721231 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
5931323 + 0 supplementary
0 + 0 duplicates
12721231 + 0 mapped (100.00% : N/A)
6789908 + 0 paired in sequencing
3398326 + 0 read1
3391582 + 0 read2
0 + 0 properly paired (0.00% : N/A)
6780850 + 0 with itself and mate mapped
9058 + 0 singletons (0.13% : N/A)
5513370 + 0 with mate mapped to a different chr
60259 + 0 with mate mapped to a different chr (mapQ>=5)
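To put the unfiltered numbers in proportion, here is a minimal Python sketch that parses samtools flagstat text in the layout shown above and computes a few ratios. The `parse_flagstat` helper and its regex are hypothetical, written only for this output layout, and the "mate on a different chr" ratio uses "paired in sequencing" as its denominator, which is my own choice (flagstat prints no percentage for that line).

```python
import re

# Hypothetical helper: each flagstat line looks like "N + M label [(pct : N/A)]".
# The trailing percentage, when present, is dropped from the label.
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([\d.]+% : [^)]*\))?$")


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} for each recognised line."""
    counts = {}
    for line in text.strip().splitlines():
        m = FLAGSTAT_LINE.match(line.strip())
        if m:
            counts[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return counts


# The unfiltered flagstat output from this issue.
UNFILTERED = """\
12763625 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
5931323 + 0 supplementary
0 + 0 duplicates
12721231 + 0 mapped (99.67% : N/A)
6832302 + 0 paired in sequencing
3416151 + 0 read1
3416151 + 0 read2
0 + 0 properly paired (0.00% : N/A)
6780850 + 0 with itself and mate mapped
9058 + 0 singletons (0.13% : N/A)
5513370 + 0 with mate mapped to a different chr
60259 + 0 with mate mapped to a different chr (mapQ>=5)
"""

c = parse_flagstat(UNFILTERED)
total = c["in total (QC-passed reads + QC-failed reads)"][0]
# Primary records = all records minus secondary and supplementary alignments.
primary = total - c["secondary"][0] - c["supplementary"][0]
paired = c["paired in sequencing"][0]

print("primary records:", primary)
print(f"mapped:           {c['mapped'][0] / total:.2%}")
print(f"properly paired:  {c['properly paired'][0] / paired:.2%}")
print(f"mate on diff chr: {c['with mate mapped to a different chr'][0] / paired:.2%}")
```

With these numbers, primary records come out at 6,832,302, equal to the "paired in sequencing" count, and roughly 81% of those paired reads have their mate on a different reference sequence.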

The flagstat output shows 0% of reads properly paired, and 5,513,370 reads have their mate mapped to a different chr. So I checked the paired-end read files to make sure the reads are correctly ordered. Having satisfied myself that they are, I started to wonder about the number of characters per line (60) in the template FASTA file used to build the bwa index. My question is: does bwa-mem require a specific number of characters per line in the reference/template to align reads accurately?
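On the line-width question: to my understanding, `bwa index` reads the reference through the kseq.h FASTA parser, which joins the sequence lines of each record, so the wrap width (60 or any other) should not change the indexed sequence. The minimal Python sketch below illustrates that idea with a hypothetical parser (`read_fasta`, not bwa's actual code) and a made-up 198 bp template: the same sequence wrapped at different widths parses to identical records. It also counts records, which may be worth checking on the real template, since as far as I can tell flagstat can only report mates "mapped to a different chr" when the reference contains more than one sequence.

```python
def read_fasta(text):
    """Hypothetical minimal FASTA parser returning [(header, sequence), ...].

    Sequence lines are joined with their line breaks removed, so the
    wrapping width used in the file does not affect the result.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def wrap_fasta(records, width):
    """Write records back out as FASTA with `width` bases per line."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Made-up 198 bp template, for illustration only.
template = [("template", ("ACGTTGCA" * 25)[:198])]

for width in (60, 70, 80, 198):
    parsed = read_fasta(wrap_fasta(template, width))
    print(width, len(parsed), "record(s)", parsed == template)
```

Running `read_fasta` over the real template file's contents (the filename would be whatever was passed to `bwa index`) would list how many records it actually holds.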

Thanks