lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

incorrect alignment of long contigs / problem with bandwidth tracking? #89

Open pmarks opened 8 years ago

pmarks commented 8 years ago

I'm aligning long assembled contigs (from Supernova) to hg19. Occasionally bwa mem generates long runs of bad alignments with many long indels, even where the sequence actually matches the reference well. From tracing through some examples, it appears that alignments merged by mem_patch_reg are sometimes not subsequently processed by bwa_gen_cigar2 with a large enough bandwidth to produce a correct alignment. I can work around the issue by raising PATCH_MIN_SC_RATIO to 0.95 (https://github.com/lh3/bwa/blob/master/bwamem.c#L404), but I imagine there's a better solution.
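To illustrate why an undersized band produces a worse alignment, here is a minimal, generic sketch of banded global alignment: only DP cells with |i - j| <= w are filled, so any path that drifts further than w from the diagonal is unavailable. This is not bwa's ksw code and does not reproduce how bwa_gen_cigar2 chooses its bandwidth; the function name `banded_global_score`, the linear gap penalty (bwa uses affine gaps), and the toy sequences are all hypothetical assumptions for illustration.

```python
NEG_INF = float("-inf")


def banded_global_score(ref, qry, w, match=1, mismatch=-4, gap=-1):
    """Global alignment score using only DP cells with |i - j| <= w.

    Hypothetical sketch: linear gap penalty for simplicity (bwa itself
    uses affine gaps). Returns -inf when no path fits inside the band.
    """
    n, m = len(ref), len(qry)
    # A full matrix keeps the sketch short; cells outside the band stay -inf.
    H = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - w), min(m, i + w) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:  # diagonal step: match or mismatch
                s = match if ref[i - 1] == qry[j - 1] else mismatch
                best = max(best, H[i - 1][j - 1] + s)
            if i > 0:  # consume a reference base (deletion in the query)
                best = max(best, H[i - 1][j] + gap)
            if j > 0:  # consume a query base (insertion in the query)
                best = max(best, H[i][j - 1] + gap)
            H[i][j] = best
    return H[n][m]


# Hypothetical toy sequences: the query has a 3-bp deletion followed later
# by a 3-bp insertion, so the true path drifts 3 cells off the diagonal.
left, deleted, inserted, right = "ACGTCA", "TTT", "GGG", "TGCATG"
mid = "AACCGGTT" * 2 + "AACC"
ref = left + deleted + mid + right
qry = left + mid + inserted + right

for w in (1, 3, len(ref)):
    print(w, banded_global_score(ref, qry, w))
```

With w = 1 the correct path (delete 3, match the 20-bp middle, insert 3) does not fit in the band, so the DP is forced into a much lower-scoring path full of mismatches and gaps; with w = 3 or wider the correct path is reachable. If the band is narrower than the reference/query length difference, no path fits at all and the score is -inf.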

I've attached an example sequence that triggers the problem when aligned to hg19 with the latest bwa mem code. You can see the issue at chr2:122,075,076-122,087,607.

Does not show the problem: bwa mem -v 5 -x intractg -w 500 -d 200 /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa 4535_slice.fasta

Shows the problem: bwa mem -v 5 -x intractg -w 600 -d 200 /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa 4535_slice.fasta

4535_slice.fasta.txt

IGV tracks of the alignments resulting from the above commands are shown here: [image]

Thanks! Pat Marks

mdkeehan commented 7 years ago

Aligning contigs from multiple Supernova de novo assemblies to a reference assembly is a use case I have as well. Contigs can be megabases in size, and I want more global alignment behaviour so we can discover regions of high divergence between the Supernova assemblies and the reference assembly. We have large-memory machines, so could we raise -w and -d to, say, 10,000?

Some discussion on appropriate settings for bwa mem would be highly appreciated.

lh3 commented 7 years ago

Use minimap2.