pmarks opened 8 years ago
Aligning contigs from multiple Supernova de novo assemblies to a reference assembly is a use case I have as well. The contigs can be megabases long, and I want more global alignment behaviour so that we can discover regions of high divergence between the Supernova assemblies and the reference. We have large-memory machines, so could we raise -w and -d to, say, 10,000?
Some discussion of appropriate bwa mem settings for this use case would be much appreciated.
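To get a feel for what raising -w costs, here is a toy banded global alignment. It is a sketch under simplifying assumptions (linear gap costs, a +1/-1 scoring scheme, and a single DP over the whole sequence pair, none of which is how bwa mem actually extends alignments). Only cells with |i - j| <= w are filled, so the work grows roughly as length × (2w + 1), and an indel longer than w cannot be represented at all. The function name banded_global_score is hypothetical.

```python
NEG_INF = float("-inf")


def banded_global_score(query, ref, w, match=1, mismatch=-1, gap=-1):
    """Hypothetical toy: global alignment score restricted to the band |i - j| <= w.

    Returns (score, cells_filled). The score is -inf when the end cell lies
    outside the band, i.e. when |len(query) - len(ref)| > w.
    Linear gap costs are a simplification; bwa mem uses affine gaps.
    """
    m, n = len(query), len(ref)
    dp = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    cells = 0
    for i in range(m + 1):
        # Only columns inside the band are ever computed.
        for j in range(max(0, i - w), min(n, i + w) + 1):
            cells += 1
            if i == 0 and j == 0:
                dp[0][0] = 0
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                sub = match if query[i - 1] == ref[j - 1] else mismatch
                best = max(best, dp[i - 1][j - 1] + sub)
            if i > 0:
                best = max(best, dp[i - 1][j] + gap)  # query base against a gap in the reference
            if j > 0:
                best = max(best, dp[i][j - 1] + gap)  # reference base against a gap in the query
            dp[i][j] = best
    return dp[m][n], cells


if __name__ == "__main__":
    query = "ACGTACGT" + "ACGTACGT"             # 16 bp
    ref = "ACGTACGT" + "GGGGGG" + "ACGTACGT"    # 22 bp: a 6 bp insertion
    for w in (3, 6, 8):
        score, cells = banded_global_score(query, ref, w)
        print(f"w={w}: score={score}, cells={cells}")
```

With w = 3 the end cell falls outside the band and no alignment exists; w = 6 is just wide enough to absorb the 6 bp insertion. As a scaling intuition only, a band over 1 Mb at w = 10,000 would span about 10^6 × 20,001 ≈ 2 × 10^10 cells.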
I'm aligning long assembled contigs (from Supernova) to hg19. Occasionally bwa mem generates long runs of bad alignments with many long indels for sequence that actually matches the reference well. From tracing through some examples, it appears that alignments merged by mem_patch_reg are sometimes not subsequently processed by bwa_gen_cigar2 with a bandwidth large enough to produce a correct alignment. I can work around the issue by raising PATCH_MIN_SC_RATIO to 0.95 (https://github.com/lh3/bwa/blob/master/bwamem.c#L404), but I imagine there's a better solution.
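The merge step mentioned above can be paraphrased roughly as follows. This sketch is written from memory of mem_patch_reg in bwamem.c and is not the actual implementation: the constant values, the Hit class and the helper names are assumptions, the strand check is omitted, and the global realignment score is passed in rather than computed. It shows the two gates involved: a pair is only considered when the required bandwidth is at most 2 × w (and the relative bandwidth is small), and the merge is kept only if the realigned score is at least PATCH_MIN_SC_RATIO of the two original scores combined.

```python
from dataclasses import dataclass

# Assumed values; verify against the linked bwamem.c before relying on them.
PATCH_MAX_R_BW = 0.05
PATCH_MIN_SC_RATIO = 0.90


@dataclass
class Hit:
    """Hypothetical aligned region: query begin/end, reference begin/end, score."""
    qb: int
    qe: int
    rb: int
    re: int
    score: int


def patch_bandwidth(a, b, opt_w):
    """Return the bandwidth needed to merge a and b, or None if the pair is rejected."""
    if a.qb >= b.qb or a.qe >= b.qe or a.re >= b.re:
        return None  # not colinear
    # Difference between the reference and query distances separating the two hits.
    w = abs((a.re - b.rb) - (a.qe - b.qb))
    # Relative version: each distance normalised by the total span on its own sequence.
    r = abs((a.re - b.rb) / (b.re - a.rb) - (a.qe - b.qb) / (b.qe - a.qb))
    if w > 2 * opt_w or r >= PATCH_MAX_R_BW:
        return None
    return w


def accept_patch(a, b, merged_score, min_ratio=PATCH_MIN_SC_RATIO):
    """Keep the merge only if the realigned score retains enough of a.score + b.score."""
    return merged_score / (a.score + b.score) >= min_ratio


if __name__ == "__main__":
    # Made-up hits separated by a ~1.1 kb deletion in the query relative to the reference.
    a = Hit(qb=0, qe=20000, rb=0, re=20000, score=19000)
    b = Hit(qb=20100, qe=40000, rb=21200, re=41100, score=19000)
    for opt_w in (500, 600):
        print(opt_w, patch_bandwidth(a, b, opt_w))
    print(accept_patch(a, b, 35000), accept_patch(a, b, 35000, min_ratio=0.95))
```

With these hypothetical numbers the required bandwidth is 1100, so the pair is rejected at w = 500 (limit 1000) but considered at w = 600 (limit 1200), and a realigned score of 35000 passes a 0.90 ratio but not 0.95. None of these values come from the attached example.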
I attached an example sequence that triggers the problem when aligned to hg19 with the latest bwa mem code. You can see the issue at chr2:122,075,076-122,087,607.
Does not show the problem:
bwa mem -v 5 -x intractg -w 500 -d 200 /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa 4535_slice.fasta
Shows the problem:
bwa mem -v 5 -x intractg -w 600 -d 200 /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa 4535_slice.fasta
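On the -d 200 in these commands: -d sets bwa mem's Z-dropoff. The sketch below paraphrases, from memory, the termination test in ksw_extend2 (bwa's ksw.c); the function name and argument layout are hypothetical, and the bwa manual phrases the penalty slightly differently, so check the source before relying on the details. It illustrates that the drop from the best score is discounted by the gap-extension cost of the diagonal shift since the best cell, so a single long indel does not by itself stop extension, whereas a poorly matching stretch on the same diagonal does. The manual notes that Z-dropoff also reduces poor alignments inside a long good alignment, so a very large -d weakens that effect.

```python
def zdrop_stop(best, best_i, best_j, cur, i, j, zdrop, e_del=1, e_ins=1):
    """Return True if seed extension should stop at cell (i, j).

    best is the highest score seen so far, reached at (best_i, best_j);
    cur is the score at (i, j). Here i indexes the reference and j the query,
    and e_del / e_ins are gap-extension penalties (all assumptions for this sketch).
    """
    if zdrop <= 0 or cur >= best:
        return False
    di, dj = i - best_i, j - best_j  # progress along each sequence since the best cell
    if di > dj:
        # Reference advanced further: forgive a deletion of length di - dj.
        return best - cur - (di - dj) * e_del > zdrop
    # Query advanced at least as far: forgive an insertion of length dj - di.
    return best - cur - (dj - di) * e_ins > zdrop


if __name__ == "__main__":
    # Drop of 300 explained by a 300 bp deletion: extension continues.
    print(zdrop_stop(500, 1000, 1000, 200, 1300, 1000, zdrop=200))
    # Drop of 250 along the same diagonal: stops at -d 200, not at -d 300.
    print(zdrop_stop(500, 1000, 1000, 250, 1100, 1100, zdrop=200))
    print(zdrop_stop(500, 1000, 1000, 250, 1100, 1100, zdrop=300))
```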
4535_slice.fasta.txt
IGV tracks of the alignments produced by the commands above are shown here:
Thanks! Pat Marks