hammerlab / biokepi

Bioinformatics Ketrew Pipelines
Apache License 2.0

Variant calling parallelization only uses major contigs #160

Open arahuja opened 8 years ago

arahuja commented 8 years ago

Variant calling/Mutect parallelization currently uses only the major contigs. With B38 (GRCh38), this would drop the ALT contigs.

Not dropping them makes sense, but in general variant calling likely needs to be rethought for ALT contigs. What happens when a somatic variant is mapped to an ALT contig in the tumor sample, but to the major contig in the normal sample?
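To make the parallelization point concrete, here is a minimal sketch of scattering by contig without dropping the non-major ones. It is not Biokepi's implementation: the names `MAJOR`, `scatter_units`, and `max_minor_bases` are hypothetical, and it assumes UCSC-style GRCh38 contig names where the major contigs are `chr1` to `chr22`, `chrX`, `chrY`, and `chrM`. Each major contig becomes its own unit of work, and everything else (ALT, unplaced, decoy, HLA) is packed into a few extra units instead of being discarded.

```python
import re

# Assumption: UCSC-style GRCh38 names; these are the "major" contigs.
MAJOR = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def scatter_units(contigs, max_minor_bases):
    """Split contigs into units of work for parallel variant calling.

    contigs: list of (name, length) pairs, e.g. parsed from a .fai index.
    max_minor_bases: soft cap on total bases per unit of non-major contigs.
    Returns a list of units; each unit is a list of contig names.
    """
    # One unit per major contig, in input order.
    units = [[name] for name, _ in contigs if MAJOR.match(name)]

    # Pack the remaining contigs greedily so none of them is dropped.
    current, current_bases = [], 0
    for name, length in contigs:
        if MAJOR.match(name):
            continue
        if current and current_bases + length > max_minor_bases:
            units.append(current)
            current, current_bases = [], 0
        current.append(name)
        current_bases += length
    if current:
        units.append(current)
    return units


if __name__ == "__main__":
    # Toy contig names and lengths, for illustration only.
    toy = [("chr1", 1000), ("chr1_EXAMPLE1_alt", 30),
           ("chr2", 900), ("chr2_EXAMPLE2_alt", 80)]
    for unit in scatter_units(toy, max_minor_bases=100):
        print(unit)
```

This only addresses the "don't drop them" half. It does nothing about a read mapping to an ALT contig in one sample and to the major contig in the other.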

iskandr commented 8 years ago

> What happens when a somatic variant is mapped to an ALT in the tumor sample, but the major contig in the normal sample?

A false positive somatic variant.

Maybe for now we should simply avoid calling variants in polymorphic regions?
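One way this could look, as a rough sketch: post-filter the calls against a list of regions to avoid. How to define "polymorphic regions" is left open here; the sketch simply assumes someone supplies BED-style intervals (0-based, half-open), for example the primary-assembly spans that ALT contigs cover. The function names `build_index`, `in_masked_region`, and `drop_masked` are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """regions: iterable of (contig, start, end), 0-based half-open (BED-style).

    Returns {contig: (starts, ends)} with overlapping intervals merged
    and both lists sorted, so lookups can use binary search.
    """
    by_contig = defaultdict(list)
    for contig, start, end in regions:
        by_contig[contig].append((start, end))

    index = {}
    for contig, intervals in by_contig.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[contig] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_masked_region(index, contig, pos0):
    """True if the 0-based position falls inside any masked interval."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def drop_masked(calls, index):
    """calls: iterable of (contig, pos) with 1-based pos, as in a VCF."""
    return [(c, p) for c, p in calls if not in_masked_region(index, c, p - 1)]
```

Filtering after the fact keeps the caller untouched. Restricting the caller's input intervals up front would be the alternative, at the cost of touching the parallelization code.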


arahuja commented 8 years ago

Another thing to watch out for is the effect on mapping quality (MAPQ). The BWA documentation covers ALT contigs:

  1. Does BWA work with ALT contigs in the GRCh38 release? Yes, since 0.7.11, BWA-MEM officially supports mapping to GRCh38+ALT. BWA-backtrack and BWA-SW don't properly support ALT mapping as of now. Please see README-alt.md for details. Briefly, it is recommended to use bwakit, the binary release of BWA, for generating the reference genome and for mapping.
  2. Can I just run BWA-MEM against GRCh38+ALT without post-processing? If you are not interested in hits to ALT contigs, it is okay to run BWA-MEM without post-processing. The alignments produced this way are very close to alignments against GRCh38 without ALT contigs. Nonetheless, applying post-processing helps to reduce false mappings caused by reads from the diverged part of ALT contigs and also enables HLA typing. It is recommended to run the post-processing script.

This page shows some examples: https://github.com/lh3/bwa/blob/master/README-alt.md

If we align sequence reads to GRCh38+ALT blindly, we will get many additional reads with zero mapping quality and miss variants on them.
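A quick way to check how much this matters for our data would be to compare the fraction of MAPQ 0 reads between an alignment against GRCh38 with ALT contigs and one without. Below is a minimal sketch that reads plain SAM text, with no pysam dependency; the function name `mapq_zero_fraction` is hypothetical. It relies only on the standard SAM column layout (FLAG in column 2, MAPQ in column 5) and skips unmapped, secondary, and supplementary records.

```python
def mapq_zero_fraction(sam_lines):
    """Fraction of primary mapped reads with MAPQ 0.

    sam_lines: iterable of SAM text lines (header lines are ignored).
    Returns 0.0 if there are no primary mapped reads.
    """
    total = 0
    zero = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        total += 1
        if mapq == 0:
            zero += 1
    return zero / total if total else 0.0
```

For example, feeding it the output of `samtools view -h` for each of the two alignments (via `sys.stdin` or an open file) gives two numbers to compare.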

arahuja commented 8 years ago

Similarly, it seems STAR suggests dropping the ALT contigs: https://github.com/alexdobin/STAR/issues/39#issuecomment-101214342