googlegenomics / gcp-variant-transforms

GCP Variant Transforms
Apache License 2.0

Add an option to normalize variants #93

Open arostamianfar opened 6 years ago

arostamianfar commented 6 years ago

There are multiple ways to represent the same variant in a VCF file, and some representations carry redundant information. We can provide common normalization transforms (e.g. left/right trimming, removing unnecessary bases from indels, etc.) as part of the pipeline. This will also be essential for properly merging variants across files.

Related: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481842/
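For illustration, here is a rough sketch of reference-based normalization (left-alignment plus trimming), loosely following the procedure described in the linked paper. It is not Variant Transforms code: the function name `normalize` is hypothetical, the reference is assumed to be available as a plain in-memory string for a single contig, and `pos` is assumed to be 1-based as in VCF.

```python
def normalize(pos, alleles, reference):
    """Left-align and trim a variant against a reference sequence.

    Assumptions (illustrative only): `alleles` lists REF first, then the
    ALT alleles; `pos` is 1-based; `reference` is one contig as a string.
    """
    alleles = list(alleles)
    if len(set(alleles)) < 2:
        raise ValueError("need at least two distinct alleles")

    changed = True
    while changed:
        changed = False
        # Drop the last base when every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend past the start of the contig")
            pos -= 1
            base = reference[pos - 1]  # convert 1-based pos to 0-based index
            alleles = [base + a for a in alleles]
            changed = True

    # Drop shared leading bases while every allele keeps at least one base.
    while min(len(a) for a in alleles) >= 2 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles


if __name__ == "__main__":
    ref_seq = "GGGCACACACAGGG"
    # A CA deletion inside the CA repeat, written at a non-leftmost position.
    print(normalize(7, ["ACA", "A"], ref_seq))
```

Different spellings of the same deletion inside the repeat should all collapse to one record under this sketch, which is the property needed for merging across files.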

arostamianfar commented 6 years ago

We could also consider a simpler approach that doesn't require the reference, but does require splitting alternates: http://www.cureffi.org/2014/04/24/converting-genetic-variants-to-their-minimal-representation/
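A minimal sketch of that reference-free approach, as I understand it from the linked post: split multi-allelic records into one record per alternate, then trim shared suffix and prefix bases while keeping at least one base in each allele. The function names here are hypothetical and not part of Variant Transforms. Note that this only trims; it does not left-align indels in repeats, which is what needs the reference.

```python
def minimal_representation(pos, ref, alt):
    """Trim a single REF/ALT pair to its minimal form (pos is 1-based)."""
    # Trim shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim shared leading bases; each removed base shifts pos right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_and_minimize(pos, ref, alts):
    """Split a multi-allelic record and minimize each alternate separately."""
    return [minimal_representation(pos, ref, alt) for alt in alts]


if __name__ == "__main__":
    # One multi-allelic record becomes three independent minimal records.
    for record in split_and_minimize(1001, "CTCC", ["CCC", "C", "CCCC"]):
        print(record)
```

Splitting first matters because the bases shared between REF and one alternate are generally not shared with the others, so each pair ends up with its own position and alleles.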

arostamianfar commented 5 years ago

Lowering the priority, since doing this properly requires integrating reference data into Variant Transforms, which is not planned for Q3.