Open arostamianfar opened 6 years ago
We can also consider a simpler approach that doesn't require the reference: http://www.cureffi.org/2014/04/24/converting-genetic-variants-to-their-minimal-representation/ . It does, however, require splitting alternates first.
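For illustration, here is a rough sketch of what that reference-free trimming could look like. This is not Variant Transforms code; the function names `minimal_representation` and `split_and_minimize` are hypothetical, and the sketch assumes 1-based positions and plain-string alleles (no symbolic alleles such as `<DEL>` or `*`).

```python
from typing import List, Tuple


def minimal_representation(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trims bases shared by ref and alt; needs no reference genome.

    Hypothetical helper: assumes a single alternate and non-empty,
    plain-string alleles.
    """
    # Trim shared trailing bases first, keeping at least one base per allele.
    while min(len(ref), len(alt)) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, advancing the position for each one.
    while min(len(ref), len(alt)) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_and_minimize(pos: int, ref: str, alts: List[str]) -> List[Tuple[int, str, str]]:
    """Splits a multi-allelic record and minimizes each alternate on its own."""
    return [minimal_representation(pos, ref, alt) for alt in alts]


if __name__ == "__main__":
    # One multi-allelic record becomes three independently trimmed variants.
    for variant in split_and_minimize(1001, "CTCC", ["CCC", "C", "CCCC"]):
        print(variant)
```

Note that this only trims redundant bases. It does not left-align indels inside repeats, since that needs the reference sequence, so two records can still describe the same indel at different positions after this step.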
Lowering the priority as doing this properly requires integrating reference data into Variant Transforms and is not planned for Q3.
There are multiple ways of representing the same variant in a VCF file, and some representations carry redundant information. We can provide common normalization transforms (e.g. left/right trimming, removing unnecessary bases from indels, etc.) as part of the pipeline. This will also be essential for properly merging variants across files.
Related: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481842/