CPG annotation process isn't safe to use

See #269

To be clear, the process is safe as used by the main pipeline, which does the following:

Generate a joint-called dataset
Run VEP on multiallelic data
Apply VEP data to variant data, joined only on Chr:Pos (a partial key)
Split multiallelic data out into separate variant rows (Hail's splitting function knows how to partition the VEP content correctly)
...
PROFIT!

The AIP re-annotation cycle starts from the post-split data generated by this pipeline, where two different alt alleles at the same locus are two different rows. Because multiple rows can share a locus, the partial-key join between VEP and variant data is no longer valid: the index clashes, and the annotation from one row will be wrongly applied to any other variants at the same site.
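
To make the clash concrete, here is a toy sketch in plain Python. It is not the cpg_workflows or Hail code: dicts stand in for tables, and the loci, alleles and consequences are invented for illustration. Which row "wins" in a real join depends on the tool; in this sketch the last one does.

```python
# Toy model only: plain dicts stand in for Hail tables; all values are made up.
# Post-split variant rows: two alt alleles at the same locus are two rows.
variants = [
    {"locus": "chr1:1000", "alleles": ("A", "C")},
    {"locus": "chr1:1000", "alleles": ("A", "T")},
]

# Hypothetical per-row VEP results for the same two rows.
vep_rows = [
    {"locus": "chr1:1000", "alleles": ("A", "C"), "consequence": "missense_variant"},
    {"locus": "chr1:1000", "alleles": ("A", "T"), "consequence": "stop_gained"},
]

# Partial-key index: only the locus is used, so the second row overwrites the first.
vep_by_locus = {row["locus"]: row["consequence"] for row in vep_rows}

# Every variant at the locus now receives the same (single surviving) annotation.
annotated = [{**v, "consequence": vep_by_locus[v["locus"]]} for v in variants]

for row in annotated:
    print(row["locus"], row["alleles"], row["consequence"])
```

Both rows come out with the same consequence, so the A>C variant carries the annotation that belongs to A>T.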

This has been seen in re-annotated AIP data (though re-annotation is rarely used internally), and in the re-annotation of the ClinVar VCF.

The only solution I can think of is to copy the chunks of cpg_workflows code relating to annotation, and alter how the data is pieced back together. That shouldn't be as bad as it sounds: the current code includes a ton of conditionals and switches for applying VQSR or running in Dataproc, so the copied version should be relatively minimal.
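
How the data should be pieced back together is not settled here. One option, assumed purely for illustration, is to key the VEP results on the full variant key (locus plus alleles) so that rows sharing a locus can no longer collide. The sketch below uses plain Python dicts rather than Hail, and `annotate_on_full_key` is a hypothetical helper, not an existing cpg_workflows function.

```python
def annotate_on_full_key(variants, vep_rows):
    """Attach VEP consequences using (locus, alleles) as the join key."""
    vep_by_key = {}
    for row in vep_rows:
        key = (row["locus"], row["alleles"])
        # Fail loudly rather than silently overwrite an earlier annotation.
        if key in vep_by_key:
            raise ValueError(f"duplicate VEP row for {key}")
        vep_by_key[key] = row["consequence"]
    # Variants with no matching VEP row get None instead of a neighbour's value.
    return [
        {**v, "consequence": vep_by_key.get((v["locus"], v["alleles"]))}
        for v in variants
    ]


# Made-up post-split rows: two alt alleles at one locus.
split_variants = [
    {"locus": "chr1:1000", "alleles": ("A", "C")},
    {"locus": "chr1:1000", "alleles": ("A", "T")},
]
split_vep = [
    {"locus": "chr1:1000", "alleles": ("A", "C"), "consequence": "missense_variant"},
    {"locus": "chr1:1000", "alleles": ("A", "T"), "consequence": "stop_gained"},
]

full_key_result = annotate_on_full_key(split_variants, split_vep)
for row in full_key_result:
    print(row["locus"], row["alleles"], row["consequence"])
```

With the full key, each row keeps its own annotation, and a duplicate key raises an error instead of passing unnoticed.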
