In the dark ages of PyEnsembl we needed a quick way to annotate variants from hg19 and GRCh37 using Ensembl reference data. This led to a dirty chromosome normalization hack where we turn e.g. "chr1" -> "1" and "chrM" -> "MT".
This was always a little questionable (since the mitochondrial sequences aren't actually the same) and, even worse, it is incorrect for references like GRCh38.
So, this PR gets rid of two aspects of contig normalization: the "chr" prefix is now preserved, and we no longer convert "M" -> "MT" for the mitochondrial genome.
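For reviewers, here is a rough sketch of the behavior change described above. The function names `normalize_contig_old` and `normalize_contig_new` are hypothetical and are not the actual PyEnsembl functions; any other normalization the library performs is left out.

```python
def normalize_contig_old(name):
    """Sketch of the old hack: strip a leading "chr" and rename M to MT.

    Hypothetical helper for illustration, not the PyEnsembl implementation.
    """
    name = str(name)
    if name.lower().startswith("chr"):
        name = name[3:]
    if name == "M":
        name = "MT"
    return name


def normalize_contig_new(name):
    """Sketch of the behavior after this PR: the "chr" prefix is preserved
    and the mitochondrial contig is not renamed.

    Hypothetical helper for illustration, not the PyEnsembl implementation.
    """
    return str(name)


if __name__ == "__main__":
    for contig in ["chr1", "chrM", "M"]:
        print(contig, normalize_contig_old(contig), normalize_contig_new(contig))
```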
Coverage decreased (-0.08%) to 79.923% when pulling f28cf5e85ead1fcb8150a3935ee5c623aee28743 on fix-MT-normalization into b48d73633d610dde3c500b01adba8c1524fc78ef on master.
Fixes: https://github.com/openvax/pyensembl/issues/225
To restore use of Ensembl data for hg19 variants we'll have to make a more local contig conversion option in Varcode.
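One possible shape for that option, purely as a sketch: an explicit, opt-in name conversion applied by the caller. The function `ucsc_to_ensembl_contig` and its `convert` flag are assumptions for illustration and do not exist in Varcode.

```python
def ucsc_to_ensembl_contig(name, convert=False):
    """Optionally map a UCSC-style contig name to an Ensembl-style name.

    Hypothetical helper: nothing with this name exists in Varcode.
    The conversion is off by default so contig names pass through unchanged
    unless the caller explicitly asks for the hg19 -> GRCh37 style renaming.
    """
    if not convert:
        return name
    if name == "chrM":
        # Name-level mapping only: the mitochondrial sequences behind
        # these two names are not actually the same.
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    return name
```

Keeping the conversion behind an explicit flag would mean references like GRCh38 are never silently renamed.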