openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Fix chrM / MT normalization #226

Closed iskandr closed 4 years ago

iskandr commented 4 years ago

In the dark ages of PyEnsembl we needed a quick way to annotate variants from hg19 and GRCh37 using Ensembl reference data. This led to a dirty chromosome normalization hack where we turn e.g. "chr1" -> "1" and "chrM" -> "MT".

This was always a little questionable (since the mitochondrial sequences aren't actually the same), but even worse, it is incorrect for references like GRCh38.

So, this PR gets rid of two aspects of contig normalization: the "chr" prefix is now preserved, and we no longer convert "M" -> "MT" for the mitochondrial genome.
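
For illustration, here is a rough sketch of the difference in behavior. The function names `legacy_normalize_contig` and `preserved_contig` are hypothetical and are not PyEnsembl's actual functions; the sketch models only the two aspects mentioned above and ignores any other normalization the library may perform.

```python
def legacy_normalize_contig(name: str) -> str:
    """Hypothetical model of the old hack: drop a leading "chr", then rename "M" to "MT"."""
    if name.startswith("chr"):
        name = name[len("chr"):]
    return "MT" if name == "M" else name


def preserved_contig(name: str) -> str:
    """Hypothetical model of the new behavior for these two aspects: no rewriting."""
    return name


for contig in ["chr1", "chrM", "M", "MT"]:
    print(contig, "->", legacy_normalize_contig(contig), "|", preserved_contig(contig))
```

Under the old behavior, "chrM", "M", and "MT" all collapse to the same name, which is what makes it possible to silently match a variant against the wrong mitochondrial sequence.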

Fixes: https://github.com/openvax/pyensembl/issues/225

To restore use of Ensembl data for hg19 variants, we'll have to add a more local contig conversion option in Varcode.

coveralls commented 4 years ago

Coverage decreased (-0.08%) to 79.923% when pulling f28cf5e85ead1fcb8150a3935ee5c623aee28743 on fix-MT-normalization into b48d73633d610dde3c500b01adba8c1524fc78ef on master.