varfish-org / mehari

VEP-like tool for sequence ontology and HGVS annotation of VCF files
MIT License

Missing mitochondrial transcripts #381

Closed holtgrewe closed 8 months ago

holtgrewe commented 8 months ago

Describe the bug: RefSeq does not have transcripts for mitochondrial genes. Consequently, they are missing from CDOT.

To Reproduce: Steps to reproduce the behavior:

  1. Go to https://reev.cubi.bihealth.org/internal/proxy/mehari/genes/txs?hgncId=HGNC:7461&genomeBuild=GENOME_BUILD_GRCH37&pageSize=1000

Expected behavior: We need chrMT transcripts in Mehari.


holtgrewe commented 8 months ago

Overall, we should either parse the rCRS entry from NucCore or use the Ensembl transcripts. The latter is probably better: with the rCRS entry we would not have transcript identifiers and would have to use the gene names in their place.

holtgrewe commented 8 months ago

RNA genes from chrMT are not properly represented in CDOT, see https://github.com/SACGF/cdot/issues/72

holtgrewe commented 8 months ago

The following chrMT transcripts have a CDS whose length is not a multiple of 3:

ENST00000361789.2
ENST00000361453.3
ENST00000361227.2
ENST00000361381.2
ENST00000361390.2
ENST00000362079.2

Poly-A is appended to these transcripts after transcription, which completes the truncated stop codon. We can emulate this by padding the transcript sequence with the missing A bases and adjusting the CDS end on the fly.

cf. https://pubmed.ncbi.nlm.nih.gov/10076021/
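
A minimal sketch of the padding idea, not Mehari's actual implementation. It assumes 0-based, half-open CDS coordinates on the transcript and that a truncated CDS runs to the very end of the transcript; the function name `pad_cds_with_poly_a` and the toy sequences are hypothetical.

```python
from typing import Tuple


def pad_cds_with_poly_a(tx_seq: str, cds_start: int, cds_end: int) -> Tuple[str, int]:
    """Pad a transcript with 'A' so that its CDS length becomes a multiple of 3.

    Assumption: coordinates are 0-based and half-open, i.e. the CDS is
    tx_seq[cds_start:cds_end]. Returns the (possibly padded) sequence and
    the adjusted CDS end.
    """
    cds_len = cds_end - cds_start
    # Number of bases missing to reach the next multiple of 3 (0, 1 or 2).
    missing = (-cds_len) % 3
    if missing == 0:
        # CDS is already complete; nothing to do.
        return tx_seq, cds_end
    if cds_end != len(tx_seq):
        # Padding only emulates poly-A if the CDS reaches the transcript end.
        raise ValueError("CDS is truncated but does not end at the transcript end")
    # Emulate the poly-A tail that completes the stop codon.
    return tx_seq + "A" * missing, cds_end + missing


# Toy example (not a real transcript): the CDS ends in a lone "T".
seq, end = pad_cds_with_poly_a("ATGAAAT", 0, 7)
print(seq, end)
```

In the toy example the trailing `T` is completed to `TAA`, so the padded CDS can be translated codon by codon without special handling for the last, partial codon.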