iobio / gene.iobio

Gene.iobio vue
MIT License

Transcript problems with GRCh38 #619

Closed AlistairNWard closed 1 year ago

AlistairNWard commented 4 years ago

Here is a link to GRCh38 file in gene COL9A3.

https://gene.iobio.io/?gene=COL9A3&genes=COL9A3&species=Human&build=GRCh38&affectedSibs=&unaffectedSibs=&rel0=proband&vcf0=https%3A%2F%2Fiobio.s3.amazonaws.com%2Fsamples%2Fvcf%2Fplatinum-exome.GRCh38.vcf.gz&tbi0=&bam0=https%3A%2F%2Fiobio.s3.amazonaws.com%2Fsamples%2Fbam%2FNA12878.GRCh38.bam&bai0=&sample0=NA12878&affectedStatus0=affected

  1. When you land, all the variants are gray. There is no impact assigned to any variant.
  2. Click on the transcript dropdown: there is lots of unnecessary whitespace, and if you scroll down through the transcripts there are gaps that I don't understand.

Screen Shot 2020-09-04 at 4 11 28 PM

  3. Click on the far left variant and you get this:

Screen Shot 2020-09-04 at 4 14 12 PM

  4. Click on a different transcript (you can just select one from the variant review card)
    • Focus on the variant is lost, so I have to go reselect it
    • Now you see there is an rsid

AlistairNWard commented 3 years ago

This still seems to be a problem, but maybe not something to worry about before paper release.

AlistairNWard commented 2 years ago

Some of the issues appear to be resolved. What remains is:

  1. Why are there no impacts on the canonical transcript?
  2. Why does the far left variant have no rsid on the canonical transcript?
  3. Changing transcript loses focus on the selected variant, but the variant info remains

tonydisera commented 2 years ago

The problem is that the canonical transcript for gene COL9A3 is incorrect. VEP is returning the annotations on variants for all transcripts of gene COL9A3, but the so-called canonical transcript 'ENST00000343916.7' is not included in VEP transcripts. This results in strange variant annotations in the Variant inspect card, like the ones mentioned above.

So first, there should be an error reported that the transcript provided is not recognized by VEP. And second, we should re-load the transcripts for this gene to correct the transcript problem. (See issue #164).

Here are the transcripts returned from the VEP annotation on a variant for this gene:

Gencode: ENST00000335351 ENST00000462700 ENST00000466192 ENST00000466532 ENST00000467819 ENST00000469802 ENST00000469852 ENST00000472880 ENST00000481800 ENST00000490398 ENST00000649368

RefSeq: NM_001853.4 NM_006602.4 XM_005260185.4 XM_011528545.1 XM_017027666.1 XM_024451812.1 XM_024451813.1 XR_002958453.1 XR_002958454.1

Here are the transcripts according to gene.iobio:

Screen Shot 2022-07-20 at 10 18 44 AM Screen Shot 2022-07-20 at 10 18 51 AM
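
For illustration, the "transcript not recognized by VEP" check proposed above could look roughly like the sketch below. It is written in Python for brevity and is not gene.iobio code: the function names `strip_version` and `find_unrecognized` are hypothetical, and comparing IDs with the version suffix removed is an assumption about how the match should be done. The Gencode IDs are the ones listed earlier in this comment.

```python
def strip_version(transcript_id: str) -> str:
    """Drop a trailing '.N' version suffix, e.g. 'ENST00000343916.7' -> 'ENST00000343916'."""
    base, sep, version = transcript_id.rpartition(".")
    return base if sep and version.isdigit() else transcript_id


def find_unrecognized(selected_id, vep_transcript_ids):
    """Return an error message if selected_id is absent from the VEP transcripts, else None.

    Hypothetical helper: IDs are compared without their version suffix (an assumption).
    """
    known = {strip_version(t) for t in vep_transcript_ids}
    if strip_version(selected_id) not in known:
        return f"Transcript {selected_id} is not among the transcripts annotated by VEP"
    return None


# Gencode transcript IDs that VEP returned for COL9A3, copied from this thread.
VEP_GENCODE = (
    "ENST00000335351 ENST00000462700 ENST00000466192 ENST00000466532 "
    "ENST00000467819 ENST00000469802 ENST00000469852 ENST00000472880 "
    "ENST00000481800 ENST00000490398 ENST00000649368"
).split()

# The canonical transcript gene.iobio currently uses, and the MANE transcript.
print(find_unrecognized("ENST00000343916.7", VEP_GENCODE))
print(find_unrecognized("ENST00000649368.1", VEP_GENCODE))
```

The first call returns an error message and the second returns `None`, which matches what is described above: the transcript gene.iobio treats as canonical is simply not in the set that VEP annotates.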
AlistairNWard commented 2 years ago

MANE would be a nice way to help resolve this. The MANE transcript for COL9A3 is ENST00000649368.1 (NM_001853.4), which is odd because the Ensembl ID doesn't match up.

tonydisera commented 2 years ago

This is strange. The so-called canonical transcript for COL9A3 in gene.iobio is ENST00000343916, which is confirmed in gnomAD.

Screen Shot 2022-07-20 at 10 26 28 AM

So was this transcript renamed at some point to ENST00000649368?

AlistairNWard commented 2 years ago

Which is roundly rebuffed by Ensembl!! ENST00000343916.3 isn't present at all.

Screen Shot 2022-07-20 at 12 30 23 PM

tonydisera commented 2 years ago

Interesting @AlistairNWard! Thanks for looking into this. I think I'll download the GENCODE GFF (build GRCh38) and look at the transcripts for COL9A3. Let's hope that this has been fixed.
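
A rough sketch of that GFF check, in Python, might look like the following. It assumes the GENCODE GFF3 layout as I understand it: nine tab-separated columns, `transcript` feature rows, and `gene_name`, `transcript_id` and a comma-separated `tag` in the attributes column. The helper names are hypothetical, and the two sample rows are placeholders (made-up coordinates and strand, plus a dummy `GENE_X` entry); only the COL9A3 transcript ID and its MANE status come from this thread.

```python
def parse_attributes(column9: str) -> dict:
    """Parse a GFF3 attributes column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def transcripts_for_gene(gff_lines, gene_name):
    """Return (transcript_id, tags) for every 'transcript' row of the given gene."""
    found = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_name") == gene_name:
            tags = attrs["tag"].split(",") if attrs.get("tag") else []
            found.append((attrs.get("transcript_id"), tags))
    return found


# Placeholder rows for illustration only: coordinates and strand are made up,
# and GENE_X / TX_PLACEHOLDER.1 are dummy values, not real annotations.
SAMPLE = [
    "##gff-version 3",
    "chr20\tHAVANA\ttranscript\t100\t200\t.\t+\t.\t"
    "ID=ENST00000649368.1;gene_name=COL9A3;transcript_id=ENST00000649368.1;tag=basic,MANE_Select",
    "chr20\tHAVANA\ttranscript\t300\t400\t.\t+\t.\t"
    "ID=TX_PLACEHOLDER.1;gene_name=GENE_X;transcript_id=TX_PLACEHOLDER.1;tag=basic",
]

for transcript_id, tags in transcripts_for_gene(SAMPLE, "COL9A3"):
    print(transcript_id, "MANE_Select" in tags)
```

For the real download, the same function can be fed an open file handle instead of `SAMPLE` (for a gzipped file, `gzip.open(path, "rt")`), since it only iterates over lines.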