Closed deniseOme closed 4 years ago
This does not look like an "issue" to me, just a data update. EVA evidence data has changed substantially for the 19.04 release because ClinVar added many new rsIDs in January. In this particular case, although RCV000032098 is still associated with HTT in Huntington's disease, it seems that priority is now given to the rsID, which is new:
19.02_cttv012-29-01-2019.json
{
"alleleOrigin": "germline",
"phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
"gene": "ENSG00000197386",
"clinvarAccession": "RCV000032098",
"variant_id": "RCV000032098"
}
19.04_cttv012-10-04-2019.json
{
"alleleOrigin": "germline",
"phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
"gene": "ENSG00000197386",
"clinvarAccession": "RCV000032098",
"variant_id": "rs71180116"
}
This is probably something we should discuss with EVA.
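For reference, a minimal sketch of comparing the two records field by field. The record contents are copied from the snippets above; the `diff_records` helper is hypothetical and not part of any pipeline.

```python
def diff_records(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Record from 19.02_cttv012-29-01-2019.json
rec_19_02 = {
    "alleleOrigin": "germline",
    "phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
    "gene": "ENSG00000197386",
    "clinvarAccession": "RCV000032098",
    "variant_id": "RCV000032098",
}

# Record from 19.04_cttv012-10-04-2019.json
rec_19_04 = dict(rec_19_02, variant_id="rs71180116")

changes = diff_records(rec_19_02, rec_19_04)
print(changes)  # {'variant_id': ('RCV000032098', 'rs71180116')}
```

Only `variant_id` differs, which is consistent with the rsID taking priority over the RCV accession.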
I think it is an issue, in that VEP does not accurately call the consequence of rs71180116 because the polarity of the alleles is wrong. The disease-associated allele is the trinucleotide expansion allele (erroneously represented as an insertion) rather than the smaller "deletion" allele. The 'normal' allele is the deleted/non-expanded one. I think this is why we have the override file that Adam points to above, which corrects these consequences for certain mutations. As a temporary fix we should include rs71180116 in that file, and also talk to EVA about how mutations are represented in ClinVar, although we may not be able to do anything about that.
Out of the 23 trinucleotide expansions in the ot_trinucleotide.txt file, only 12 appear in the 19.04 EVA evidence file and three of them have the wrong functional consequence annotated (see rows in bold):
ClinVar Accession | Variant id | Gene id | Phenotype | Functional Consequence |
---|---|---|---|---|
RCV000011272 | rs193922937 | ENSG00000155966 | FRAXE intellectual disability | trinucleotide_repeat_expansion |
RCV000003729 | rs193922928 | ENSG00000066427 | Spinocerebellar ataxia type 3 | trinucleotide_repeat_expansion |
RCV000004184 | RCV000004184 | ENSG00000165060 | Friedreich ataxia | trinucleotide_repeat_expansion |
RCV000009015 | RCV000009015 | ENSG00000141837 | Spinocerebellar ataxia type 6 | trinucleotide_repeat_expansion |
RCV000019124 | RCV000019124 | ENSG00000169714 | Proximal myotonic myopathy | trinucleotide_repeat_expansion |
RCV000191939 | rs1555397179 | ENSG00000066427 | Spinocerebellar ataxia type 3 | trinucleotide_repeat_expansion |
**RCV000483474** | **rs3032358** | **ENSG00000169083** | **Kennedy disease** | **inframe_deletion** |
RCV000005352 | RCV000005352 | ENSG00000104936 | Steinert myotonic dystrophy | trinucleotide_repeat_expansion |
RCV000008537 | rs193922926 | ENSG00000124788 | Spinocerebellar ataxia type 1 | trinucleotide_repeat_expansion |
**RCV000032098** | **rs71180116** | **ENSG00000197386** | **Huntington disease** | **inframe_deletion** |
**RCV000032099** | **rs71180116** | **ENSG00000197386** | **Huntington disease** | **inframe_deletion** |
RCV000162201 | rs193922936 | ENSG00000102081 | Fragile X syndrome | trinucleotide_repeat_expansion |
I'm in touch with Kirill in EVA team to get this sorted for the next release.
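The check behind the table can be sketched roughly as follows. It assumes the evidence records have been reduced to (ClinVar accession, variant id, functional consequence) tuples and that the accessions listed in ot_trinucleotide.txt are available as a set. The tuple layout and the `find_mismatches` name are assumptions for illustration, and only a handful of rows from the table are included.

```python
EXPECTED = "trinucleotide_repeat_expansion"


def find_mismatches(records, repeat_accessions):
    """Return records for known repeat expansions whose consequence is not the expected one."""
    return [
        (rcv, variant_id, consequence)
        for rcv, variant_id, consequence in records
        if rcv in repeat_accessions and consequence != EXPECTED
    ]


# A few rows copied from the table above.
records = [
    ("RCV000011272", "rs193922937", "trinucleotide_repeat_expansion"),
    ("RCV000483474", "rs3032358", "inframe_deletion"),
    ("RCV000032098", "rs71180116", "inframe_deletion"),
    ("RCV000032099", "rs71180116", "inframe_deletion"),
    ("RCV000162201", "rs193922936", "trinucleotide_repeat_expansion"),
]

# Stand-in for the accessions read from ot_trinucleotide.txt.
repeat_accessions = {rcv for rcv, _, _ in records}

mismatches = find_mismatches(records, repeat_accessions)
for row in mismatches:
    print("\t".join(row))
```

With these rows, the three records annotated as inframe_deletion are reported.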
After talking to EVA, it has been found that the issue is caused by the approach used to pair ClinVar records with the mapping provided:
RCV000032098 and RCV000032099 are masked by rs71180116 and the following entry in the mapping file:
rs71180116 1 ENSG00000197386 HTT inframe_deletion 0
Similarly, the entry masking RCV000483474 is this:
X:67545318-67545320:1/- 1 ENSG00000169083 AR inframe_deletion 0
Those two rows have been manually removed from the mapping file returned to EVA, and the functional consequence of those variants is now trinucleotide_repeat_expansion, as expected. However, they still have an associated rsID, so the link may still point to an Ensembl page with the wrong functional annotation. This will be further discussed with EVA.
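The manual correction amounts to something like the sketch below, which drops the masking rows by their first column. It assumes the mapping file is whitespace-separated with the variant identifier first, as in the two entries quoted above; the meaning of the other columns is not asserted here, and `drop_masking_rows` is a hypothetical helper, not the script actually used.

```python
# Identifiers of the two rows that mask the manual trinucleotide mappings.
MASKING_IDS = {"rs71180116", "X:67545318-67545320:1/-"}


def drop_masking_rows(lines, masking_ids=MASKING_IDS):
    """Keep every mapping line whose first column is not a masking identifier."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in masking_ids:
            continue  # skip the row that would override the manual mapping
        kept.append(line)
    return kept


mapping = [
    "rs71180116 1 ENSG00000197386 HTT inframe_deletion 0",
    "X:67545318-67545320:1/- 1 ENSG00000169083 AR inframe_deletion 0",
    # Made-up placeholder row, only here to show that other rows are kept.
    "rs0000000 1 ENSG00000000000 GENE missense_variant 0",
]

cleaned = drop_masking_rows(mapping)
print(cleaned)
```

Only the placeholder row is left in `cleaned`.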
Manual correction has been repeated for the 19.09 release.
Manual correction has been repeated for the 19.11 release.
In addition, two of the trinucleotide expansions (RCV000005966, RCV000006518) have been missing from the EVA evidence files since at least 19.11. I'll ask them about it.
Fixed: back to trinucleotide expansion.
I'm afraid this isn't fixed @deniseOme. The functional consequence in the data and UI is correct because I manually fix the output of the snp-to-gene mapping. EVA are working on a permanent solution for it.
Manual correction has been repeated for the 20.02 release.
Kirill has been working on a new way to process trinucleotide variants and on re-implementing the variant-to-gene mapping, but I need to ask them about the status of this work.
This is now handled by EVA
In the 20.04 release two trinucleotide repeat expansions (RCV000005352 and RCV000162201) were annotated with different functional consequences. The following was Kirill's (EVA) explanation:
The change you are observing is caused by migrating to the new VEP pipeline. Under the new approach, the pairing of variant records is first attempted using full variant description (CHROM:POS:REF:ALT), and in case this fails, using an RCV ID (intended for manual trinucleotide repeat mappings only). While for almost all trinucleotide repeats ClinVar does not provide an explicit variant description, for those two variants which you mentioned this is not the case, and there are full descriptions. For example, for RCV000005352, it is provided as 19:45770204:C:CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG. When you run this through VEP, it returns a consequence of splice_region_variant. This is why it is in the evidence strings.
These two have been left as they are for 20.04. They will be checked again for 20.06 and they can be corrected manually if needed.
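The lookup order Kirill describes can be sketched as below: try the full variant description first, and fall back to the RCV ID only when that does not produce a match. This is an illustration, not the EVA pipeline code; the function name, the record layout and both lookup tables are assumptions, and the ALT allele is shortened for readability.

```python
def pair_consequence(record, vep_by_description, manual_by_rcv):
    """Pair a ClinVar record with a consequence: full description first, RCV ID second."""
    description = record.get("description")
    if description is not None and description in vep_by_description:
        return vep_by_description[description]
    # Fallback intended for manual trinucleotide repeat mappings only.
    return manual_by_rcv.get(record["rcv"])


# Shortened stand-in for the CHROM:POS:REF:ALT string quoted above (the real ALT is much longer).
DESC = "19:45770204:C:C" + "CAG" * 3

vep_by_description = {DESC: "splice_region_variant"}
manual_by_rcv = {
    "RCV000005352": "trinucleotide_repeat_expansion",
    "RCV000004184": "trinucleotide_repeat_expansion",
}

with_description = {"rcv": "RCV000005352", "description": DESC}
without_description = {"rcv": "RCV000004184", "description": None}

print(pair_consequence(with_description, vep_by_description, manual_by_rcv))
print(pair_consequence(without_description, vep_by_description, manual_by_rcv))
```

Under these assumptions the first record picks up the VEP result and never reaches the manual mapping, which matches the behaviour seen for RCV000005352 in 20.04.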
Fixed in 20.06
Genetic evidence linking HTT to Huntington disease has changed from trinucleotide expansion to inframe deletion.
From Entrez Gene: "Huntington's disease, a neurodegenerative disorder characterized by loss of striatal neurons. This is thought to be caused by an expanded, unstable trinucleotide repeat in the huntingtin gene, which translates as a polyglutamine repeat in the protein product. A fairly broad range of trinucleotide repeats (9-35) has been identified in normal controls, and repeat numbers in excess of 40 have been described as pathological"
This data comes from ClinVar via EVA. Has anything changed in the JSON from EVA between 19.02 and 19.04 for HTT and Huntington?
Some notes:
some past conversations: https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/399#issuecomment-302463868
I'd have thought that inframe deletion is actually wrong; it could be an inframe insertion, but not a deletion
we list the variant as rs71180116 (rather than RCV000032098) in the second (mutation) column of the genetic associations page. Using the rs ID instead of the RCV ID, we now point to Ensembl (instead of ClinVar), which calls this "inframe deletion". Is that why we've changed from trinucleotide expansion to inframe deletion? Can we override the VEP results? The VEP consequence term for rs193922926 is overridden in this example:
https://www.targetvalidation.org/evidence/ENSG00000124788/Orphanet_98755?view=sec:genetic_association
(trinucleotide repeat expansion in the Platform, coding sequence variant in Ensembl)