opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

trinucleotide expansion up to 19.02 --> inframe deletion in 19.04 #583

deniseOme closed this 4 years ago

deniseOme commented 5 years ago

The genetic evidence linking HTT to Huntington disease has changed its functional consequence from trinucleotide expansion to inframe deletion.

From Entrez Gene: "Huntington's disease, a neurodegenerative disorder characterized by loss of striatal neurons. This is thought to be caused by an expanded, unstable trinucleotide repeat in the huntingtin gene, which translates as a polyglutamine repeat in the protein product. A fairly broad range of trinucleotide repeats (9-35) has been identified in normal controls, and repeat numbers in excess of 40 have been described as pathological"

This data comes from ClinVar via EVA. Has anything changed in the JSON from EVA between 19.02 and 19.04 for HTT and Huntington disease?

Some notes:

https://www.targetvalidation.org/evidence/ENSG00000124788/Orphanet_98755?view=sec:genetic_association

(trinucleotide repeat expansion in the Platform, coding sequence variant in Ensembl)

afaulconbridge commented 5 years ago

Is this relevant: https://github.com/opentargets/data_release/wiki/OT007-SNP2GENE-Pipeline#adding-trinucleotide-disease-annotation-eva-only ?

AsierGonzalez commented 5 years ago

This does not look like an "issue" to me, just a data update. EVA evidence data changed substantially for the 19.04 release because ClinVar added many new rsIDs in January. In this particular case, although RCV000032098 is still associated with HTT in Huntington's disease, priority now seems to be given to the new rsID:

19.02_cttv012-29-01-2019.json

{
  "alleleOrigin": "germline",
  "phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
  "gene": "ENSG00000197386",
  "clinvarAccession": "RCV000032098",
  "variant_id": "RCV000032098"
}

19.04_cttv012-10-04-2019.json

{
  "alleleOrigin": "germline",  
  "phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
  "gene": "ENSG00000197386",
  "clinvarAccession": "RCV000032098",
  "variant_id": "rs71180116"
}
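To check whether anything besides the identifier changed between releases, the two records can be diffed field by field. A minimal Python sketch, using only the fields quoted above (real evidence strings carry more, possibly nested, fields that this flat top-level comparison does not descend into):

```python
import json

# The two records quoted above, trimmed to the fields shown in this comment.
RECORD_19_02 = json.loads("""
{
  "alleleOrigin": "germline",
  "phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
  "gene": "ENSG00000197386",
  "clinvarAccession": "RCV000032098",
  "variant_id": "RCV000032098"
}
""")
RECORD_19_04 = json.loads("""
{
  "alleleOrigin": "germline",
  "phenotype": "http://www.orpha.net/ORDO/Orphanet_399",
  "gene": "ENSG00000197386",
  "clinvarAccession": "RCV000032098",
  "variant_id": "rs71180116"
}
""")


def diff_records(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for top-level fields whose values differ."""
    fields = sorted(set(old) | set(new))
    return {f: (old.get(f), new.get(f)) for f in fields if old.get(f) != new.get(f)}


for field, (before, after) in diff_records(RECORD_19_02, RECORD_19_04).items():
    print(f"{field}: {before} -> {after}")
# variant_id: RCV000032098 -> rs71180116
```

Only `variant_id` differs, which is consistent with the rsID now taking priority over the RCV accession.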

This is probably something we should discuss with EVA.

iandunham commented 5 years ago

I think it is an issue, in that VEP does not accurately call the consequence of rs71180116 because the polarity of the alleles is wrong. The disease-associated allele is the trinucleotide expansion allele (erroneously represented as an insertion) rather than the smaller "deletion" allele; the 'normal' allele is the deleted/non-expanded one. I think this is why we have the override file that Adam points to above, which corrects these consequences for certain mutations. As a temporary fix we should include rs71180116 in that file, and also talk to EVA about how mutations are represented in ClinVar, although we may not be able to do anything about that.
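To see why allele polarity matters, consider a deliberately simplified, length-only classifier. It is hypothetical and is not VEP's algorithm (VEP also uses transcript context and can report other terms, such as splice_region_variant); the allele strings are illustrative, not the real rs71180116 alleles. If the expanded allele is recorded as the reference and the shorter allele as the alternate, an in-frame expansion reads as an in-frame deletion:

```python
from typing import Optional


def indel_consequence(ref: str, alt: str) -> Optional[str]:
    """Hypothetical length-only call for a coding indel (REF -> ALT).

    Returns None when the alleles have the same length (not an indel).
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        return None
    if delta % 3 != 0:
        return "frameshift_variant"
    return "inframe_insertion" if delta > 0 else "inframe_deletion"


short_allele = "CAG" * 2  # illustrative non-expanded allele
long_allele = "CAG" * 5   # illustrative expanded allele

# Expanded allele as REF, shorter allele as ALT: looks like a deletion.
print(indel_consequence(long_allele, short_allele))  # inframe_deletion
# Normal allele as REF, expanded allele as ALT: an in-frame gain of repeats.
print(indel_consequence(short_allele, long_allele))  # inframe_insertion
```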

AsierGonzalez commented 5 years ago

Out of the 23 trinucleotide expansions in the ot_trinucleotide.txt file, only 12 appear in the 19.04 EVA evidence file, and three of them (RCV000483474, RCV000032098 and RCV000032099) have the wrong functional consequence annotated, inframe_deletion:

| ClinVar accession | Variant ID | Gene ID | Phenotype | Functional consequence |
| --- | --- | --- | --- | --- |
| RCV000011272 | rs193922937 | ENSG00000155966 | FRAXE intellectual disability | trinucleotide_repeat_expansion |
| RCV000003729 | rs193922928 | ENSG00000066427 | Spinocerebellar ataxia type 3 | trinucleotide_repeat_expansion |
| RCV000004184 | RCV000004184 | ENSG00000165060 | Friedreich ataxia | trinucleotide_repeat_expansion |
| RCV000009015 | RCV000009015 | ENSG00000141837 | Spinocerebellar ataxia type 6 | trinucleotide_repeat_expansion |
| RCV000019124 | RCV000019124 | ENSG00000169714 | Proximal myotonic myopathy | trinucleotide_repeat_expansion |
| RCV000191939 | rs1555397179 | ENSG00000066427 | Spinocerebellar ataxia type 3 | trinucleotide_repeat_expansion |
| RCV000483474 | rs3032358 | ENSG00000169083 | Kennedy disease | **inframe_deletion** |
| RCV000005352 | RCV000005352 | ENSG00000104936 | Steinert myotonic dystrophy | trinucleotide_repeat_expansion |
| RCV000008537 | rs193922926 | ENSG00000124788 | Spinocerebellar ataxia type 1 | trinucleotide_repeat_expansion |
| RCV000032098 | rs71180116 | ENSG00000197386 | Huntington disease | **inframe_deletion** |
| RCV000032099 | rs71180116 | ENSG00000197386 | Huntington disease | **inframe_deletion** |
| RCV000162201 | rs193922936 | ENSG00000102081 | Fragile X syndrome | trinucleotide_repeat_expansion |
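A check like the one above can be automated for each release by flagging any listed trinucleotide record whose consequence is not trinucleotide_repeat_expansion. A minimal Python sketch over the rows in the table (how the rows are loaded from the real evidence file is left out and would be an assumption):

```python
# (ClinVar accession, variant id, functional consequence) from the 19.04 table.
ROWS = [
    ("RCV000011272", "rs193922937", "trinucleotide_repeat_expansion"),
    ("RCV000003729", "rs193922928", "trinucleotide_repeat_expansion"),
    ("RCV000004184", "RCV000004184", "trinucleotide_repeat_expansion"),
    ("RCV000009015", "RCV000009015", "trinucleotide_repeat_expansion"),
    ("RCV000019124", "RCV000019124", "trinucleotide_repeat_expansion"),
    ("RCV000191939", "rs1555397179", "trinucleotide_repeat_expansion"),
    ("RCV000483474", "rs3032358", "inframe_deletion"),
    ("RCV000005352", "RCV000005352", "trinucleotide_repeat_expansion"),
    ("RCV000008537", "rs193922926", "trinucleotide_repeat_expansion"),
    ("RCV000032098", "rs71180116", "inframe_deletion"),
    ("RCV000032099", "rs71180116", "inframe_deletion"),
    ("RCV000162201", "rs193922936", "trinucleotide_repeat_expansion"),
]

EXPECTED = "trinucleotide_repeat_expansion"


def wrong_consequence(rows, expected=EXPECTED):
    """Return the accessions whose annotated consequence differs from `expected`."""
    return [acc for acc, _variant_id, consequence in rows if consequence != expected]


print(wrong_consequence(ROWS))
# ['RCV000483474', 'RCV000032098', 'RCV000032099']
```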

I'm in touch with Kirill in EVA team to get this sorted for the next release.

AsierGonzalez commented 5 years ago

After talking to EVA, we found that the issue is caused by the approach used to pair ClinVar records with the mapping we provide, which tries these keys in order:

  1. Try pairing using rsID
  2. Or, if not present in the mappings file, use nsvID
  3. Or, if not present, use the full variant description (e.g. 10:100749815-100749815:1/A)
  4. Finally, if none of the above worked, use ClinVar RCV ID
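The pairing order above amounts to a first-match lookup. A minimal Python sketch, assuming the mappings are loaded into a flat dict keyed by identifier; the record field names are hypothetical placeholders, not EVA's schema, and the RCV-level entry in the usage example is illustrative:

```python
from typing import Optional

# Key fields tried in the order listed above (hypothetical field names).
PAIRING_ORDER = ("rs_id", "nsv_id", "variant_description", "rcv_id")


def pair_record(record: dict, mappings: dict) -> Optional[str]:
    """Return the consequence for the first key that has a mapping entry, else None."""
    for field in PAIRING_ORDER:
        key = record.get(field)
        if key is not None and key in mappings:
            return mappings[key]
    return None


# Usage: an rsID entry, when present, wins over an RCV-level entry.
mappings = {
    "rs71180116": "inframe_deletion",
    "RCV000032098": "trinucleotide_repeat_expansion",
}
record = {"rs_id": "rs71180116", "rcv_id": "RCV000032098"}
print(pair_record(record, mappings))  # inframe_deletion
```

Because the lookup stops at the first hit, an RCV-level correction is only reached when none of the earlier keys has an entry.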

RCV000032098 and RCV000032099 are masked by rs71180116 through the following entry in the mapping file:

rs71180116  1   ENSG00000197386 HTT inframe_deletion    0

Similarly, the entry masking RCV000483474 is this:

X:67545318-67545320:1/-    1    ENSG00000169083    AR    inframe_deletion    0

Those two rows have been manually removed from the mapping file returned to EVA, and the functional consequence of those variants is now trinucleotide_repeat_expansion as expected. However, the variants still have an associated rsID, so the link may still point to an Ensembl page with the wrong functional annotation. This will be further discussed with EVA.
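The manual correction can be sketched as parsing the mapping rows, dropping the two masking keys, and letting the lookup fall through to the RCV ID. A minimal Python sketch: it reads only column 1 (variant key) and column 5 (consequence), since the numeric columns are not explained in this thread, and the RCV-keyed override dict is a hypothetical stand-in for the trinucleotide entries:

```python
# The two mapping-file rows quoted above (whitespace-separated).
MAPPING_LINES = [
    "rs71180116  1   ENSG00000197386 HTT inframe_deletion    0",
    "X:67545318-67545320:1/-    1    ENSG00000169083    AR    inframe_deletion    0",
]
MASKING_KEYS = {"rs71180116", "X:67545318-67545320:1/-"}

# Hypothetical RCV-level trinucleotide overrides (assumed to be keyed by RCV ID).
RCV_OVERRIDES = {
    "RCV000032098": "trinucleotide_repeat_expansion",
    "RCV000483474": "trinucleotide_repeat_expansion",
}


def parse_mapping(lines):
    """Build {variant_key: consequence} from columns 1 and 5 of each row."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 5:
            mapping[fields[0]] = fields[4]
    return mapping


def lookup(keys, mapping):
    """Return the consequence for the first key present in the mapping."""
    for key in keys:
        if key in mapping:
            return mapping[key]
    return None


mapping = {**parse_mapping(MAPPING_LINES), **RCV_OVERRIDES}
htt_keys = ["rs71180116", "RCV000032098"]  # rsID is tried before the RCV ID
before = lookup(htt_keys, mapping)

# Manual correction: remove the masking rows so the RCV fallback is reached.
for key in MASKING_KEYS:
    mapping.pop(key, None)
after = lookup(htt_keys, mapping)

print(before, "->", after)  # inframe_deletion -> trinucleotide_repeat_expansion
```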

AsierGonzalez commented 4 years ago

Manual correction has been repeated for the 19.09 release.

AsierGonzalez commented 4 years ago

Manual correction has been repeated for the 19.11 release.

In addition, two of the trinucleotide expansions (RCV000005966, RCV000006518) have been missing from the EVA evidence files since at least 19.11; I'll ask EVA about it.

deniseOme commented 4 years ago

Fixed: back to trinucleotide expansion.

AsierGonzalez commented 4 years ago

I'm afraid this isn't fixed, @deniseOme. The functional consequence in the data and UI is correct only because I manually fix the output of the SNP-to-gene mapping. EVA are working on a permanent solution.

AsierGonzalez commented 4 years ago

Manual correction has been repeated for the 20.02 release.

Kirill has been working on a new way to process trinucleotide variants and on re-implementing the variant-to-gene mapping, but I need to ask EVA about the status of this work.

AsierGonzalez commented 4 years ago

This is now handled by EVA

AsierGonzalez commented 4 years ago

In the 20.04 release, two trinucleotide repeat expansions (RCV000005352 and RCV000162201) were annotated with functional consequences other than trinucleotide_repeat_expansion. The following was Kirill's (EVA) explanation:

The change you are observing is caused by migrating to the new VEP pipeline. Under the new approach, the pairing of variant records is first attempted using full variant description (CHROM:POS:REF:ALT), and in case this fails, using an RCV ID (intended for manual trinucleotide repeat mappings only). While for almost all trinucleotide repeats ClinVar does not provide an explicit variant description, for those two variants which you mentioned this is not the case, and there are full descriptions. For example, for RCV000005352, it is provided as 19:45770204:C:CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG. When you run this through VEP, it returns a consequence of splice_region_variant. This is why it is in the evidence strings.
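The new pairing order can be sketched as: try the full CHROM:POS:REF:ALT description against the VEP results, and fall back to the RCV ID only if that fails. A minimal Python sketch under stated assumptions: the record field names and both lookup dicts are hypothetical, and the ALT allele is shortened to an illustrative three CAG units rather than the full allele quoted above:

```python
from typing import Optional


def description_key(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Format a variant as CHROM:POS:REF:ALT, the key tried first."""
    return f"{chrom}:{pos}:{ref}:{alt}"


def pair_record(record: dict, vep_results: dict, manual_rcv: dict) -> Optional[str]:
    """Full description first; otherwise the RCV ID (manual trinucleotide mappings)."""
    if all(record.get(f) is not None for f in ("chrom", "pos", "ref", "alt")):
        key = description_key(record["chrom"], record["pos"], record["ref"], record["alt"])
        if key in vep_results:
            return vep_results[key]
    return manual_rcv.get(record.get("rcv_id"))


ALT = "C" + "CAG" * 3  # illustrative, shortened expansion allele
vep_results = {description_key("19", 45770204, "C", ALT): "splice_region_variant"}
manual_rcv = {"RCV000005352": "trinucleotide_repeat_expansion"}

with_description = {"chrom": "19", "pos": 45770204, "ref": "C", "alt": ALT,
                    "rcv_id": "RCV000005352"}
without_description = {"rcv_id": "RCV000005352"}
print(pair_record(with_description, vep_results, manual_rcv))     # splice_region_variant
print(pair_record(without_description, vep_results, manual_rcv))  # trinucleotide_repeat_expansion
```

Because the description is tried first, a trinucleotide record that ClinVar describes explicitly never reaches the manual RCV mapping, which matches what was observed for RCV000005352 and RCV000162201.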

These two have been left as they are for 20.04. They will be checked again for 20.06 and they can be corrected manually if needed.

AsierGonzalez commented 4 years ago

Fixed in 20.06