opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Create genetic associations detail view #698

Closed d0choa closed 4 years ago

d0choa commented 5 years ago

It will contain:

d0choa commented 5 years ago

Waiting for confirmation from @andrewhercules and @MichaelaEBI on what the table should look like

andrewhercules commented 5 years ago

There was a preliminary discussion around combining the Common diseases and Rare diseases data tables. However, I think that given a recent Slack conversation, the two tables should remain distinct - see below:

Screenshot 2019-07-26 at 10 56 35

I would also recommend that we leave the Common diseases data table as it currently is because there are few cases of n/a or unknown.

However, for the Rare diseases data table, I propose that we create two different tables depending on the source of the data:

Table 1: EVA, UniProt
Table 2: Gene2Phenotype, UniProt literature, Genomics England

Based on a cursory glance at the evidence, we do not get the mutation, mutation consequence, or clinical significance values for an evidence string from Gene2Phenotype, UniProt literature, or Genomics England. This leads to tables full of N/A and Curated evidence values.

For example, https://www.targetvalidation.org/evidence/ENSG00000213614/Orphanet_845?view=sec:genetic_association and https://www.targetvalidation.org/evidence/ENSG00000134460/EFO_0000540?view=sec:genetic_association.

@iandunham and @d0choa, would this make sense from a scientific perspective? Or would it be preferable that the data remain in a single table?

Ian, in a recent FE meeting, you had mentioned that part of the reason behind N/A is that we may get that information from a data provider. Would this be the case for the three data providers that I have identified as suitable for a second, curated evidence Rare diseases data table (Gene2Phenotype, UniProt literature, Genomics England)?

deniseOme commented 5 years ago

@andrewhercules @d0choa @iandunham, I'd vote for the data to remain in a single table, for several reasons (in no particular order):

These two data sources are curated by clinicians who don't tend to work with rsIDs. But we could map the HGVS notation to rsID and display the rsID in the table and the functional consequence, dropping the "curated evidence" placeholder.

From one of the links included by @andrewhercules, we can see that G2P provides one piece of evidence for HEXA in Tay-Sachs disease.

From the G2P link, we can go to Decipher and get more info on the G2P variants:

Screen Shot 2019-07-26 at 17 31 37

The two variants above do not have a dbSNP notation (e.g. rs123) but rather an HGVS notation, which is popular among clinicians and diagnostic labs. So we have p.Arg137Ter, which translates into: stop codon (Ter) at amino acid position 137, where the reference amino acid is Arg. The variant (or mutation) truncates the protein, and this is likely pathogenic.
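To make the reading of p.Arg137Ter concrete, here is a minimal sketch of splitting a simple HGVS protein substitution into its parts. The function name and the returned keys are made up for illustration, and the pattern only covers the plain three-letter substitution form (no `*` for stop, no frameshifts, no parentheses), so it is not a full HGVS parser.

```python
import re

# Simple protein substitution: "p." + three-letter reference residue
# + position + three-letter alternate residue (e.g. p.Arg137Ter).
PROTEIN_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_protein_substitution(hgvs_p):
    """Split a simple HGVS protein substitution into its components.

    Hypothetical helper for illustration only; raises ValueError for
    anything that is not a plain three-letter substitution.
    """
    match = PROTEIN_SUBSTITUTION.match(hgvs_p)
    if match is None:
        raise ValueError(f"unsupported HGVS protein notation: {hgvs_p!r}")
    reference, position, alternate = match.groups()
    return {
        "reference": reference,
        "position": int(position),
        "alternate": alternate,
        # "Ter" is the HGVS three-letter code for a stop codon.
        "introduces_stop": alternate == "Ter" and reference != "Ter",
    }


parsed = parse_protein_substitution("p.Arg137Ter")
print(parsed)
```

For p.Arg137Ter this gives reference Arg, position 137, alternate Ter, with `introduces_stop` set to `True`.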

Is there an rsID for p.Arg137Ter? Yes, there is:

http://grch37.ensembl.org/Homo_sapiens/Variation/Explore?db=core;g=ENSG00000213614;r=15:72635775-72668817;t=ENST00000268097;v=rs121907962;vdb=variation;vf=445749705

This means this variant (a mutation in this case) should have an rsID (rs121907962) and a functional consequence (stop gain). Decipher gives a clinical significance as well, which could be pulled in by the Platform (clinical significance is available not only from ClinVar but also from G2P and Decipher, and probably GEL too).

If the HGVS notation is provided in the G2P JSON file, we can use the Ensembl VEP REST API endpoint to find the rsID and the functional consequence. Note that G2P coordinates are for GRCh37. This would mean that we would no longer have N/A and "curated evidence" in the table, but rs121907962 and stop gain instead.
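A rough sketch of what that lookup could look like, assuming the GRCh37 Ensembl REST server and its `vep/:species/hgvs/:hgvs_notation` endpoint. The helper names are made up, the HGVS string has to come from the G2P file (none is hard-coded here), and the `sample` response is hand-written to mirror the shape of the VEP JSON (`most_severe_consequence`, `colocated_variants`) rather than fetched from the server.

```python
import json
import urllib.parse
import urllib.request

# Assumption: GRCh37 server, because G2P coordinates are GRCh37.
GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def vep_hgvs_url(hgvs, server=GRCH37_SERVER, species="human"):
    """Build the VEP HGVS endpoint URL for one HGVS string."""
    encoded = urllib.parse.quote(hgvs, safe="")
    return f"{server}/vep/{species}/hgvs/{encoded}?content-type=application/json"


def fetch_vep(hgvs, server=GRCH37_SERVER):
    """Query VEP over HTTP (needs network access; not called in this sketch)."""
    with urllib.request.urlopen(vep_hgvs_url(hgvs, server), timeout=30) as resp:
        return json.load(resp)


def summarise_vep(records):
    """Extract rsIDs and the most severe consequence from a VEP response."""
    rows = []
    for record in records:
        rsids = [
            variant["id"]
            for variant in record.get("colocated_variants", [])
            if variant.get("id", "").startswith("rs")
        ]
        rows.append({
            "rsids": rsids,
            "consequence": record.get("most_severe_consequence"),
        })
    return rows


# Hand-written stand-in for a VEP response; the second ID is a made-up
# non-dbSNP identifier to show that only rsIDs are kept.
sample = [{
    "most_severe_consequence": "stop_gained",
    "colocated_variants": [{"id": "rs121907962"}, {"id": "EXAMPLE_NON_RS_ID"}],
}]
print(summarise_vep(sample))
```

In a pipeline, `summarise_vep(fetch_vep(hgvs))` would be called once per HGVS string found in the G2P file, and the resulting rsID and consequence would replace N/A and "curated evidence" in the table.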

I see the string 'Curated evidence' in our tables as a placeholder for when we don't have information on the functional consequence of the variant/mutation.

https://www.targetvalidation.org/evidence/ENSG00000139618/EFO_0000305?view=sec:somatic_mutation

Screen Shot 2019-07-26 at 16 07 25

iandunham commented 5 years ago

I think that the reason that gene2phenotype exists is that the variant information cannot be viewed or downloaded outside of the terms of access of Decipher. We can't display the variants in the platform, and that includes reverse engineering them by scraping data from Decipher. What we could do, if we don't already, is to have a link back to the page.

iandunham commented 5 years ago

'Curated evidence' means that the evidence we are displaying is summary evidence generated by a curator, which may have background supporting variants but is aggregated over several variants. In the context of the functional consequence column of the table, it means that the result is an aggregate over possibly multiple variants, so there isn't a functional consequence at the variant level. We could try to get a summary consequence like we do for Cancer Gene Census, but the data access is complex here.

So overall curated evidence means that there is either a person looking at the data or there is a pipeline aggregating the data to give a summary. In a sense this is higher quality than just observing a variant.

For Genomics England panels, they have a crowdsourced summary of which genes should be looked at for a particular disease, without itemising variants, so again it's a curated summary of many views across clinicians.