Create genetic associations detail view

d0choa commented 5 years ago

It will contain:

Table
Browser

d0choa commented 5 years ago

Waiting for confirmation from @andrewhercules and @MichaelaEBI on what the table should look like

andrewhercules commented 5 years ago

There was a preliminary discussion around combining the Common diseases and Rare diseases data tables. However, I think that given a recent Slack conversation, the two tables should remain distinct - see below:

[Screenshot 2019-07-26 at 10 56 35]

I would also recommend that we leave the Common diseases data table as it currently is because there are few cases of n/a or unknown.

However, for the Rare diseases data table, I propose that we create two different tables depending on the source of the data:

Table 1: EVA, UniProt
Table 2: Gene2Phenotype, UniProt literature, Genomics England

Based on a cursory glance of the evidence, we do not get the mutation, mutation consequence, or clinical significance values for an evidence string from Gene2Phenotype, UniProt literature, or Genomics England. This leads to tables full of N/A and Curated evidence values.

For example, https://www.targetvalidation.org/evidence/ENSG00000213614/Orphanet_845?view=sec:genetic_association and https://www.targetvalidation.org/evidence/ENSG00000134460/EFO_0000540?view=sec:genetic_association.

@iandunham and @d0choa, would this make sense from a scientific perspective? Or would it be preferable that the data remain in a single table?

Ian, in a recent FE meeting, you had mentioned that part of the reason behind N/A is that we may not get that information from a data provider. Would this be the case for the three data providers that I have identified as suitable for a second, curated evidence Rare diseases data table (Gene2Phenotype, UniProt literature, Genomics England)?

deniseOme commented 5 years ago

@andrewhercules @d0choa @iandunham, I'd vote for the data to remain in a single table, for several reasons (in no particular order):

We are not capturing rsID from Gene2Phenotype (G2P) or Genomics England PanelApp, hence N/A in the table

These two data sources are curated by clinicians who don't tend to work with rsIDs. But we could map the HGVS notation to rsID and display the rsID in the table and the functional consequence, dropping the "curated evidence" placeholder.

From one of the links included by @andrewhercules, we can see that G2P provides one piece of evidence for HEXA in Tay-Sachs disease.

From the G2P link, we can go to Decipher and get more info on the G2P variants:

The two variants above do not have a dbSNP identifier (e.g. rs123) but rather an HGVS notation, which is popular among clinicians and diagnostic labs. So we have p.Arg137Ter, which translates into: a stop codon (Ter) at amino acid position 137, where the reference amino acid is Arg. The variant (or mutation) truncates the protein and is likely pathogenic.
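To make the reading of p.Arg137Ter concrete, here is a minimal sketch that splits a simple protein-level HGVS substitution into its parts. The function name and the returned keys are made up for illustration, and the pattern only covers three-letter substitutions such as p.Arg137Ter (or p.Arg137*), not the full HGVS grammar.

```python
import re

# Matches only simple three-letter protein substitutions, e.g. p.Arg137Ter or p.Arg137*.
# This is an illustrative subset, not a full HGVS parser.
_PROTEIN_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*)$")


def parse_protein_substitution(hgvs: str) -> dict:
    """Split a protein HGVS substitution into reference residue, position and new residue."""
    match = _PROTEIN_SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"Not a simple protein substitution: {hgvs!r}")
    ref, position, alt = match.groups()
    return {
        "ref": ref,                      # reference amino acid, e.g. "Arg"
        "position": int(position),       # amino acid position, e.g. 137
        "alt": alt,                      # new residue, "Ter" or "*" for a stop codon
        "is_stop_gain": alt in ("Ter", "*") and ref != "Ter",
    }


print(parse_protein_substitution("p.Arg137Ter"))
```

A check like `is_stop_gain` would only tell us what the notation says; the consequence term shown in the table would still need to come from an annotation tool.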

Is there an rsID for p.Arg137Ter? Yes, there is:

http://grch37.ensembl.org/Homo_sapiens/Variation/Explore?db=core;g=ENSG00000213614;r=15:72635775-72668817;t=ENST00000268097;v=rs121907962;vdb=variation;vf=445749705

This means this variant (mutation in this case) should have an rsID (rs121907962) and a functional consequence (stop gain). Decipher gives a clinical significance as well, which could be pulled in by the Platform (clinical significance is available from ClinVar, but also from G2P and Decipher, and probably GEL too).

If the HGVS notation is provided in the G2P JSON file, we can use the Ensembl VEP REST API endpoint to find the rsID and the functional consequence. Note that G2P coordinates are for GRCh37. This would mean that we will no longer have N/A and "curated evidence" in the table, but rs121907962 and stop gain instead.
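A rough sketch of that lookup, under a few assumptions: it targets the GRCh37 Ensembl REST server and its `vep/:species/hgvs/:hgvs_notation` endpoint, and it reads the `most_severe_consequence` and `colocated_variants` fields of the VEP JSON output. The helper names, the `ENSP_PLACEHOLDER` identifier and the sample record are hypothetical; the real HGVS string would have to come from the G2P file, and the sample record is hand-written to show the shape of the parsing, not a real API response.

```python
from urllib.parse import quote

# GRCh37 archive of the Ensembl REST API (G2P coordinates are GRCh37).
GRCH37_REST = "https://grch37.rest.ensembl.org"


def vep_hgvs_url(hgvs: str, server: str = GRCH37_REST) -> str:
    """Build the GET URL for the VEP HGVS endpoint; fetch it with any HTTP client."""
    return f"{server}/vep/human/hgvs/{quote(hgvs, safe=':')}?content-type=application/json"


def summarise_vep(records: list) -> list:
    """Reduce VEP JSON records to the two values the table needs."""
    summary = []
    for record in records:
        # Co-located known variants carry IDs from several sources; keep dbSNP-style ones.
        rsids = sorted(
            {v["id"] for v in record.get("colocated_variants", [])
             if v.get("id", "").startswith("rs")}
        )
        summary.append({
            "input": record.get("input"),
            "rsids": rsids,
            "consequence": record.get("most_severe_consequence"),
        })
    return summary


# Hand-written sample record (NOT a real response), shaped like VEP JSON output.
sample = [{
    "input": "ENSP_PLACEHOLDER:p.Arg137Ter",        # hypothetical identifier
    "most_severe_consequence": "stop_gained",
    "colocated_variants": [{"id": "rs121907962"}, {"id": "OTHER_ID_1"}],
}]

print(vep_hgvs_url("ENSP_PLACEHOLDER:p.Arg137Ter"))
print(summarise_vep(sample))
```

Keeping the URL building separate from the parsing means the parsing can be tested offline; whether the endpoint resolves a given protein-level HGVS string would need checking against real G2P entries.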

rs123 does not mean the evidence is not curated

rs121907981 is evidence for HEXA in Tay-Sachs disease that is curated by UniProt. The VEP says this variant is a missense variant, but it is still curated evidence.

I see the string 'Curated evidence' in our tables as a placeholder for when we don't have information on the functional consequence of the variant/mutation.

Data download

If we were to split the Rare diseases table into two, users interested in that data would have to download two tables rather than one.

Consistency

If we split the Rare diseases table in two, will we split the Common diseases table as well? How about other tables in the evidence page, e.g. Somatic mutations? These could also have different data sources and N/A in some rows, e.g.

https://www.targetvalidation.org/evidence/ENSG00000139618/EFO_0000305?view=sec:somatic_mutation

UniProt versus UniProt literature

These two subtypes are available as filtering options in the associations page but not captured in the evidence page: we can filter for UniProt in the Evidence source column but we can't tell if it's UniProt literature.

Scrolling

Some users already miss the second table because they need to scroll further down. Splitting one table in two will require further scrolling. This can increase the chances of users missing relevant information.

iandunham commented 5 years ago

I think that the reason that gene2phenotype exists is that the variant information cannot be viewed or downloaded outside of the terms of access of Decipher. We can't display the variants in the platform, which would include reverse engineering by scraping data from Decipher. What we could do, if we don't already, is to have a link back to the page.

iandunham commented 5 years ago

'Curated evidence' means that the evidence we are displaying is summary evidence generated by a curator, which may have background supporting variants but is aggregated over several variants. In the context of the functional consequence column of the table, it means that the result is an aggregate over possibly multiple variants, so there isn't a functional consequence at the variant level. We could try to get a summary consequence like we do for Cancer Gene Census, but the data access is complex here.

So overall curated evidence means that there is either a person looking at the data or there is a pipeline aggregating the data to give a summary. In a sense this is higher quality than just observing a variant.

For Genomics England panels, they have a crowd-sourced summary of which genes should be looked at for a particular disease, without itemising variants, so again it's a curated summary of many views across clinicians.

Create genetic associations detail view #698