opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Target profile page rewrite: Create Gene Tree detail views #538

Closed andrewhercules closed 5 years ago

andrewhercules commented 5 years ago

Use Cases

  1. As a user of the Platform, I want to see the history of my target of interest.

  2. As a user of the Platform, I want to see a list of all species and relevant homology types for my target of interest.

Detail Views

Gene tree (default view after the user clicks on the Gene Tree summary widget on the dashboard): [image: gene_tree_detail_view_gene_tree_visualisation]

Orthology table view: [image: gene_tree_detail_view_orthology_table]

Design and Interaction Notes

  1. For the Gene Tree view, please display the following text in Open Targets Grey - #5a5f5f:

Phylogenetic tree showing the history of the human gene GENE-SYMBOL based on protein sequences from Ensembl. All branches in this tree have the same length (unscaled branches). You can also view the branches in different lengths based on the number of evolutionary changes in the tree (select scaled branches). All species are shown by default, but you can prune the tree to a subset of species by unticking the species accordingly. Learn more about protein trees and orthologies.

Please replace GENE-SYMBOL with the HGNC symbol returned by GraphQL.

In the text, please hyperlink protein trees to https://www.ensembl.org/info/genome/compara/homology_method.html and please link orthologies to https://www.ensembl.org/info/genome/compara/homology_types.html.

  2. Please integrate the existing gene tree viewer library used in the current Platform.

  3. For the Orthology Table view, please display the following text in Open Targets Grey - #5a5f5f:

Orthology data for GENE-SYMBOL across a selected set of 12 different species

Please replace GENE-SYMBOL with the HGNC symbol.

In this view, please use the existing data table design pattern and include the filter/search bars in the Species and Homology type columns.

Within the data table, please hyperlink the Ensembl ID in the Orthologue and ID column using the following pattern:

'http://www.ensembl.org/' + SPECIES + '/Gene/Summary?g=' + GENE-ID

Please replace the SPECIES and GENE-ID fields with the relevant species and gene IDs from the API response. So for example, the link for the ERR (FBgn0035849) orthologue for ESR1 would be http://www.ensembl.org/drosophila_melanogaster/Gene/Summary?g=FBgn0035849.
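As a rough illustration of the link pattern above, here is a minimal sketch. The function name `ensemblGeneUrl` and its parameter names are assumptions made for this example and are not part of the existing codebase; the species and gene ID values would come from the API response.

```javascript
// Hypothetical helper (name and parameters are assumptions for illustration):
// builds the Ensembl gene summary link for one row of the orthology table.
function ensemblGeneUrl(species, geneId) {
  // encodeURIComponent guards against unexpected characters in either value.
  return 'http://www.ensembl.org/' + encodeURIComponent(species) +
    '/Gene/Summary?g=' + encodeURIComponent(geneId);
}

// The ERR orthologue of ESR1 from the example in the spec:
console.log(ensemblGeneUrl('drosophila_melanogaster', 'FBgn0035849'));
// prints "http://www.ensembl.org/drosophila_melanogaster/Gene/Summary?g=FBgn0035849"
```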

Important note about the data returned from Ensembl's /homology API endpoint

Ensembl has provided us with a list of all possible options for homology type in the response object - data.homologies.type - from the /homology endpoint:

ortholog_one2one (relevant to our users)
ortholog_one2many (relevant to our users)
ortholog_many2many
between_species_paralog
within_species_paralog (relevant to our users)
other_paralog
gene_split
alt_allele
homoeolog_one2one
homoeolog_one2many
homoeolog_many2many

In the current Angular web app, we show all data returned by the endpoint. If the term from the API endpoint matches a term in our dictionary, we map it to a more readable term, otherwise we display the row but with N/A in the homology type column.

Based on the discussions and recommendations below, we will hide the responses from the API that do not match the three homology types most relevant to our users. This will keep the data table in sync with the visualisation.

Please implement a similar dictionary that maps the terms found in the response from Ensembl's /homology API endpoint. If the term matches one of the three terms, map to the more readable term and include in the data table. If there is no match, please disregard that data and do not include it in the data table.

var homologyTypeDictionary = {
  'ortholog_one2one': 'orthologue: 1 to 1',
  'ortholog_one2many': 'orthologue: 1 to many',
  'within_species_paralog': 'paralogue'
};
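To make the intended behaviour concrete, the following is a minimal sketch of how the dictionary could be used to both relabel and filter rows. It assumes each homology entry has already been extracted from the Ensembl response into a plain object with a `type` field; the function name `toTableRows`, the added `homologyType` field and the sample rows are hypothetical and only for illustration.

```javascript
// Dictionary from the spec: only these three homology types are shown.
var homologyTypeDictionary = {
  'ortholog_one2one': 'orthologue: 1 to 1',
  'ortholog_one2many': 'orthologue: 1 to many',
  'within_species_paralog': 'paralogue'
};

// Hypothetical helper: drops entries whose type is not in the dictionary
// and adds the readable label to the entries that remain.
function toTableRows(homologies) {
  return homologies
    .filter(function (h) {
      return Object.prototype.hasOwnProperty.call(homologyTypeDictionary, h.type);
    })
    .map(function (h) {
      return Object.assign({}, h, { homologyType: homologyTypeDictionary[h.type] });
    });
}

// Illustrative input only (not real API data).
var sample = [
  { id: 'GENE_A', type: 'ortholog_one2one' },
  { id: 'GENE_B', type: 'other_paralog' },
  { id: 'GENE_C', type: 'within_species_paralog' }
];

// GENE_B is dropped because other_paralog is not in the dictionary.
console.log(toTableRows(sample));
```

Filtering before mapping means no row can ever reach the table with an N/A homology type.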

Design Assets

  1. Full-size mockups
  2. Open Targets brand palette colour codes

Spec updated on 9 April 2019

andrewhercules commented 5 years ago

@iandunham, when reviewing the data returned by the Ensembl /homology endpoint and the web app code that generates the orthology data table (e.g. AQP7 orthology data table), I noticed that we only map the following homology terms from the Ensembl API response:

var homologyType = {
  'ortholog_one2one': 'ortholog 1:1',
  'ortholog_one2many': 'ortholog 1:many',
  'within_species_paralog': 'paralog'
};

If the term returned by the API does not appear in that dictionary, we display N/A. However, as mentioned in the design spec, there is a much more extensive list of terms returned in the Ensembl API response:

ortholog_one2one
ortholog_one2many
ortholog_many2many
within_species_paralog
other_paralog
gene_split
between_species_paralog
alt_allele
homoeolog_one2one
homoeolog_one2many
homoeolog_many2many

Should we continue to only map the three terms currently listed in the codebase? Or should we expand the mapping and include all terms in the dictionary?

deniseOme commented 5 years ago

Worth linking this to another ticket re. use case for paralogs in drug discovery, @iandunham.

https://github.com/opentargets/platform/issues/536#issuecomment-478570177

iandunham commented 5 years ago

At a quick glance I think the mapping is correct. We are anchored on one target, and we want to know its orthologs in other species, which maps to the 'one2many' and 'one2one' calls. We also want to know the paralogs in human, which maps to the 'within_species_paralog' call.

The number of paralogs in a specific model species is the 'one2many' call.

This is how I understand it, so we don't need the other calls. Miguel P was in the compara group that develops this at Ensembl, so I am assuming that he got this all correct.

iandunham commented 5 years ago

@deniseOme the use case for paralogues in safety is knowing whether there are potentially other targets that might provide the function of the target you are trying to drug. If there is 1 copy in human but 2 in rat, safety testing, for instance, might be misleading if your drug is specific for only one of the two paralogues.

andrewhercules commented 5 years ago

Okay, thanks @iandunham. We will implement the existing dictionary and I'll update the design spec for the summary widget (see #536).

For the data table, any other terms will appear as N/A as is currently the case. Is that okay, or would you prefer not to show those rows?

iandunham commented 5 years ago

They should be hidden, shouldn't they? Are they in the current table?

deniseOme commented 5 years ago

Note that we don't have paralogues currently in the table, rather N.A.

ESPNL is a paralogue of ESPN in Ensembl and listed as N.A. in our table.

Screen Shot 2019-04-03 at 16 07 11

The example of 2 copies in mouse and 1 in human is worth knowing, but are we showing paralogues of other species?

Mouse also has an Espnl (a paralog of the mouse Espn).

andrewhercules commented 5 years ago

@deniseOme, we do show paralogues in the data table that have a type of within_species_paralog as that is one of the terms included in the dictionary used by the existing web app. See below for an example for AQP7:

Screen Shot 2019-04-03 at 16 08 53

@iandunham, any rows that have types other than what appears in our dictionary (e.g. other_paralog or ortholog_many2many) are given a value of N/A. But if they aren't relevant to drug discovery, then I would recommend that we hide them by dropping them from the array used to build the data table.

Screen Shot 2019-04-03 at 16 07 16

iandunham commented 5 years ago

It looks like we shouldn't be showing these, but may need a discussion with Ensembl compara (or Miguel) to understand what the other terms are telling us.

The table and tree should show the same things, which currently doesn't appear to be the case.

deniseOme commented 5 years ago

Ensembl has several paralogues for AQP7 but we only list AL845331.2.

If people working in safety come to the Platform, they will get the wrong picture of the paralogues. They will be unaware that AQP7 also has other paralogues such as AQP3 and AQP10 (all members of the AQP family). They will only see one paralogue with a strange (clone) name, i.e. AL845331.2 (no HGNC name for this locus).

Some of these paralogues are classified as N.A. in our table maybe because of a threshold applied for Target % id and Query % id.

If we were to show paralogy information, it may be better to drop the putative threshold.

Note the ESPN and ESPNL examples above. ESPNL is a paralogue but listed as N.A.

The Open Targets Platform gene tree is only showing orthologues.

iandunham commented 5 years ago

We do show paralogues in human in some trees - look at https://www.targetvalidation.org/target/ENSG00000100197 CYP2D6 and CYP2D7 are human paralogues in the tree, as they should be - there must be some defined level of similarity required - let's check with @empyc Miguel P

deniseOme commented 5 years ago

I see. Note the several paralogues within guinea pig displayed in the same tree as well.

andrewhercules commented 5 years ago

@deniseOme - agreed. We list AL845331.2 because its type matches our dictionary. But for ones like AQP3, AQP8, etc., because they don't match, we show N/A. If we aren't going to map them, perhaps we should ignore them and make it clear that not all paralogues are shown and direct users to the appropriate page on Ensembl?

I don't think we apply any thresholds - we just process the response from the Ensembl API.

For reference, here's the API call we make for AQP7 - from my understanding, we just pass through the taxon codes for the 12 selected species we show:

https://rest.ensembl.org/homology/id/ENSG00000165269.json?format=full;sequence=none;type=all;target_taxon=9606;target_taxon=10090;target_taxon=10141;target_taxon=9544;target_taxon=9615;target_taxon=9986;target_taxon=10116;target_taxon=9823;target_taxon=8364;target_taxon=7955;target_taxon=9598;target_taxon=7227;target_taxon=6239
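For illustration, a request URL of this shape could be assembled as in the sketch below. The taxon IDs and query parameters are copied from the AQP7 URL above; the names `TARGET_TAXA` and `homologyUrl` are assumptions for this example and do not reflect how the existing web app builds the call.

```javascript
// Taxon IDs copied, in order, from the AQP7 example URL above.
var TARGET_TAXA = [9606, 10090, 10141, 9544, 9615, 9986, 10116,
                   9823, 8364, 7955, 9598, 7227, 6239];

// Hypothetical helper: builds the /homology request URL for one Ensembl gene ID.
function homologyUrl(ensemblGeneId) {
  var params = ['format=full', 'sequence=none', 'type=all'].concat(
    TARGET_TAXA.map(function (taxon) { return 'target_taxon=' + taxon; })
  );
  // The example URL separates parameters with ';' rather than '&'.
  return 'https://rest.ensembl.org/homology/id/' +
    encodeURIComponent(ensemblGeneId) + '.json?' + params.join(';');
}

console.log(homologyUrl('ENSG00000165269'));
```

Note that no similarity threshold appears anywhere in the request; only the list of target taxa narrows the response.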

emepyc commented 5 years ago

@deniseOme The relation between ESPN and ESPNL reported by Ensembl is labelled as "other_paralog", which means that the paralog (ESPNL in this case) is not even in the same gene tree as ESPN, but is a member of the same super-family of proteins. The UI is filtering those out under the interpretation that, since the two proteins don't even belong to the same gene tree, they may have diverged enough to be less likely to share the same (or similar) functions. That rationale can work for some cases, but may not work for all. The spirit of that section is to provide easy-to-understand information about orthology / paralogy (i.e. you don't need to be a comparative genomics expert to interpret the information displayed). For reference, this is the gene tree reported by compara for ESPN:

https://rest.ensembl.org/homology/id/ENSG00000144488.json?format=full;sequence=none;type=all;target_taxon=9606;target_taxon=10090;target_taxon=10141;target_taxon=9544;target_taxon=9615;target_taxon=9986;target_taxon=10116;target_taxon=9823;target_taxon=8364;target_taxon=7955;target_taxon=9598;target_taxon=7227;target_taxon=6239

and here is the (sparse) documentation about the different types of orthologs and paralogs in compara: https://metazoa.ensembl.org/info/genome/compara/homology_method.html (under Homology types)

andrewhercules commented 5 years ago

@iandunham and @deniseOme, based on Miguel's comment above, I think we should amend the design spec and have the front-end display responses from the API that map to the existing dictionary:

var homologyType = {
  'ortholog_one2one': 'ortholog 1:1',
  'ortholog_one2many': 'ortholog 1:many',
  'within_species_paralog': 'paralog'
};

This should keep the data table in line with the gene tree visualisation. Beyond keeping the table and visualisation in sync, I also think we should hide these responses because N/A itself is confusing. Does it mean that the row is "not applicable" to the overall data table or target? Or does it mean that the homology type is "not available"? Perhaps it would be best to not show these rows and instead focus on the three homology types that are used in target identification and safety work.

iandunham commented 5 years ago

Yes, but if I understand correctly, if we restrict to the 3 responses above there will be no NAs. Is that right? In any case, we should only display the 3 responses you have above.

andrewhercules commented 5 years ago

Yes, if we restrict to those three and make a change to the code that generates the table, we can hide all rows with N/A.

I will amend the spec and request that the front-end only show rows where the homology type matches one of the terms in the existing dictionary we currently use.