Target profile page rewrite: Create Gene Tree summary widget

andrewhercules commented 5 years ago

Use Cases

As a user of the Platform, I would like to see the types of orthologues and paralogues that have been identified for my target of interest.

Summary Views

gene_tree_summary_widget

Full-size version

Design and Interaction Notes

For targets where there is data about orthologues and paralogues (e.g. ESR1, SERPINB2, or AQP7), please use the following text in Open Targets Grey - #5a5f5f:

Orthologues and paralogues for GENE-SYMBOL that have been identified across a selection of 12 different species

Please replace GENE-SYMBOL with the HGNC gene symbol value.

Also, please colour the widget container box outline in Open Targets Grey - #5a5f5f.

Underneath the text, please create a table that copies the design pattern used in the Protein Information summary widget (see #254).

The options in the table are:

1 to 1 orthologues
1 to many orthologues
human paralogues

If there is data about a specific type of orthologue or paralogue available, please colour the box containing the FontAwesome checkmark icon in Open Targets Purple - #7b196a and please colour the checkmark icon white.

If data about a specific type of orthologue or paralogue is not available, please replace the checkmark with the FontAwesome times icon in Open Targets Light Grey - #e2dfdf. Also, please colour the text for that orthologue or paralogue type in Open Targets Light Grey - #e2dfdf.

For targets where there is no data about orthologues or paralogues (e.g. HOTAIR or AL138921.2), please display all text (including the FontAwesome times icon) and the widget container box outline in Open Targets Light Grey - #e2dfdf.

When a user hovers over the summary widget and the target has a small molecule and/or an antibody tractability assessment, please show the pointer icon and change the box outline to Open Targets Purple - #7b196a. This will provide users with a visual cue that the summary widget is clickable. For more information, please see issue #429.
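
The colour rules above could be sketched as a small pure function. This is only an illustration in Python, not the Platform's front-end code: the names `widget_style` and `RowStyle` are hypothetical, the text colour for available rows is assumed to be Open Targets Grey (the notes do not state it), and the hover highlight is assumed to apply only when the target has some orthologue or paralogue data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hex values are taken from the design notes above.
OT_GREY = "#5a5f5f"
OT_PURPLE = "#7b196a"
OT_LIGHT_GREY = "#e2dfdf"
WHITE = "#ffffff"

ROW_LABELS = ("1 to 1 orthologues", "1 to many orthologues", "human paralogues")


@dataclass(frozen=True)
class RowStyle:
    label: str
    icon: str                  # FontAwesome icon name: "check" or "times"
    icon_colour: str
    box_colour: Optional[str]  # fill of the icon box; None when not filled
    text_colour: str


def widget_style(available: Dict[str, bool], hovered: bool = False) -> dict:
    """Return outline, heading and per-row styles for the summary widget."""
    has_any = any(available.get(label, False) for label in ROW_LABELS)

    rows: List[RowStyle] = []
    for label in ROW_LABELS:
        if available.get(label, False):
            # Data available: white checkmark in a purple box.
            # Text colour here is an assumption (not stated in the notes).
            rows.append(RowStyle(label, "check", WHITE, OT_PURPLE, OT_GREY))
        else:
            # Data missing: light grey times icon and light grey text.
            rows.append(RowStyle(label, "times", OT_LIGHT_GREY, None, OT_LIGHT_GREY))

    outline = OT_GREY if has_any else OT_LIGHT_GREY
    clickable = hovered and has_any  # assumption: only highlight when there is data
    if clickable:
        outline = OT_PURPLE

    return {
        "outline": outline,
        "heading_colour": OT_GREY if has_any else OT_LIGHT_GREY,
        "cursor": "pointer" if clickable else "default",
        "rows": rows,
    }
```

For example, `widget_style({"1 to 1 orthologues": True}, hovered=True)` gives a purple outline with one checked row and two light grey rows, while `widget_style({})` gives the fully light grey widget.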

Design Assets

ticket updated on 4 April 2019

deniseOme commented 5 years ago

One of the use cases for providing the orthologues to human targets is the possibility of using this information in pre-clinical stages of drug discovery (safety/toxicity experiments in mouse, rat, dogs, Rhesus macaque).

Do we have information from users that they would like to find what the different types of orthology are? e.g. 1:1, 1:many, many:many. How is the information going to be used by them?

Wouldn't the binary options e.g. yes, the gene has orthologues versus no, the gene has no ortholog in other vertebrates suffice?

Ensembl provides the orthologues for as many as 146 species (including non-vertebrates). Which ones are we going to pick?

As for paralogues, I'm struggling to see the use case in drug discovery. Is there one? I could think of these scenarios, but I'm not sure if they are real scenarios:

can my target be not tractable but its paralogue (a gene that arose from duplication in the human lineage) be tractable?
if they have similar function, how can my drug modulate one, both or all of the paralogues potentially out there?

andrewhercules commented 5 years ago

We currently display different types of orthologies based on a call to the Ensembl /homology endpoint and we pass through codes for different species as parameters - see here for a list of species included in the call.

As for paralogues, we also already include that data in the data table view - although the mapping may need to be updated because we only map three homology type values but Ensembl actually has a more extensive list of types that they return in that endpoint:

ortholog_one2one
ortholog_one2many
ortholog_many2many
within_species_paralog
other_paralog
gene_split
between_species_paralog
alt_allele
homoeolog_one2one
homoeolog_one2many
homoeolog_many2many
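
To make the mapping question concrete, here is a rough Python sketch of how the type strings returned by the endpoint could be collapsed into the three options from the design notes. The grouping in `TYPE_TO_ROW` is an assumption for illustration (the thread does not say which three values are currently mapped), and `summarise_homology_types` is a hypothetical helper, not existing Platform code. Any type that is not in the mapping is simply ignored.

```python
from typing import Dict, Iterable

# Assumed grouping of Ensembl homology types into the three widget options.
# The other types in the list above (e.g. other_paralog, alt_allele) are left unmapped.
TYPE_TO_ROW = {
    "ortholog_one2one": "1 to 1 orthologues",
    "ortholog_one2many": "1 to many orthologues",
    "within_species_paralog": "human paralogues",
}


def summarise_homology_types(homology_types: Iterable[str]) -> Dict[str, bool]:
    """Return one boolean per widget option: True if any matching type was seen."""
    summary = {row: False for row in TYPE_TO_ROW.values()}
    for homology_type in homology_types:
        row = TYPE_TO_ROW.get(homology_type)
        if row is not None:
            summary[row] = True
    return summary
```

The input would be the `type` values pulled out of the homology response for one target. Extending the widget to more types would then only mean adding entries to the mapping.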

andrewhercules commented 5 years ago

Given the discussions on ticket #538 about the range of data returned by Ensembl's /homology API endpoint, I have updated the design spec and made the following changes:

Updated the text to read Orthologues and paralogues for GENE-SYMBOL that have been identified across a selection of 12 different species
Reduced the options displayed to: 1 to 1 orthologues, 1 to many orthologues, and human paralogues

peatroot commented 5 years ago

@andrewhercules, we could show a boolean table of homology type vs species here. It would be less wordy and we could use the species icons?

Also, is Homology a clearer name than Gene tree?

andrewhercules commented 5 years ago

@peatroot, I had considered using the species icons, but I would like to keep this checkbox boolean pattern consistent with the Chemical Probes and Protein Information summary widgets.

David was the one who recommended the wording to make it clear that it is a selection of species out of the X number of species in Ensembl.

As for the title, I would like to keep it as Gene tree - that is what we currently use in the Platform and I want to make it easy for users to find the data using titles they are already familiar with. And given the conversation in #538, we are not using all of the homology data and so I am hesitant to relabel it Homology, unless we change the underlying data.

peatroot commented 5 years ago

@andrewhercules, I have just had a chat with @d0choa and it sounds like knowing the number of homologues per species (ordered linearly by species similarity to human) is probably more useful than knowing whether there are 1-to-1, 1-to-many orthologues or human paralogues. If we do also show the latter, then it'd be more helpful to know the number of homologues for each type (across the 12 species).

I'll try both designs out and we can perhaps discuss at a front-end meeting.

d0choa commented 5 years ago

As @deniseOme pointed out, having information that could be used in preclinical stages is probably one of the main purposes of this widget.

I would anticipate that users might care about (by order of importance):

are there human paralogs of this gene
is the gene conserved in other organisms (and which ones)
in which species is this gene conserved
what type of orthology is it

deniseOme commented 5 years ago

Hi @d0choa @peatroot @andrewhercules. We do have people at GSK using this for safety studies. Whatever we change, it may be worth bringing these users on board earlier on (rather than later) so that we continue to address their needs.

TBH, I'd not have this changed at all. Perhaps adding some links out. Nothing else. E.g. linking out to the orthologues or paralogues in Ensembl from the pop up box below:

In the past, we used to provide the ENS gene IDs for each of those orthologues/paralogues. I'd put this back in place, and hyperlinked, so that those users who want to explore more can do so in the original source, Ensembl.

Why don't we run some usability or UX design on this? Provide a sheet of exercises to understand what they want (without actually asking them directly what they want).

According to @iandunham, "the use case for paralogues in safety is knowing whether there are potentially other targets that might provide the function of the target you are trying to drug. If there is 1 copy in human but 2 in rat safety testing for instance might be misleading if your drug is specific only for the one copy of the two paralogues." So the 1-to-many relationships (etc) are indeed important.

I'd also link out to the Ensembl gene tree page from my target profile page e.g. IL2RA would be hyperlinked to take me to either http://www.ensembl.org/Homo_sapiens/Gene/Compara_Tree?db=core;g=ENSG00000134460;r=10:6010689-6062370 or http://www.ensembl.org/Multi/GeneTree/Image?gt=ENSGT00390000018872.
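
As a sketch of the link-out, the first URL pattern above could be built from the target's Ensembl gene ID. The helper name `ensembl_gene_tree_url` is hypothetical, and dropping the `r=` region parameter is an assumption (it presumes the page resolves from the gene ID alone).

```python
def ensembl_gene_tree_url(ensembl_gene_id: str) -> str:
    """Build a link to the Ensembl gene tree page for a human gene.

    Follows the Compara_Tree URL pattern quoted above, without the
    region (r=...) parameter, which is assumed to be optional.
    """
    return (
        "http://www.ensembl.org/Homo_sapiens/Gene/Compara_Tree"
        f"?db=core;g={ensembl_gene_id}"
    )
```

For IL2RA this would be called as `ensembl_gene_tree_url("ENSG00000134460")`.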

d0choa commented 5 years ago

As far as I understand, the discussion is about the information that should be contained in the summary widget. @peatroot correct me if I'm wrong, but the expanded view of the gene tree will be maintained as it is.

andrewhercules commented 5 years ago

I appreciate the points raised by @peatroot and @d0choa and can see the reasons for changing the design of the summary widget to include more data. I know the design spec is not perfect nor 100% ideal. It has shortcomings as it reflects the realities of the data source and our internal project timeline and resource allocations.

That being said, I would strongly recommend that we stick with the design spec that was previously reviewed and agreed with the team for the following reasons:

The summary widgets are to provide a summary of the data and to answer the use case, "Is there X data for my target of interest?". The widget does not need to - and should not - reproduce the underlying data that users can find in the orthology table tab in the detail view (#538). @d0choa, the points you raise are all valid research questions that were explored by my predecessor and can still be answered with the data in the detail view.
The summary widget was deliberately designed to utilise the same boolean checkbox design pattern used in other widgets. This is to ensure consistency with other widgets, that we reuse code that we have already written to speed up this phase of the project, and that we meet the agreed deadline for completion as per @ElaineMcA's project plan.
While we use icons in the Known Drugs summary widget, the context of their use in that widget is different from how we would use them in this widget. In the Known Drugs summary widget, we use labels and numbers to help users identify the icon and the number of drugs with a given modality. It is a single dimension of the data and it is easy to convey with a coloured icon, label, and number. However, in the design proposed by @peatroot, we are showing multi-dimensional data (species type, species similarity, homology type, counts). We should show the homology type as users in target safety wanted the widget to convey what types of orthologues and paralogues are available. As such, it is more complex data to summarise and it goes beyond providing a quick-read summary based on the key use case for the dashboard identified in point 1 above - in fact, it becomes a widget version of the detail view. In theory, I would be okay with this as we could use this table with icons in the detail view (#538). However, and apologies for sounding like a broken record here, we are focussing on a "like for like rewrite" as agreed with Ian. We need to minimise the amount of extra work we are putting into this page as rewriting the remainder of the Platform looms on the horizon and there is much more complexity with the other pages.
From my understanding, ordering species by similarity will be different depending on the target and that would result in different ordering and different widget designs. However, as mentioned in the meeting where I shared my research findings, users would like widgets that are consistent across all targets and that are found in a consistent spot on the dashboard. This will enable users to quickly scan the data on the dashboard and identify detail views that they would like to explore.
@deniseOme, in terms of the detail view (#538), we will keep the same features that are currently found in the Platform as we are focussed on a "like for like rewrite".

peatroot commented 5 years ago

Yes, that's right. There's a separate ticket for the detail/expanded view (https://github.com/opentargets/platform/issues/538).

d0choa commented 5 years ago

A few conclusions after the chat we had today between @deniseOme, @peatroot, @andrewhercules and me. (@mirandaio might also be interested)

In general terms, the summary widgets go beyond the like-for-like rewrite, as they represent content that was not there in the first place. The amount of information they contain must be succinct and easily interpretable from a user perspective. The widgets shouldn't include too much information, but also they need to be informative, to help to have an overview of the available target information.
In the context of the agreed timeline, there is a hurry to complete widgets at a good pace. However, @peatroot spotted that the information contained in the "gene tree" widget as proposed in this thread might be limited. I agree that the current booleans for the type of homology might not be distinctive of the target. Probably half of the genes would have the same booleans marked.
An alternative version of the widget could have the subset of the most relevant model organisms (including human) and a number representing the number of homologs on each of them. We still don't know if this will work better than the current widget. @peatroot will make a quick implementation to try to resolve this question.
We all agreed we cannot do this widget by widget, but we will try to identify if there is any other widget where a minor change would significantly improve the result. We need to get up to speed on implementing them, so it would need to be a clear improvement.

peatroot commented 5 years ago

Draft with species icons:

The set of species can be easily reduced, via the API, if it's considered too many.

deniseOme commented 5 years ago

Thanks @peatroot, it looks nice and neat. One question: since we will always be coming from the human gene (our drug target), I'd suspect "human (0) will always be greyed out". If that is true, do we really need to show human?

peatroot commented 5 years ago

I tried a version with human separated as <icon> Human paralogues (<count>), above the subtitle you see currently, which was renamed Orthologues by species, but following discussion with @d0choa, changed to the above, as it is more concise.

peatroot commented 5 years ago

Here's the other version:

d0choa commented 5 years ago

Homologues include both paralogues and orthologues. For the summary widget it might be enough information. If somebody is interested to know about the type of homology/orthology they can click on the widget.

There might be cases with several hits in human and none in the rest, several in human and several in others, or none in human and some in others (such as the example above). Also the numbers will change from family to family. The image represents a textbook example where there is only 1 copy of the gene in human and only one copy is conserved across a set of organisms. That's not so common for human genes where multiple speciation and duplication events might have happened in the last 1000 million years.

For our users, it will be meaningful because it will contain whether there are other human paralogs (off-target effects) and what model organisms they could potentially use for preclinical studies.

The fixed order based on the species trees should be the following (based on distances to human in "million years ago" from timetree.org):

chimpanzee 6.4 mya
macaque 28.10 mya
mouse 88 mya
rat 90 mya
guinea pig 90 mya
pig 96 mya
dog 96 mya
frog 352 mya
zebrafish 435 mya
fly 797 mya
worm 797 mya
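
A minimal Python sketch of the counting behind this version of the widget, assuming the homology response has already been reduced to one species label per homologue. The common-name labels and the helper `homologue_counts` are hypothetical, and placing human first (so that human paralogues lead the list) is an assumption, since the list above only covers the other 11 organisms.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Fixed display order: human first (assumption), then the order listed above.
SPECIES_ORDER = [
    "human",
    "chimpanzee",
    "macaque",
    "mouse",
    "rat",
    "guinea pig",
    "pig",
    "dog",
    "frog",
    "zebrafish",
    "fly",
    "worm",
]


def homologue_counts(species_hits: Iterable[str]) -> List[Tuple[str, int]]:
    """Count homologues per species, always returned in the fixed order.

    Species with no hits get a count of 0 (these would be greyed out);
    species outside the fixed list are ignored.
    """
    counts = Counter(species_hits)
    return [(species, counts[species]) for species in SPECIES_ORDER]
```

Because the order is fixed rather than computed per target, every target gets the same layout and only the counts change.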

andrewhercules commented 5 years ago

Looks good @peatroot!

@d0choa, are there any other species that should be included in the list?

d0choa commented 5 years ago

I think this is the complete list that you can find inside the widget (12 organisms)

If the question is if we want to expand the list of 12, probably not. It's already a comprehensive list of model organisms. If somebody asks, we could do it, but it's not a priority.

The alternative question is if we want to narrow down the summary widget to only a few organisms (let's say human + 3) and aggregate the rest as "Other". That would depend on how small we want the widget to be. We can probably take this decision once we have the overall view of all widgets.

It looks really nice!

deniseOme commented 5 years ago

Thanks @peatroot for sharing the initial version, which I'd have voted for as it seems concise enough to me and has an important extra piece of information available upfront (and missing in the selected version) i.e. the distinction between orthologues and paralogues in the same box/widget.

I wonder if our users are as aware as we are that our homologues = orthologues + paralogues. I'd have thought that they are not aware of the distinction as they do not tend to be evolutionary biologists or evolutionary geneticists.

We can address this by means of help documentation and monitor whether users at workshops or via support complain or ask what the difference is.

I can foresee people asking why we greyed out the widget in human, for those rare examples where there is one copy only of that gene in the human genome. With the orthologue/paralogue distinction up front, this (possible) question would not be asked. They would know that by being greyed out, there is no paralogue.

p.s. For example, I was asked at a workshop at CRUK Therapeutics in London this year "why do we always show (1) in a page like this, as the drug table there is always for 1 target only?"

Screen Shot 2019-05-09 at 15 45 53
