Consider DOVEgraph/Phenogrid hybrid

jmcmurry commented 9 years ago

Now that we have loads of species (yay!), it has become difficult to browse less populous associations at a glance in Dovegraph (sad). The top 3 species eclipse virtually all of our holdings in other species. I really do like the Dove Graphs and a lot of engineering has gone into them; however, while they worked brilliantly with 3 species, they seem to work much less well with 12+ species because of the extremely non-normal distribution of counts across species.

Even when you click on the log scale view (not shown), it is still difficult to see which of the species are actually represented.

I'm wondering whether it makes sense to have a matrix instead of a barchart. This is fake data, but something like this...

We could implement faceting on species, but this is an extra step. If the visualization is meant to lead the user on a path to browse what is available in our collection (my personal preference), it may be worth rethinking the graphs altogether. It is the approximate information richness that is the important thing to convey imho, and this can be done with a color scale. A matrix view also gets us away from having to implement more colors than the human eye can meaningfully distinguish.

Thoughts?

nlwashington commented 9 years ago

i think this is a really interesting possibility, but we just need to make sure we distinguish its look and feel so that people understand the difference in meaning of the colors (between the phenogrid one-by-many comparison, and this landscape comparison). hover/click functionality on a given intersection point could give some interesting results before drilling down. we should think about how we might select whole rows/columns to look at.

jmcmurry commented 9 years ago

@nlwashington agreed. We could even think about the ternary filled circle approach (no data, some data, lots of data) here so that it is clearly distinct from phenogrid. It depends on how much nuance we want to convey in each 'cell'. The ranking of phenotypes should remain as it is now (with most populous at the top). The species should be ranked left to right if possible.
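
A minimal sketch of the ternary idea, for discussion only. It assumes counts are available as a phenotype-to-species mapping; the function names and the `lots_threshold` cutoff are hypothetical, and the right cutoff would need to be tuned against real data. Rows are ranked most populous first and species are ranked left to right by total count.

```python
from typing import Dict, List, Tuple

# Assumed input shape: phenotype category -> species -> association count.
Counts = Dict[str, Dict[str, int]]


def ternary_level(count: int, lots_threshold: int) -> str:
    """Map a raw count to one of three cell states."""
    if count <= 0:
        return "none"
    return "lots" if count >= lots_threshold else "some"


def ranked_grid(
    counts: Counts, lots_threshold: int = 1000
) -> Tuple[List[str], List[str], List[List[str]]]:
    """Return (row labels, column labels, grid of ternary levels).

    Rows (phenotypes) and columns (species) are both sorted by total
    count, largest first; ties are broken alphabetically.
    """
    phenotype_totals = {p: sum(row.values()) for p, row in counts.items()}
    species_totals: Dict[str, int] = {}
    for row in counts.values():
        for species, n in row.items():
            species_totals[species] = species_totals.get(species, 0) + n

    rows = sorted(counts, key=lambda p: (-phenotype_totals[p], p))
    cols = sorted(species_totals, key=lambda s: (-species_totals[s], s))
    grid = [
        [ternary_level(counts[p].get(s, 0), lots_threshold) for s in cols]
        for p in rows
    ]
    return rows, cols, grid
```

A species with no entry for a phenotype is treated as zero, so it renders as the empty circle rather than being dropped from the column.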

harryhoch commented 9 years ago

@jmcmurry, we've explored ideas along these lines in the past. Attached are a few mockups that we kicked around a while back....

[Attached mockups: by organism monarch-front-page-category-coverage-20140429; monarch-front-page-graphs-20140411; monarch-front-page-ortholog-view-201404291506; phenotypes]

jmcmurry commented 9 years ago

Thanks for this Harry. It is very helpful background; my feeling is that any histogram-based view, no matter how well designed, will not work to highlight the data we actually have. The scales are just so vastly different, unfortunately. I also prefer to not repeat the species labels if we don't need to. I am intrigued by the ortholog and phenotype data co-existing; perhaps there's a way we could still achieve it in the grid view. Probably can't show both simultaneously as discrete metrics, but there could be a button to toggle the view from just phenotypes associated to the genes of the specific species, to orthologs? Also, computing the gene orthologs for each of the categories in each of the species might be a bit intense? @nlwashington ?

harryhoch commented 9 years ago

Understood about the limitations. These were design ideas aimed at identifying straightforward ways to show off some data. I agree that the gene by ortholog view is the most intriguing, if we can pull the data together...

kshefchek commented 9 years ago

On paper it's not a huge task. Quite a bit of code can be reused from the barchart view. The biggest thing is adding species to the y axis and having things line up, but I imagine we can borrow a bit from the phenogrid here. I think given a week to work on it I could knock it out. But this would only be for basic functionality; the ability to scroll horizontally or add/remove species would take more doing. For showing more complicated data, for example toggling ortholog counts, we would need to update our Solr version to enable pivot table views.
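
For reference, a rough sketch of how a two-level pivot result could be flattened into grid cells. It assumes the nested list shape that Solr returns for pivot faceting (each entry has `value`, `count`, and an optional nested `pivot` list); the function name is hypothetical, and which fields we would actually pivot on is not decided here.

```python
from typing import Any, Dict, List


def pivot_to_matrix(pivot: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Flatten a two-level pivot list into outer value -> inner value -> count.

    `pivot` is assumed to be the list stored under one key of the
    response's facet_pivot section, e.g. a "phenotype,species" pivot.
    """
    matrix: Dict[str, Dict[str, int]] = {}
    for outer in pivot:
        row = matrix.setdefault(str(outer["value"]), {})
        # Entries without a nested "pivot" list simply yield an empty row.
        for inner in outer.get("pivot", []):
            row[str(inner["value"])] = int(inner["count"])
    return matrix
```

The resulting mapping has one entry per grid row, so the same structure could feed either the barchart or the grid view.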

harryhoch commented 9 years ago

two thoughts here:

Should be possible to imagine different approaches to the viz here, but I'd suggest taking a step back to talk about goals, use cases, etc. I can dredge up some of our old notes on the topic if needed.
If a quick fix is wanted, it might be possible to go to a model of % of annotations for each organism in each phenotype group, i.e., if human annotations are 50% nervous system, 25% cardiovascular, etc., we might have similar distributions in other organisms.
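
The percentage model in the second point could be sketched roughly as follows. The input shape (species to phenotype group to count) and the function name are assumptions for illustration, not existing code.

```python
from typing import Dict


def phenotype_shares(
    counts_by_species: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, float]]:
    """Convert raw counts to each group's percentage within its species.

    A species with zero annotations in total gets 0.0 for every group
    instead of raising a division error.
    """
    shares: Dict[str, Dict[str, float]] = {}
    for species, by_group in counts_by_species.items():
        total = sum(by_group.values())
        shares[species] = {
            group: (100.0 * n / total if total else 0.0)
            for group, n in by_group.items()
        }
    return shares
```

Because every species is normalized to 100%, a sparsely annotated organism gets bars as readable as human's, at the cost of hiding how much data sits behind each one.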

kshefchek commented 8 years ago

I've been toying around with this in a branch, with the design being something similar to phenogrid and the GitHub contributions grid. There are still some bugs and I haven't coded the transitions between the barchart view and the heatmap/grid view, but I could continue work on it if we think it's worth pursuing. I think we would need some sort of log scale for the colors.
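
One possible shape for the log color scale, as a sketch rather than what is in the branch. The function name and the default of five bins are assumptions; bin 0 is reserved for "no data" so empty cells stay visually distinct from sparse ones.

```python
import math


def log_color_bin(count: int, max_count: int, n_bins: int = 5) -> int:
    """Map a count to a color bin on a log scale.

    Returns 0 when there is no data, otherwise an integer in 1..n_bins.
    log1p is used so that a count of 1 still maps to a positive value.
    """
    if count <= 0 or max_count <= 0:
        return 0
    count = min(count, max_count)  # clamp values above the stated maximum
    fraction = math.log1p(count) / math.log1p(max_count)
    return min(n_bins, max(1, math.ceil(fraction * n_bins)))
```

With a maximum of 100000, a cell holding 40 associations lands in bin 2, whereas a linear five-bin scale would leave it in the lowest bin alongside a cell holding 1.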

[Attached screenshots: heatmap-1, heatmap-2]

kshefchek commented 8 years ago

@jmcmurry is there any interest in moving this forward? The branch is a little stale but I would likely merge it in eventually (without making this view available on the UI) as it contains some other refactors and improvements.

jmcmurry commented 8 years ago

This is great, Kent, I keep meaning to follow up on it. We would need to get the gradation right so that it discriminates the chunks in meaningful ways. Also, since we have at this point more species than we have phenotype categories, I'm not sure which one is best in the rows vs. columns. It would be great if we could convey the numbers as well as the colors, e.g. per below. Perhaps on hover? This is old data and a bit scruffy looking, but you get the idea.

cmungall commented 8 years ago

This is v cool

Be good to have this hooked up with faceting, so you could dynamically restrict species list, dive down in ontology hierarchy, etc

jmcmurry commented 8 years ago

In the meantime, a quick fix to make dovegraph more intuitive would be to have the facets on the right together with counts. Ultimately we need a solution that doesn't require users to visually distinguish more than 12 colors at a time.

harryhoch commented 8 years ago

+1 .

Julie, really like your overview above. If we have enough species, we might want to consider aggregating/disaggregating by taxonomic groups (genus, family, etc.) to organize the rows better.

As far as the short term solution goes, there are some approaches that might help in distinguishing between bars with a limited number of colors. If colors are used in a consistent order across each phenotype group and a small (1px) gap is placed between each organism's block it might be easier to distinguish between them. A bit of added interaction might also help. We could add a mouse-over event for the facet entries, causing related entries in each of the graph rows to be highlighted.

jmcmurry commented 8 years ago

> aggregating/disaggregating by taxonomic groups (genus, family, etc.) to organize the rows better

We could do this, but it would be slightly at odds with a ranked list by data richness. I'm not sure how much there is to be gained, for instance by grouping rats and mice. However, it may be good for the less populous classes such as birds/reptiles? In fact we could do this grouping for just the things in "other"? Just a thought.

I'm with you though that ultimately it would be awesome to have the facets themselves be dynamic and taxonomically grouped; perhaps this is a happy medium alternative to row grouping?

> 1px gap

In general, this would be a good idea, but many bars are already only 1px wide, so they could be obliterated by that approach.

> mouse over event

+1 this is an excellent idea. When hovering over a facet, switch from arrow cursor to hand cursor and turn the text orange? That's what Amazon does; works for me.

cmungall commented 8 years ago

We use a taxon closure slim in amigo, see https://github.com/geneontology/amigo/issues/247

harryhoch commented 8 years ago

understood about the aggregation, but it might be useful in some cases.

regarding the 1px gap, understood, but the bars shouldn't probably be that thin in any case. easy to see 1px of white space. harder to distinguish colors.

jmcmurry commented 8 years ago

@mellybelly has been on the hook for a while to suggest her favorite taxonomic groupings (see related: https://github.com/monarch-initiative/monarch-app/issues/1126).

I'm not sure if the ones you have in geneontology/amigo#247 make as much sense in the Monarch context where we are most heavily biased toward vertebrates:

Mammalia
Vertebrata
Fungi
Metazoa
Viridiplantae
Eukaryota
Archaea
Bacteria

jmcmurry commented 8 years ago

> regarding the 1px gap, understood, but the bars shouldn't probably be that thin in any case. easy to see 1px of white space. harder to distinguish colors.

Yes. 100% agreed, hence the proposed matrix view versus the barchart. But in the meantime, we do not have control over the degree of scale difference between the species.

monarch-initiative / monarch-legacy

Consider DOVEgraph/Phenogrid hybrid #1056