refactor color schemes - Githubissues

nlwashington commented 9 years ago

having multiple colors, one for each species, is not a tractable solution, and the color in and of itself doesn't really convey any meaning. although initially pretty in the overview, it seems a little gimmicky to me.

maybe we should consider moving to a simple greyscale, or perhaps consider using color to indicate category? (somewhat related to https://github.com/monarch-initiative/monarch-app/issues/449). the difficulty might be when terms fall into multiple categories, but perhaps a split-colored cell can be used for those. (i feel like that's an issue we can take up down the line, maybe the user could eventually be allowed to choose the color for those cells/choose the category with which to group a term.)

nlwashington commented 9 years ago

also, i think we should leave the door open for data overlays, so that (soon) when we want to add the "not" phenotypes onto the grid (as in, if a matching disease scores well, but one of the phenotypes is definitively not shared), it could be indicated by say a bright red color!

harryhoch commented 9 years ago

Agreed. I will try to gather many of these design suggestions into a list of topics to consider for a 2.0 release.

nlwashington commented 9 years ago

Also, I think that the currently configured color ranges are actually giving the user false information. Because the scales have overlapping color ranges, it means that some scores that are identical have different colors, and some scores that have different values have the same color. This is really confusing when viewing all the data together. I think all the colors should be removed and only one scale should be used. Grayscale is sufficient, I think.
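A minimal sketch of the single-scale idea, assuming every species is normalized against one shared minimum and maximum. The function name `grayForScore` and the 0 to 100 score range are hypothetical and not taken from the phenogrid code. The point is only that identical scores always map to identical colors, whichever species they come from.

```typescript
// Hypothetical helper: map a similarity score onto one shared grayscale.
// The same score yields the same color regardless of species.
function grayForScore(score: number, min: number, max: number): string {
  if (max <= min) {
    throw new Error("max must be greater than min");
  }
  // Clamp to [0, 1] so out-of-range scores cannot wrap around.
  const t = Math.min(1, Math.max(0, (score - min) / (max - min)));
  // Higher scores are darker: 255 (white) down to 0 (black).
  const level = Math.round(255 * (1 - t));
  return `rgb(${level}, ${level}, ${level})`;
}

// Two cells from different species with the same score (assumed 0..100 range).
const humanCell = grayForScore(70, 0, 100);
const mouseCell = grayForScore(70, 0, 100);
```

Because the scale takes no species argument, two cells can only differ in color if their scores differ.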

harryhoch commented 9 years ago

Ok. Are you against multiple disjoint color scales?

In reply to Nicole Washington's comment of 26 Mar 2015, 3:19 PM.

Harry Hochheiser University of Pittsburgh Department of Biomedical Informatics harryh@pitt.edu 412 648 9300

nlwashington commented 9 years ago

i am wishy-washy about disjoint color scales. if you really think disjoint color scales is the way to go, then perhaps only one color per species. but i think i am against it, because color doesn't convey enough extra information, and i think we should reserve it for conveying something else (i have ideas about this). not to mention that once we have > 7 species, it isn't tractable.

harryhoch commented 9 years ago

Ok. Let’s talk about color use next week, perhaps.

In reply to Nicole Washington's comment of 26 Mar 2015, 4:54 PM.

nlwashington commented 9 years ago

here is an example of issues with overlapping color scales across taxa giving false and/or confusing information. [image: img_20150409_123003 2]

harryhoch commented 9 years ago

Nicole, what else would we use color for if not for similarity?

nlwashington commented 9 years ago

here, i've mocked up an example so you can play with it in codepen. (not sure how stable it is since i don't have an acct.) here's screenshots: [image: screen shot 2015-04-19 at 2 26 11 pm] [image: screen shot 2015-04-19 at 2 27 54 pm]

in this example, genotypes/genes/diseases are columns and phenotypes are rows as in our standard phenogrid. rare phenotypes (high IC) that are closely related in each intersection point show up as big dark circles, whereas common phenotypes (low IC) that are not closely related are faint small circles. when you hover over the phenotype (as in the 2nd pic), you see the two numbers that go to populate the color and size (similarity/IC). the mockup has max=10 for each. there is one color scale.

you can play with the original code at this codepen

cmungall commented 9 years ago

Nice! Not sure about showing the two numbers, maybe TMI to take in for an entire row? But overall I like the sizing and coloring

nlwashington commented 9 years ago

yeah, i don't think we should show both numbers to users... that was only here so you guys could see the combinations of values that went to produce the color and size. we can think about what might be the most interesting rollover value to show... it is kind of nice to see numbers of some kind, so long as we can explain them well.

harryhoch commented 9 years ago

This is definitely something that merits some discussion. I like the redundant use of both size and color for the IC/similarity content.

I have two concerns with the sizing.

1. Scaling factor. If I read your code correctly, the size of the circle is determined via a linear scale that assigns the score value to the radius. The problem with this approach is that, because the resulting areas grow with the square of the radius, the magnitude relationships can be distorted. It's better to make the area a linear function of the mapped values - certainly doable, just a bit of a different take.
2. Sizes of the cells. We currently have a grid on the order of 30 columns and 25 rows - 750 data points. Each point is currently 10px square. As this doesn't leave a lot of room for fine gradations in area, I fear that the differences might be hard to see. We could go to cells larger than 10px by 10px, but this would mean showing less data.
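To make the scaling concern concrete: a circle's area is pi times the radius squared, so for the area to grow linearly with a value the radius has to grow with its square root. The sketch below assumes values run from 0 to a known maximum (the mockup used max=10) and that the largest circle just fits a 10px cell. The name `radiusForValue` is hypothetical, not from the mockup.

```typescript
// Hypothetical helper: choose a radius so that circle AREA, not radius,
// grows linearly with the value.
function radiusForValue(value: number, maxValue: number, maxRadius: number): number {
  if (maxValue <= 0) {
    throw new Error("maxValue must be positive");
  }
  const t = Math.min(1, Math.max(0, value / maxValue));
  // area = pi * r^2, so r must scale with sqrt(t) for area to scale with t.
  return maxRadius * Math.sqrt(t);
}

// In a 10px cell the largest circle has radius 5.
const rFull = radiusForValue(10, 10, 5);
const rQuarter = radiusForValue(2.5, 10, 5);
```

With these inputs a value that is a quarter of the maximum gets half the radius, and therefore a quarter of the area. With a linear radius scale it would get a quarter of the radius and only one sixteenth of the area.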

I'll see if I can put something together this week to try the idea. Maybe it will work out...

nlwashington commented 9 years ago

the code doesn't have redundancy... the size and color saturation indicate different things:

color saturation = amount of similarity in the phenotypes between the row/column. for example, if both the query and target are both annotated to the exact term "abnormality of the head", they would have a 100% similar phenotype, whereas if it was small eye (query) vs cloudy eye (target), their term in common might be "abnormal eye", and their similarity might be 70%.

size = IC of the term in common. sometimes the term in common is a very rare node (so its IC is ~ maxIC), but it could be a very commonly annotated term like "abnormality of the nervous system", which would be closer to minIC.

so, you could have a very commonly annotated term (like abnormality of the nervous system) with a very low IC be annotated exactly between the row/col... thus you would have a small (low IC) but darkly colored (highly similar) circle. on the other hand, you might have a very rare term in common (high IC) and originally annotated terms that are a fair distance apart (low similarity), and thus you would have a large, lightly colored circle.
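A rough sketch of the two-channel encoding, following the definitions given above (color saturation = similarity, size = IC of the term in common). Everything named here is an assumption for illustration: `styleForCell`, the blue hue of 210, similarity expressed as a fraction from 0 to 1, and the square-root radius suggested earlier in the thread in place of the mockup's linear scale.

```typescript
interface CellStyle {
  radius: number; // pixels
  fill: string;   // CSS hsl() color
}

// Hypothetical encoder: similarity drives color saturation,
// IC of the common term drives circle size.
function styleForCell(similarity: number, ic: number, maxIC: number, maxRadius: number): CellStyle {
  if (maxIC <= 0) {
    throw new Error("maxIC must be positive");
  }
  const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));
  const s = clamp01(similarity); // assumed to be a fraction, 0..1
  const i = clamp01(ic / maxIC);
  return {
    radius: maxRadius * Math.sqrt(i),               // area proportional to IC
    fill: `hsl(210, ${Math.round(100 * s)}%, 50%)`, // single hue, assumed blue
  };
}

// Exact match on a common (low IC) term: small but fully saturated.
const commonExact = styleForCell(1.0, 2.5, 10, 5);
// Distant match through a rare (high IC) term: large but half saturated.
const rareDistant = styleForCell(0.5, 10, 10, 5);
```

Keeping the two channels in separate fields makes it easy to swap either mapping (for example, trying a different size scale) without touching the other.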

i would agree, though, that for this to work we'd have to increase the size of the cells/grid. i don't think that's such a bad option, particularly if there's some fancier scrolling and/or zooming that's also implemented. i would not be disappointed if we pitched the fixed-size grid.

i would also agree, that we'd have to test out different scaling methods... i only tried linear for this quick & dirty example, but i think we should try out others.

harryhoch commented 9 years ago

@nlwashington, thanks. I didn't read that closely regarding the coding. I think it might be hard for users to understand the similarity/IC distinction, but we can try.

understood also about the size of the cells/grid. I think it's always good to start with as much data as possible, but we could consider zoom-in. It's also possible to start with fixed size and to vary as the amount of data gets smaller.

alternative scaling is relatively easy to implement...

yuanzhou commented 9 years ago

Noticed the Github color scheme.

[image: screen shot 2015-07-09 at 11 43 23 pm]

nlwashington commented 9 years ago

can we please move to a single color scale?

jmcmurry commented 9 years ago

+1 for a single color (e.g. shades of blue) or for standard MATLAB heat map colors, as they're well understood.

harryhoch commented 9 years ago

I still think we need multiple colors for the multi-organism overview.

nlwashington commented 9 years ago

i do not think we need different colors for each species. for the reasons above, i think one color is sufficient. the species are visually separated by the thick line and labeled in the header, so color does not provide additional information. it is pretty, but i don't think the gee-whiz factor overrides accuracy and functionality in the presentation. i am okay with leaving this as a configurable option for any grid installation (as in don't get rid of the generic code), but for the monarch usage of it on our pages, i think it should not be used. i'm sure that @mellybelly and @cmungall also have opinions on this.
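One way the configurable option could look, purely as an illustration: none of the names below (`ColorMode`, `GridColorConfig`, `hueFor`, `monarchConfig`) or the group labels come from phenogrid. The sketch keeps the per-species code path available for other installations while a single-scale setting is used by default.

```typescript
// Hypothetical configuration shape; none of these names come from phenogrid.
type ColorMode = "single" | "perSpecies";

interface GridColorConfig {
  colorMode: ColorMode;
  defaultHue: number;               // HSL hue used for the single scale
  speciesHues: Map<string, number>; // only consulted in "perSpecies" mode
}

// Pick the hue for a column group; unknown species fall back to the default.
function hueFor(species: string, config: GridColorConfig): number {
  if (config.colorMode === "single") {
    return config.defaultHue;
  }
  return config.speciesHues.get(species) ?? config.defaultHue;
}

// Placeholder group names and hues, chosen only for the example.
const monarchConfig: GridColorConfig = {
  colorMode: "single",
  defaultHue: 210,
  speciesHues: new Map([["speciesA", 10], ["speciesB", 120]]),
};
```

Switching `colorMode` to `"perSpecies"` would restore one hue per species without any other code change, which is the sense in which the generic code is kept.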

harryhoch commented 9 years ago

Other than the overlap, what is the objection to the multiple colors? If we use three disjoint scales, would that address the problem? @yuanzhou, @davism84, can you look into an alternative range for the third scale?

cmungall commented 9 years ago

I'm color illiterate but bearing that in mind:

There are more than 3 species out there; and other ways of grouping result sets (e.g human canonical patients/diseases vs actual patients). We will start adding more soon. I'm not sure what the algorithm is for n species/groupings.

Sometimes I get confused as to how similar phenogrid thinks the match is to the query. I think having a consistent color that is the same across all species would help with this.

But +1 to configurability

harryhoch commented 9 years ago

Agreed on configurability. @yuanzhou and @midavis, please note this as a needed configuration option.

So, as I understand it, the concern is comparing the degree of similarity across species? That’s a fair concern. I hate to lose the appealing color, but if we are confusing people, that’s not good.

@yuanzhou, @midavis, let’s discuss on Monday.

In reply to Chris Mungall's comment of Jul 25, 2015, 12:21 AM (https://github.com/monarch-initiative/phenogrid/issues/23#issuecomment-124795205).

harryhoch commented 9 years ago

@yuanzhou, can we consider going to one color scheme for all organisms and then using a subtle colored background for the organisms in the multi-organism view?
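A sketch of how that split might work, assuming the score alone decides the cell's fill and the organism only decides a faint background tint. The group names, tint colors, and the `cellColors` helper are placeholders invented for illustration.

```typescript
// Hypothetical tints; the group names and colors are placeholders.
// The alpha is kept low so the tint stays subtle.
const groupTint: Map<string, string> = new Map([
  ["speciesA", "rgba(0, 128, 0, 0.08)"],
  ["speciesB", "rgba(128, 0, 128, 0.08)"],
]);

interface CellColors {
  background: string; // subtle tint that identifies the organism
  fill: string;       // shared scale that encodes the score
}

function cellColors(group: string, score: number, maxScore: number): CellColors {
  if (maxScore <= 0) {
    throw new Error("maxScore must be positive");
  }
  const t = Math.min(1, Math.max(0, score / maxScore));
  // One blue scale for every organism: lightness 85% (low) to 30% (high).
  const lightness = Math.round(85 - 55 * t);
  return {
    background: groupTint.get(group) ?? "transparent",
    fill: `hsl(210, 60%, ${lightness}%)`,
  };
}

const cellA = cellColors("speciesA", 10, 10);
const cellB = cellColors("speciesB", 10, 10);
```

Here `cellA` and `cellB` share the same fill because they share a score, and differ only in background, so similarity stays comparable across organisms.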

yuanzhou commented 9 years ago

Updated to one color scale.

[image: capture]

Will need to do more research to decide the final number of data classes and the actual color scheme.

Now the phenogrid looks like this:

[image: capture]

[image: capture2]

yuanzhou commented 9 years ago

Updated to the blue scale based on the feedback from the meeting; also got rid of the lightest color to improve readability.
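For reference, a classed blue scale with the lightest shade dropped could be built roughly as follows. This is a sketch under assumptions, not the committed phenogrid code: the hue, saturation, lightness range, number of classes, and the names `blueScale` and `shadeForScore` are all made up for the example.

```typescript
// Hypothetical: build `classes` shades of one blue hue, from light to dark,
// then drop the lightest shade so every cell stays readable on white.
function blueScale(classes: number): string[] {
  if (classes < 2) {
    throw new Error("need at least two classes");
  }
  const shades: string[] = [];
  for (let i = 0; i < classes; i++) {
    // Lightness runs from 90% (lightest) down to 30% (darkest).
    const lightness = Math.round(90 - (60 * i) / (classes - 1));
    shades.push(`hsl(210, 60%, ${lightness}%)`);
  }
  return shades.slice(1); // remove the lightest class
}

// Assign a score to one of the remaining classes by equal-width binning.
function shadeForScore(score: number, maxScore: number, shades: string[]): string {
  const t = Math.min(1, Math.max(0, score / maxScore));
  // The top edge (t === 1) falls into the last bin.
  const index = Math.min(shades.length - 1, Math.floor(t * shades.length));
  const shade = shades[index];
  if (shade === undefined) {
    throw new Error("shades must not be empty");
  }
  return shade;
}

const shades = blueScale(7); // 7 classes built, 6 kept
```

Fewer classes are easier to tell apart in a small cell, while more classes preserve finer score differences; the right number is the open question noted above.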

[image: screen shot 2015-07-31 at 12 13 09 am]

New screenshots for quick look: [image: screen shot 2015-07-31 at 12 30 26 am]

monarch-initiative / phenogrid

refactor color schemes #23