linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

Color all cell clusters even when they are > 20 #141

Closed: pl-ki closed this issue 6 years ago

JobLeonard commented 6 years ago

The code for this is ready; the only thing holding it back is the color selection, but I finally figured out a system that gives me a selection I'm happy with.

JobLeonard commented 6 years ago

So this was a very heavy birth, and we're not quite done yet...

First, remember that I'm red-green color-blind. So I'd like to accommodate the color-blind when fixing these categorical colors.

I tried tons of things, see here (https://github.com/JobLeonard/colorspace) for an unorganised mess of the scripts used.

The colors coming out of iWantHue (http://tools.medialab.sciences-po.fr/iwanthue/) were not remotely good enough for me (even with colorblind mode), and the website also crashed all the time when generating huge color ranges. Amit also gave me a set of colors but they weren't quite satisfactory.

So in the end I decided to manually determine the range of colors. After looking for useful color spaces, I ended up with HSLuv (http://www.hsluv.org/), which is a kind of hybrid between HSL and LUV.

Quoting the comment in generate_colors_hsluv.js (https://github.com/JobLeonard/colorspace/blob/master/generate_colors_hsluv.js#L376):

Eyeballing, distinct ranges seem to be:

0-360  hues (that is, full color wheel)
40-100 saturation
20-86  lightness (below and above this, colors fade too much)

However, eyeballing with my own protanomaly as a
"worst case baseline", assuming other color-vision
deficiencies are similar (but with other hue ranges),
the following rough rules of thumb seem to hold:

  - changing hues with steps of 60 is a minimum
    at all ranges, so six variations for hue

  - lightness brings relatively high shifts in
    perception for all saturations and hues,
    steps of 6 seem ok, giving twelve options in total:
    [20, 26, 32, 38, 44, 50, 56, 62, 68, 74, 80, 86]

  - color distinction peaks with 62 and 68 lightness,
    so these have a preference when dealing with
    fewer categories

  - for 44, 50, 56 and 62 lightness, saturation steps
    40, 70 and 100 are available for distinction (3 options)

  - for 32 and 74 lightness, saturation should be either
    60 or 100 to avoid overly similar colors (2 options)

  - 20, 26, 80 and 86 lightness should only have 100 saturation

  - 20 and 86 lightness have really low hue distinction, so let's use
    only two far opposing ones.

  - improving distinction between close colors is most
    important. Since hues are cyclic, and we only care
    about the distance between hues for a given L and S,
    we can make a "checkerboard pattern" when varying
    these values by alternating a +30 hue offset.
    That means that identical hues are separated
    by at least two steps of L+S, which hopefully
    improves things a bit.

That leads to a max total of 114 HSLuv colors, many of which
will be very close of course. 
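
As an illustration of the "checkerboard pattern" rule quoted above, here is a minimal sketch. It is not the code from generate_colors_hsluv.js; the function name `huesForCell` and its parameters are hypothetical, and it assumes that lightness and saturation levels are addressed by their index in their respective step lists.

```javascript
// Hypothetical helper, not taken from the colorspace repository.
// lIndex / sIndex: position of a cell in the lightness and saturation
// step lists. Cells whose index sum is odd get the +30 hue offset, so
// neighbouring cells (one step in L or S away) never share a hue.
function huesForCell(lIndex, sIndex, hueStep = 60, offset = 30) {
  const shifted = (lIndex + sIndex) % 2 === 1;
  const hues = [];
  for (let h = 0; h < 360; h += hueStep) {
    hues.push((h + (shifted ? offset : 0)) % 360);
  }
  return hues;
}

// Two cells one saturation step apart get interleaved hues.
console.log(huesForCell(0, 0));
console.log(huesForCell(0, 1));
```

With this layout, a cell only shares its hues with cells that are two steps away in lightness plus saturation, which matches the intent described in the comment.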

Note that perceived color distance is not weighted equally,
even with the step sizes defined above. For me, the following
holds true when I compare single-step changes in hue, saturation
or lightness, with the other two staying equal:

  - Lightness has a bigger impact on perceptual changes,
    and does not depend on colorvision
  - Saturation second
  - Hue the least (especially at lower saturation/lightness)

  To generate an order that respects this, I came up with
  Bresenham-inspired accumulators:

  - Take each dimension (hue, saturation, lightness), make
    an accumulator for each value in each dimension.
  - Determine weights for each value
  - Pick colors one by one. Each step: 
    - add weights to accumulators
    - determine which unpicked color has highest sum of 
      hue + saturation + lightness accumulators
    - pick this color
    - set matching accumulators to zero

Doing this will ensure that values with high weights get picked more often, but not in quick succession. By tweaking both the weights and the initial accumulator values we can influence how the selected colors are distributed:

  - Use the more saturated colors before the others
  - Use lightness ranges 62 and 68 first, since color distinction is highest in these ranges
  - Alternate strongly between light intensities
    when possible
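
The accumulator steps described above could be sketched roughly as follows. This is a reading of the description, not the actual script: the function name `orderColors`, the `{h, s, l}` color objects, the shape of the `weights` and `initial` arguments, and the default weight of 1 are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the accumulator-based ordering.
// colors:  array of {h, s, l} objects (the candidate palette)
// weights: {h: {value: weight}, s: {...}, l: {...}}; missing entries default to 1
// initial: optional starting accumulator values, same shape as weights
function orderColors(colors, weights, initial = { h: {}, s: {}, l: {} }) {
  const dims = ["h", "s", "l"];
  const acc = { h: {}, s: {}, l: {} };
  // One accumulator per distinct value in each dimension.
  for (const d of dims) {
    for (const c of colors) {
      const start = initial[d][c[d]];
      acc[d][c[d]] = start === undefined ? 0 : start;
    }
  }
  const remaining = colors.slice();
  const ordered = [];
  while (remaining.length > 0) {
    // Each step: add the weights to all accumulators.
    for (const d of dims) {
      for (const v of Object.keys(acc[d])) {
        const w = weights[d][v];
        acc[d][v] += w === undefined ? 1 : w;
      }
    }
    // Find the unpicked color with the highest accumulator sum
    // (ties go to the earliest remaining color).
    let best = 0;
    let bestScore = -Infinity;
    remaining.forEach((c, i) => {
      const score = acc.h[c.h] + acc.s[c.s] + acc.l[c.l];
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    });
    // Pick it and reset its three matching accumulators.
    const picked = remaining.splice(best, 1)[0];
    for (const d of dims) acc[d][picked[d]] = 0;
    ordered.push(picked);
  }
  return ordered;
}
```

Because a picked value's accumulator drops to zero, a heavily weighted lightness still comes up often, but a different hue or saturation tends to win the very next pick.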

After quite a bit of tweaking with the weights this resulted in a list that was a good first start, but perceptually identical colors still ended up close. Furthermore, ideally we'd put the more visually pleasant colors first, while still making use of different types of contrasts (lightness, saturation and hues). That required manual intervention. So I had to come up with a few functions (https://github.com/JobLeonard/colorspace/blob/master/styledconsole.js) to make it easy to swap around and inspect colors in an array in the browser console until the results looked ok:

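image: https://user-images.githubusercontent.com/259840/34494928-0ddae65a-eff3-11e7-8a2b-441ebb3bcde6.png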

To make things worse, I could not do this manual thing alone because I'm red-green colorblind. So you can all thank Irene Izquierdo, my girlfriend, who was willing to spend two afternoons of the holidays helping out. We basically kept swapping colors around until the first twenty looked pleasant to both of us, and all other ones had at least decent contrast with their immediate neighbours.

I think the result is a decent enough compromise for now:

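screenshot_20180102_184347: https://user-images.githubusercontent.com/259840/34493619-0776f034-efed-11e7-8326-1a31d5d420ff.png

screenshot_20180102_182514: https://user-images.githubusercontent.com/259840/34493620-07b50e96-efed-11e7-8779-7c8808e7d31b.png

image: https://user-images.githubusercontent.com/259840/34496580-5252a280-effa-11e7-9864-08b6f8338ad2.png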

It's not perfect, but this is already taking up way more hours than I want to put into it, and I have quite the to-do list.

While we were at it, we also updated the box-colors, because they apparently were a lot more purple than I noticed:

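screenshot_20180102_182337: https://user-images.githubusercontent.com/259840/34493622-0806734e-efed-11e7-90ee-d7c5582d8ca2.png

screenshot_20180102_182406: https://user-images.githubusercontent.com/259840/34493621-07d981e0-efed-11e7-8734-420126b3bd39.png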

One thing that is left to do is exporting all the metadata (and eventually all rows), because we pre-calculate all the unique values, and up until now we just took the twenty most common ones. The new expansion code increases this to 1000 (with the viewer just cycling through them).

Another thing that is yet-to-be implemented is the scatterplot using different shapes for repeated colors.
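
To make the cycling behaviour concrete, here is a small sketch of how a category index could map to a color, with a shape that changes each time the palette wraps around. It is only an illustration under assumptions: `styleForCategory`, the `palette` array, and the `shapes` array are hypothetical names, and the shape part is not implemented in the viewer at this point in the thread.

```javascript
// Hypothetical sketch, not the loom-viewer implementation.
// index:   rank of the category (0 = most common)
// palette: ordered list of colors
// shapes:  list of marker shapes, one per pass through the palette
function styleForCategory(index, palette, shapes) {
  // Cycle through the palette once all colors have been used.
  const color = palette[index % palette.length];
  // Count how many times the palette has wrapped around so far.
  const round = Math.floor(index / palette.length);
  const shape = shapes[round % shapes.length];
  return { color, shape };
}

const demoPalette = ["#d62728", "#1f77b4", "#2ca02c"];
const demoShapes = ["circle", "square"];
console.log(styleForCategory(4, demoPalette, demoShapes));
```

Pairing each repeat of the palette with a different shape keeps two categories distinguishable even when they share a color.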

pl-ki commented 6 years ago

Fine - I think the main point is that ALL clusters are displayed with some color, even if the first 20 are repeated - the likelihood that adjacent clusters get a very similar color is not that high at < 50 clusters, and we seldom have more (current peak is around 250).

Using different shapes as you mention will probably be beneficial.

Peter Lönnerberg Linnarsson Lab Laboratory of Molecular Neurobiology Dept. of Medical Biochemistry and Biophysics Karolinska Institutet SE-171 77 Stockholm Sweden



JobLeonard commented 6 years ago

Implemented, except for the shapes, which will come later