mckinsey / vizro

Vizro is a toolkit for creating modular data visualization applications.
https://vizro.readthedocs.io/en/stable/
Apache License 2.0
2.63k stars 118 forks source link

[Bug] Change default continuous color scale to `SEQUENTIAL_CYAN` #407

Closed huong-li-nguyen closed 5 months ago

huong-li-nguyen commented 6 months ago

Description

Screenshot 2024-04-05 at 13 27 47

Screenshot

Run dev example:

Screenshot 2024-04-08 at 15 02 58

Notice

petar-qb commented 6 months ago

To be honest, I don't fully understand why we need this change πŸ˜„. Below is a screenshot for the same application running from the main branch and it somehow seems more natural to me to understand because the same colour palette is always used and users don't have to look carefully at the colorbar to understand which colour means higher and which colour means lower values.

This means that if we add a constant number (eg +15) to each cell element, the same chart colours will be displayed per country on the map (and only the colorbar mark label values will change). However, that's just my opinion and I don't know if there's any UX rule that says otherwise. πŸ˜„

image

Also, isn't that red colours should be used for negative and blue for positive values in your "Mixed Life Expectancy" example?

huong-li-nguyen commented 6 months ago

Also, isn't that red colours should be used for negative and blue for positive values in your "Mixed Life Expectancy" example?

Ah yes, that needs to be reversed!

It came out of a discussion I've had with @antonymilne where he rightfully pointed out that in the above example it shouldn't be a diverging color sequence but a sequential one. These two color sequences have different meanings in color theory. I can recommend reading this article:

A sequential color sequence indicates to the user that the color category goes from low to high with no specific meaning to any of the mid-values (which is the case for life expectancy). For example, if you look at the screenshot you've sent - a neutral white color is assigned to the life expectancy in the range of 50-60 years, but there is no meaning behind this.

A diverging color sequence should usually be applied if you e.g. have positive and negative values. The neutral mid color will then highlight the 0 values normally, so here it has a meaning.

So choosing a sequential or diverging color sequence does have an impact on how people could interpret the colors. Is that clearer now?

petar-qb commented 6 months ago

@huong-li-nguyen , thanks a lot for the explanation and the link (the article is excellent!). All these changes make sense to me now πŸ˜„.

P.S. I've updated the screenshot from the PR description.

maxschulz-COL commented 5 months ago

Just one thing though: do we definitely want the colors this way round? So that blue = high and red = low. To me this feels backwards since it's the opposite of a temperature scale that goes from blue = cold to red = hot so it would feel more natural to me to have blue = low and red = high.

Definitely I agree that we should be consistent between sequential and diverging palettes so that the same color always means the same thing, but to me these should be completely reversed everywhere.

Tricky - I feel like that since we have this automation specifically for negativ to positive numbers, I am more comfortable with red = negativ, blue = positive (green would probably make this even more clear...)

huong-li-nguyen commented 5 months ago

So when using diverging color scales, take an extra minute to think about your color choices. Which ones would be the most intuitive colors for your readers (e.g. party colors, or red = bad)? And how can you make clear which color means which extreme? Label your color legend well and/or turn your title into a color key, and use annotations in the chart to remind readers what they see.

@antonymilne - the answer is it depends πŸ˜„ I did ask Joe and Dan about the current choice and they both agreed that blue represents positive values, and red represents negative values and intuitively I also thought so. However, I definitely understand your confusion and I don't think this color sequence is crystal clear. But the direction of color sequence not only depends on the +/n sign but actually the context of your data.

In the example below, the negative values are represented by green, and the positive values by red. Now you might think: Shouldn't it be vice versa? In this case, it depends. If the desired parameter/outcome is actually a negative value (let's say -40% is our target), then green color is correctly assigned to the negative values, because green is usually thought of as the positive outcome color. Positive in the sense of desired result.

Screenshot 2024-04-08 at 12 28 02

Regarding @maxschulz-COL comment on green-red, we always try to avoid that due to accessibility concerns (red-green color blindness which is actually quite common), so that's we use blue-red.

I don't think the diverging red-cyan will always represent the color sequence correctly because it depends on the data context and the example above also shows that negative values do not always represent negative outcomes. So I don't think we can get it right for every use case. I would therefore stay with the current choice as it's Dan's preference.

huong-li-nguyen commented 5 months ago

Have you checked if this affects any custom choices? I would assume color_continuous_scale and color_continuous_midpoint still work (e.g. when I would like the middle of my diverging scale to be a certain value)

Very good point @maxschulz-COL !! I just tried that out and it is not compatible 😭 So maybe that was the reason we didn't turn it on in the past πŸ€” Here for context: https://stackoverflow.com/questions/61083382/plotly-make-subplots-doesnt-keep-selected-colors

This means however, we have a choice to make:

  1. We have autocolorscale=True and automatically provide a sequential and diverging color sequence based on the data, but if a user wants to provide their own color sequence via color_continuous_scale or other params, they explicitly have to turn off autocolorscale=False in their update layout calls.

  2. We have autocolorscale=False, which enables the user to use color_continuous_scale in their charts without having to explicitly do a post update call, but this means we can only provide one default continuous color scale which would be SEQUENTIAL_CYAN.

  3. We have some custom logic that turns autocolorscale=False if it detects color_continuous_scale or any other param in the custom chart. Or the other way around, we provide our theme defaults directly to the argument color_continuous_scale. However, this adds some custom logic to our source code, not sure if we want that.

Both approaches have pros and cons..any preferences? @antonymilne @maxschulz-COL @petar-qb @Joseph-Perkins

huong-li-nguyen commented 5 months ago

Both approaches have pros and cons..any preferences? @antonymilne @maxschulz-COL @petar-qb @Joseph-Perkins

We decided to go for option 2) for easy use for users to still customise their color sequences.