holoviz-topics / examples

Visualization-focused examples of using HoloViz for specific topics
https://examples.holoviz.org
Creative Commons Attribution 4.0 International

Updated versions of t_sne_roots notebooks #337

Open jlstevens opened 7 months ago

jlstevens commented 7 months ago

Updates and supersedes https://github.com/holoviz-topics/examples/pull/286

In this updated version, the default dashboard uses client-side color mixing (i.e. using rasterize). The previous datashaded version is included both as a point of comparison and because it supports dynspread (which ImageStack currently does not).

jlstevens commented 7 months ago

Addressing an earlier question by @jbednar

How many distinct languages are there, and are they sorted by popularity? At a glance it looks like 20 or so main categories, and if so Category20 might give more vibrant colors, or maybe glasbey_category10 if there are 20 main ones but then lots of rare categories. Or Category20 + glasbey_light, in that latter case.

There are 47 languages and they are not sorted. Here are the colormap options I've tried:

cc.glasbey_light (my original choice)

image

cc.b_glasbey_category10 (I found category 10 but not 20 unless you meant cc.b_glasbey_bw_minc_20 which is next)

image

cc.b_glasbey_bw_minc_20

image

cc.b_glasbey_category10 + cc.glasbey_light

image

Can't say I have a strong opinion between these!

github-actions[bot] commented 7 months ago

Your changes were successfully integrated in the dev site. Make sure to review the pages of the projects you touched before merging this PR: https://holoviz-dev.github.io/examples/. You can also download an archive of the site from the workflow summary page, which comes in handy when your dev site build was overridden by another PR (we have a single dev site!).

jbednar commented 7 months ago

Category20 is from Bokeh: https://docs.bokeh.org/en/latest/docs/reference/palettes.html#d3-palettes

Can you use df.cat.value_counts() to get a list of the categories by popularity, then use Category10 or Category20 for those top 10 or 20 categories, then glasbey_light for the rest?

jlstevens commented 7 months ago

@jbednar Here is category20 followed by glasbey_light when sorted by frequency: image

And here is category10 followed by glasbey_light: image

Both of these need two hard-to-explain lines of code to compute the correct cmap. We can use either of them, but only if we feel they are a significant improvement over what we had before.
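The two lines themselves are not shown in this thread. As a rough illustration of the idea only, here is a minimal sketch of a frequency-ordered cmap: rank the categories by count, give the most common ones the brighter palette, and let the rest fall through to a second palette. The helper name `frequency_ordered_cmap`, the sample labels, and the placeholder fallback colors are all hypothetical; in the notebook the inputs would presumably be the language column, `bokeh.palettes.Category10[10]` (or `Category20[20]`), and `cc.glasbey_light`.

```python
from collections import Counter


def frequency_ordered_cmap(labels, bright, fallback):
    """Map each category to a color, most frequent categories first.

    The most common categories take colors from `bright`; once those
    run out, the remaining categories take colors from `fallback`.
    """
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = [cat for cat, _ in Counter(labels).most_common()]
    palette = list(bright) + list(fallback)
    if len(ranked) > len(palette):
        raise ValueError("not enough colors for the number of categories")
    return dict(zip(ranked, palette))


# Hypothetical stand-in for the language column of the dataset.
labels = ["en", "fr", "en", "de", "en", "fr", "la"]
# First two Category10 colors, standing in for the full bright palette.
bright = ["#1f77b4", "#ff7f0e"]
# Arbitrary placeholder colors, standing in for glasbey_light.
fallback = ["#aaaaaa", "#bbbbbb"]

cmap = frequency_ordered_cmap(labels, bright, fallback)
```

The result is a dict keyed by category, which is one of the forms datashader accepts for `color_key`. With pandas, the ranking step could instead come from `value_counts()`, which also sorts by descending frequency.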

jbednar commented 7 months ago

Putting them side by side (glasbey_light, category20, category10) shows the colors do get more vibrant when the more common categories use the brighter category10 colors, and I can see more distinct categories in the figure (looking outside the big area that's orange on the right):

image

If you use glasbey_category10 (already concatenated in colorcet) does it still need both hard-to-explain lines? Putting the most frequent categories with the most intense colors seems like a reasonable thing to record how to do, if it's not too crazy as code.

jlstevens commented 7 months ago

With https://github.com/holoviz/holoviews/pull/6024 dynspread now works with ImageStack:

t_sne_imagestack

However, it is a fair bit slower than the datashaded version:

t_sne_datashaded

jbednar commented 7 months ago

Can you figure out why that would be?

droumis commented 7 months ago

For cmap comparison, here is the image Christopher produced on the far left, followed by the three options most recently considered in this PR:

image