mtennekes / treemap

R package for treemap visualisation
colors in the aggregate #18

Closed: timelyportfolio closed this issue 9 years ago

timelyportfolio commented 9 years ago

I'm sure this is my own ignorance, but it appears that colors in the aggregate work differently than I would expect. For instance, if we use only one index level with GNI2010:

library(treemap)
library(dplyr)

treemap(GNI2010, index=c("continent"), vSize="GNI", vColor="GNI", type="value") %>%
  { .$tm } %>%
  select( continent, vColor, color )

gives me

      continent  vColor   color
1        Africa  106410 #EFF8AA
2          Asia  285410 #CEEA84
3        Europe 1056360 #0F8445
4 North America  240850 #D9EF8B
5       Oceania   80770 #F3FAAF
6 South America   71410 #F3FAAF

and

[image]

while setting index = c("continent","iso3") gets me different colors.

treemap(GNI2010, index=c("continent","iso3"), vSize="GNI", vColor="GNI", type="value") %>%
  { .$tm } %>%
  filter( is.na(iso3) ) %>%
  select( continent, vColor, color )

gives me

      continent  vColor   color
1        Africa  106410 #006837
2          Asia  285410 #006837
3        Europe 1056360 #006837
4 North America  240850 #006837
5       Oceania   80770 #0C7F43
6 South America   71410 #1A9850

and an ugly plot to show the colors:

[image]

Naively, I would expect the colors assigned to the aggregate continent to be the same. This is important when trying to match the colors assigned by treemap in #17.

timelyportfolio commented 9 years ago

Perhaps the answer lies in the type. Maybe I should expect the colors to match at the aggregate level only when vColor is normalized with type="dens" or type="comp", or if I manually set a range based on the aggregate totals.

mtennekes commented 9 years ago

What happens is that the range of values of the lowest nodes, i.e. the rectangles that will be coloured, is mapped to the color palette. So for the index=c("continent") treemap, the lowest nodes are the continents, with aggregated values from approximately 0 to 1200000. For the index=c("continent","iso3") treemap, the lowest nodes are the countries, with values only up to 90000. Hence the darkest color is assigned to 90000 in this case (instead of 1200000). The colors of the aggregated continent values are 'truncated' to dark green. (Note that the South America bubble has a lighter colour, since its aggregated value, 71410, falls within the range 0-90000.) This behaviour also occurs for other treemap types.

The safest way is indeed to use the range argument:

treemap(GNI2010, index=c("continent","iso3"), vSize="GNI", vColor="GNI", type="value",
range=c(0,1200000)) %>%
    { .$tm } %>%
    filter( is.na(iso3) ) %>%
    select( continent, vColor, color )

      continent  vColor   color
1        Africa  106410 #EFF8AA
2          Asia  285410 #CEEA84
3        Europe 1056360 #0F8445
4 North America  240850 #D9EF8B
5       Oceania   80770 #F3FAAF
6 South America   71410 #F3FAAF
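
The truncation described above can be illustrated outside treemap. The following is a minimal sketch, not treemap's actual implementation: `value_to_color` is a hypothetical helper, the two-colour palette is an assumption, and a plain linear mapping is assumed. Values are clamped to the given range before being mapped to the palette, so every continent total above 90000 lands on the darkest colour when the narrow country-level range is used.

```r
# Hypothetical helper (not part of treemap): clamp values to a range,
# rescale them to [0, 1], and interpolate linearly along a palette.
value_to_color <- function(x, range, palette = c("#FFFFBF", "#006837")) {
  clamped <- pmin(pmax(x, range[1]), range[2])
  scaled <- (clamped - range[1]) / (range[2] - range[1])
  ramp <- grDevices::colorRamp(palette)
  cols <- grDevices::rgb(ramp(scaled), maxColorValue = 255)
  stats::setNames(cols, names(x))
}

# Continent totals taken from the output shown in this thread
continent_gni <- c(Africa = 106410, Asia = 285410, Europe = 1056360,
                   "North America" = 240850, Oceania = 80770,
                   "South America" = 71410)

# Country-level range: totals above 90000 are truncated to the darkest colour
narrow <- value_to_color(continent_gni, range = c(0, 90000))

# Range covering the aggregates: each continent gets its own colour
wide <- value_to_color(continent_gni, range = c(0, 1200000))

print(narrow)
print(wide)
```

With the narrow range, only Oceania and South America fall inside 0-90000 and so keep a lighter shade, which matches the pattern in the two-level output earlier in the thread.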

timelyportfolio commented 9 years ago

Ok, thanks for clearing this up. It appears sum is the only aggregate function available if I read these lines correctly. What happens with things like averages that shouldn't sum? Is there any way to hack/override the aggregation?

All this is important to me as I try to make d3treeR. I understand that treemap was not developed with this use case in mind. However, since only two levels show at a time in the experimental iteration of d3treeR, aggregate colors will likely need more than just sum. I could easily write a function to calculate the different aggregates on the return value from treemap, but then I don't have all the information I would need to apply the original color scale. I'll hack away a little more to explore other options. Maybe I should just wrap treemap with d3tree, and then I would have all the original arguments with which I could possibly make it work.

mtennekes commented 9 years ago

Exactly. Originally, I only had two treemap types, comp and dens, which I thought were sufficient from a statistical point of view. However, for optimal functionality, the value type was born, which turned out to be, I think, the most used one.

It's not hard to generalize the aggregation function (with sum as default). However, for aggregation of averages or ratios, we probably need a weighted average, since we cannot simply average, say, the percentages of smokers per country to continents. For this, we need population numbers per country as weights.

Is this reasoning what you had in mind, or do you have other kinds of aggregates in mind? For the GNI2010 example, I think summing is probably the correct aggregation function. I also used a fixed range when zooming in and out in my (very primitive) interactive shiny tool itreemap.
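
To make the weighted-average point concrete, here is a small sketch using base R only. The data frame is entirely hypothetical (invented smoking rates and populations for three countries on one continent); it only shows how far a plain mean of percentages can drift from the population-weighted value.

```r
# Hypothetical data, invented for illustration only
countries <- data.frame(
  country = c("A", "B", "C"),
  pct_smokers = c(10, 30, 20),      # percent of smokers per country
  population = c(9e6, 1e6, 10e6)    # weights for the aggregation
)

# Plain average of the percentages ignores country size
simple_mean <- mean(countries$pct_smokers)

# Population-weighted average: sum(pct * pop) / sum(pop)
weighted <- weighted.mean(countries$pct_smokers, w = countries$population)

print(c(simple = simple_mean, weighted = weighted))
```

Here the plain mean is 20, while the weighted mean is 16, because the small country with the highest rate contributes little to the continent total.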

timelyportfolio commented 9 years ago

You understand correctly, and yes, GNI2010 does not really apply here, but it is the first example in treemap :) Averages are the most likely use case. I tried to make a count column = 1 and then use type='dens', but that is the reverse of an average if I understand correctly, and it also does not make sense at the leaf level.

mtennekes commented 9 years ago

Now there is an argument fun.aggregate! If weighted.mean is used, the weights, argument w, are by default the vSize variable. See https://github.com/mtennekes/treemap/blob/master/test/test_aggregation_functions.R

If you have a useful typical dataset that contains averages, we could include it in the package.
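
A hedged sketch of how the new argument might be used with the GNI2010 data from this thread. It assumes the treemap package is installed (the guard skips the example otherwise) and that using population as vSize, and therefore as the weights, is a sensible choice for GNI; that choice is an assumption made here for illustration, not something stated in the thread.

```r
# Stays NULL if the treemap package is not installed
tm_result <- NULL

if (requireNamespace("treemap", quietly = TRUE)) {
  library(treemap)
  data(GNI2010)

  grDevices::pdf(NULL)  # draw to a null device so no plot file is written
  tm_result <- treemap(GNI2010,
                       index = c("continent", "iso3"),
                       vSize = "population",   # assumed weights
                       vColor = "GNI",
                       type = "value",
                       fun.aggregate = "weighted.mean")
  grDevices::dev.off()

  # Continent-level rows have NA in the iso3 column
  tm <- tm_result$tm
  print(tm[is.na(tm$iso3), c("continent", "vColor", "color")])
}
```

The continent rows should then carry a weighted mean of vColor rather than a sum, so a range chosen for the country level is more likely to cover the aggregate values as well.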

timelyportfolio commented 9 years ago

Beautiful, thanks so much for the very quick response! I'll play with it throughout the day and report back. So far it looks great.

timelyportfolio commented 9 years ago

@ignacio82 says this solved his problem. I played with it more today with max, min, median, and they all worked great. I don't have a dataset, but I'll put some thought into examples with one of the built-in base datasets and demonstrate with d3treeR. It would probably also play nicely with old-fashioned tables and xtabs.

Thanks again!

timelyportfolio commented 9 years ago

Happy to close this. Thanks again for such a quick response!