wilkox / treemapify

🌳 Draw treemaps in ggplot2
http://wilkox.org/treemapify

Multiple Subgroups #20

Closed rlionheart92 closed 6 years ago

rlionheart92 commented 6 years ago

It would be helpful to be able to have more than one subgroup in a similar manner to the treemap::treemap() function (see example), i.e. a subgroup can contain a further subgroup.

I can produce example data if necessary.

[Image: rplot]

wilkox commented 6 years ago

This is an interesting idea, but I can't see a way to do it without either new aesthetics ad infinitum (subsubgroup, subsubsubgroup...) or introducing a new data structure for hierarchical relationships. treemap::treemap() uses a list of column names which doesn't fit the ggplot2 approach of one column = one aesthetic. I'd be happy to hear any suggestions on how to make this work though.

rlionheart92 commented 6 years ago

I have been considering some approaches to how this feature could be implemented.

I am not sure if either of these are technically possible, but hopefully it might be helpful for fleshing out some ideas of how this could work.

Adding a 'subgroup_level' option to aes()

This option would require the user to define the level/order of their subgroups within an aes() call directly inside each geom_treemap_subgroup_*() call.

If we use the Advanced/Developing example that the README uses to demonstrate facet_wrap(), this might look something like this:

ggplot(G20, aes(area = gdp_mil_usd, fill = hdi, label = country)) +
  geom_treemap() +
  geom_treemap_subgroup_border(aes(subgroup = region, subgroup_level = 1)) +
  geom_treemap_subgroup_border(aes(subgroup = econ_classification, subgroup_level = 2))

The only potential issue with this approach is that you'd have to repeat the aes() calls within the geom_treemap_subgroup_text() call for each additional subgroup.

ggplot(G20, aes(area = gdp_mil_usd, fill = hdi, label = country)) +
  geom_treemap() +
  geom_treemap_subgroup_border(aes(subgroup = region, subgroup_level = 1)) +
  geom_treemap_subgroup_border(aes(subgroup = econ_classification, subgroup_level = 2)) +
  geom_treemap_subgroup_text(aes(subgroup = region, subgroup_level = 1)) +
  geom_treemap_subgroup_text(aes(subgroup = econ_classification, subgroup_level = 2))

Perhaps this could instead be handled with numbered subgroup_n aesthetics in ggplot(aes()), up to a maximum of two or three levels.

ggplot(G20, aes(area = gdp_mil_usd, fill = hdi, label = country,
                   subgroup_1 = region, subgroup_2 = econ_classification)) +
  geom_treemap() +
  geom_treemap_subgroup_border()

Enable the space="free" option for facet_grid()

This approach doesn't actually achieve the same result as I demonstrated in my original comment. However, it could achieve something similar by letting the size of each facet vary with the total area of that facet's contents.

I believe that this does not work at the moment because the treemaps are created using area rather than x and y. Would there be some way to work around this?

An example of this from the Cookbook for R looks like this:

ggplot(G20, aes(area = gdp_mil_usd, fill = region, label = country)) +
  geom_treemap() +
  geom_treemap_text(grow = TRUE, reflow = TRUE, colour = "black") +
  facet_grid( ~ econ_classification, space = "free")

wilkox commented 6 years ago

Thanks for your thoughts @rlionheart92. I think adding one or two new aesthetics named something like subgroup2, subgroup3 is the best solution. It's a little ungainly, but since this seems to be a genuine use case, on balance it's worth supporting.

To make it work I'll also need to add multiple subgroup support to geom_treemap_subgroup_text and geom_treemap_subgroup_border. The treemapify() function will also need some heavy refactoring to juggle the multiple grouping levels. I want to maintain treemapify() as a way of generating data frames of treemap coordinates, as this is useful for people using Shiny e.g. here and here.
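As an illustration of the data-frame output mentioned above, here is a minimal sketch using treemapify() with the G20 dataset bundled with the package. It assumes the interface of the current CRAN release, in which column names are passed as strings and each tile in the result gets xmin, xmax, ymin and ymax columns. The interface at the time of this comment may have differed.

```r
library(treemapify)

# Compute tile coordinates without drawing anything.
# Assumption: column names are given as strings, as in the current CRAN release.
coords <- treemapify(G20, area = "gdp_mil_usd", subgroup = "region")

# Each row is one tile, with its rectangle bounds added as columns.
head(coords[, c("country", "xmin", "xmax", "ymin", "ymax")])
```

A data frame like this can be handed to any drawing layer that accepts rectangle bounds, which is what makes it useful outside ggplot2 geoms.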

On a side note, I've never liked the name 'subgroup'. Unfortunately the more natural 'group' already has a special meaning in ggplot2. Maybe this is an opportunity to introduce a better set of aesthetic names – nest, nest2, nest3 or similar...?

edavidaja commented 6 years ago

Maybe "depth"? Kind of a similar take on what happens in purrr::modify_depth().

wilkox commented 6 years ago

Thanks for your patience while I worked on this feature. I've got support for multiple subgroups working in the 'refactoring' branch, though not fully tested or documented yet. If you want to try it out it can be installed directly from GitHub:

devtools::install_github('wilkox/treemapify', ref = 'refactoring')

The interface for multiple subgrouping is fairly straightforward: you can use subgroup2 and subgroup3 aesthetics in addition to the usual subgroup, and add text and borders with geom_treemap_subgroup2_text, geom_treemap_subgroup3_border etc.
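A minimal sketch of what such a call might look like, using the G20 dataset that ships with the package. The choice of hemisphere and region as the two grouping levels, and the colours, are illustrative assumptions rather than anything prescribed in this thread.

```r
library(ggplot2)
library(treemapify)

# Two nesting levels: hemisphere (outer) contains region (inner).
p <- ggplot(G20, aes(area = gdp_mil_usd, fill = hdi, label = country,
                     subgroup = hemisphere, subgroup2 = region)) +
  geom_treemap() +
  # Inner-level borders, then outer-level borders on top of them
  geom_treemap_subgroup2_border(colour = "white") +
  geom_treemap_subgroup_border(colour = "black") +
  # Label the outer level, then the individual tiles
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.5) +
  geom_treemap_text(colour = "white", place = "topleft", reflow = TRUE)

print(p)
```

A third level would follow the same pattern with subgroup3 and the geom_treemap_subgroup3_* layers.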

@rlionheart92, would you mind posting your example data for testing?

rlionheart92 commented 6 years ago

@wilkox Thank you for the work that you have done on this feature request. I don't have access to the data shown in my example until next week, so I have quickly written some code that mimics the example on a smaller scale. Please excuse the messy nature of this code; I wrote it in a hurry.

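set.seed(1234)

# 105 rows in total: 60, 30 and 15 for hospitals 1 to 3
prov <- c(rep("hospital 1", 60),
          rep("hospital 2", 30),
          rep("hospital 3", 15))

# Codes 100, 110, ...: 20, 10 and 5 distinct codes per hospital, three rows each
spec <- as.character(c(
  sort(rep(seq(100, length.out = 20, by = 10), 3)),
  sort(rep(seq(100, length.out = 10, by = 10), 3)),
  sort(rep(seq(100, length.out = 5, by = 10), 3))
))

# Three categories cycling, so each prov/spec pair has one row of each
pod <- rep(c("inpatient_elective", "inpatient_emergency", "outpatient"), 35)

# Random values between 50 and 500
act <- sample(50:500, 105, replace = TRUE)

data <- data.frame(
  prov, spec, pod, act
)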

I installed the branch on my local machine, and it does seem to work initially, though with some issues. For example, the first subgroup layer's text is overlaid by the borders for the second subgroup.
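One possible factor, not confirmed in this thread, is layer order: ggplot2 draws layers in the order they are added, so a text layer added before a border layer ends up underneath it. The sketch below uses a small stand-in data frame with the same column names as the example data (the values themselves are made up) and adds the subgroup text after both border layers.

```r
library(ggplot2)
library(treemapify)

set.seed(1234)

# Hypothetical stand-in with the same columns as the example data
example_data <- expand.grid(
  prov = c("hospital 1", "hospital 2"),
  spec = c("100", "110", "120"),
  pod  = c("inpatient_elective", "inpatient_emergency", "outpatient"),
  stringsAsFactors = FALSE
)
example_data$act <- sample(50:500, nrow(example_data), replace = TRUE)

p <- ggplot(example_data, aes(area = act, fill = pod,
                              subgroup = prov, subgroup2 = spec)) +
  geom_treemap() +
  # Borders first ...
  geom_treemap_subgroup2_border(colour = "white") +
  geom_treemap_subgroup_border(colour = "black") +
  # ... then text, so the labels are drawn on top of the borders
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.6)

print(p)
```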

I can help with further testing if that would be useful.

wilkox commented 6 years ago

@rlionheart92 re. the overlapping of subgroup text and borders, could you post the ggplot2 call that produces the plot? Are there any other issues with the new version or is the overlapping the only problem you're seeing?

BTW, the new version is now on CRAN so you can install it with a normal install.packages("treemapify").