hms-dbmi / UpSetR

An R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al.
https://cran.rstudio.com/web/packages/UpSetR

Use matrix view for axis labels for an existing ggplot2 plot? #102

Open dhimmel opened 6 years ago

dhimmel commented 6 years ago

I'm creating a barplot visualization to compare three methods of accessing scholarly articles: oaDOI, Penn, and Sci-Hub. The plot shows the coverage of each method independently as well as of every possible combination:

[Figure: ggplot2-coverage]

I want to use UpSetR's matrix view for the axis labels. Is there any way to apply the matrix view to an existing ggplot2 plot axis? I'm interested in the following three things:

  1. Using ggplot2's syntax to define the plot as much as possible (i.e. not having to reimplement themes, ggplot2::geom_col, ggplot2::geom_text, etcetera).
  2. Changing the axis to use a matrix view
  3. Applying this to faceted plots (i.e. with ggplot2::facet_grid). Items 1 and 2 are the bigger priorities.

Is this possible?

Here's the R code to create the plot above:

```R
`%>%` = dplyr::`%>%`
plot_df %>%
  ggplot2::ggplot(ggplot2::aes(x = repos, y = coverage)) +
  ggplot2::geom_col(fill='pink') +
  ggplot2::geom_text(ggplot2::aes(label = label, y = 0.02), size=2.5, hjust='inward', color='#000000') +
  ggplot2::scale_x_discrete(name = NULL, expand = c(0.02, 0)) +
  ggplot2::scale_y_continuous(name = "Repository's Coverage", labels = scales::percent, expand = c(0, 0), breaks=seq(0, 1, 0.2)) +
  ggplot2::expand_limits(y = 1) +
  ggplot2::coord_flip() +
  ggplot2::theme_bw() +
  ggplot2::theme(
    panel.grid.major.y = ggplot2::element_blank(),
    plot.margin = ggplot2::margin(t = 0, r = 12, b = 0, l = 5, unit='pt'))
```

Here are the contents of `plot_df`:

| collection | oadoi_color | venn | repos | n_repos | available | articles | coverage | label |
|------------|----------------|------|----------------------|---------|-----------|----------|----------|-----------|
| Combined | closed + green | 100 | oaDOI | 1 | 25981 | 208786 | 0.12444 | 26K of 209K articles (12.4%) |
| Combined | closed + green | 010 | Penn | 1 | 174375 | 208786 | 0.83519 | 174K of 209K articles (83.5%) |
| Combined | closed + green | 001 | Sci-Hub | 1 | 189269 | 208786 | 0.90652 | 189K of 209K articles (90.7%) |
| Combined | closed + green | 110 | oaDOI, Penn | 2 | 176915 | 208786 | 0.84735 | 177K of 209K articles (84.7%) |
| Combined | closed + green | 101 | oaDOI, Sci-Hub | 2 | 191321 | 208786 | 0.91635 | 191K of 209K articles (91.6%) |
| Combined | closed + green | 011 | Penn, Sci-Hub | 2 | 200496 | 208786 | 0.96029 | 200K of 209K articles (96.0%) |
| Combined | closed + green | 111 | oaDOI, Penn, Sci-Hub | 3 | 201259 | 208786 | 0.96395 | 201K of 209K articles (96.4%) |

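For anyone who wants to run the plotting code, `plot_df` can be rebuilt from the table above. This is a minimal sketch in base R with the values copied from the table. Storing `venn` as character (so that leading zeros such as `010` survive) is my assumption about how the original column was typed.

```R
# Rebuild `plot_df` from the table above; all values are copied from it.
# Assumption: `venn` is kept as character so "010" does not become 10.
plot_df <- data.frame(
  collection = "Combined",
  oadoi_color = "closed + green",
  venn = c("100", "010", "001", "110", "101", "011", "111"),
  repos = c("oaDOI", "Penn", "Sci-Hub", "oaDOI, Penn",
            "oaDOI, Sci-Hub", "Penn, Sci-Hub", "oaDOI, Penn, Sci-Hub"),
  n_repos = c(1, 1, 1, 2, 2, 2, 3),
  available = c(25981, 174375, 189269, 176915, 191321, 200496, 201259),
  articles = 208786,
  coverage = c(0.12444, 0.83519, 0.90652, 0.84735, 0.91635, 0.96029, 0.96395),
  label = c("26K of 209K articles (12.4%)",
            "174K of 209K articles (83.5%)",
            "189K of 209K articles (90.7%)",
            "177K of 209K articles (84.7%)",
            "191K of 209K articles (91.6%)",
            "200K of 209K articles (96.0%)",
            "201K of 209K articles (96.4%)"),
  stringsAsFactors = FALSE
)
```

The `coverage` column is `available / articles` rounded to five decimals, which gives a quick sanity check on the copied values.
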
ngehlenborg commented 6 years ago

Thanks @dhimmel! Could you maybe post a rough sketch of what you are trying to do? I am not 100% sure that I understand what you are trying to achieve.

dhimmel commented 6 years ago

Hey @ngehlenborg, thanks for the help! The following modification of the above figure should demonstrate what I'm envisioning:

[Figure: scihub-upsetr]

I want to replace the axis tick labels with an UpSetR matrix/dot view. Each bar in the plot represents the coverage of a certain access method or combination thereof.

Currently, I'm using a Venn-like diagram to show this (notebook):

However, these plots confuse people because they're not visualizing set overlap between the three access methods. Instead they show the coverage achieved by combining methods.

Essentially we want to use UpSetR to represent combinations of categories on a plot axis. Ideally, we'd continue using ggplot2 for the rest of the plot specification.
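
One possible way to approximate this in plain ggplot2, without relying on any UpSetR internals, is to draw the dot matrix as its own small panel from the `venn` bit strings and place it next to the bar plot. The sketch below is only an illustration of that idea: `venn_to_long`, `methods`, `dots` and `matrix_panel` are hypothetical names, and the digit order (oaDOI, Penn, Sci-Hub) is inferred from the `venn` and `repos` columns of the table.

```R
# Sketch only: `venn_to_long` is a hypothetical helper, not an UpSetR function.
# Assumption: the digits of `venn` are ordered as oaDOI, Penn, Sci-Hub.
methods <- c("oaDOI", "Penn", "Sci-Hub")

venn_to_long <- function(venn, methods) {
  # One row per bit string, one column per method.
  bits <- do.call(rbind, strsplit(venn, ""))
  # as.vector() reads the matrix column by column, so `method` repeats
  # each name length(venn) times while `venn` cycles within each column.
  data.frame(
    venn = rep(venn, times = length(methods)),
    method = rep(methods, each = length(venn)),
    member = as.vector(bits) == "1",
    stringsAsFactors = FALSE
  )
}

venn <- c("100", "010", "001", "110", "101", "011", "111")
dots <- venn_to_long(venn, methods)

# Draw the dot matrix as a separate panel (skipped if ggplot2 is missing).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  matrix_panel <- ggplot2::ggplot(dots, ggplot2::aes(x = method, y = venn)) +
    ggplot2::geom_point(ggplot2::aes(color = member), size = 3) +
    ggplot2::scale_color_manual(
      values = c(`TRUE` = "black", `FALSE` = "grey85"), guide = "none") +
    ggplot2::scale_x_discrete(name = NULL, position = "top") +
    ggplot2::scale_y_discrete(name = NULL, limits = venn) +
    ggplot2::theme_minimal() +
    ggplot2::theme(
      axis.text.y = ggplot2::element_blank(),
      panel.grid = ggplot2::element_blank())
}
```

For the rows to line up, the bar plot would need its discrete axis in the same order as `limits = venn`, and the two panels would have to be combined with a layout package such as cowplot or patchwork. This does not replace the axis itself, so it only partly addresses items 1 and 2 above, and faceting would need separate handling.
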