tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Combining multiple panels with or without panel size adjustment #326

Closed kohske closed 12 years ago

kohske commented 12 years ago

Here is an initial implementation. https://github.com/kohske/ggplot2/tree/feature/plot-layout

Two ways of combining:

  1. Size-sensitive combining: arrange plots with panel size adjustment. Currently implemented as S3 methods for cbind and rbind.
  2. Simple layout -- like grid.arrange in gridExtra. Not implemented yet.

Any ideas and suggestions are welcome.

baptiste commented 12 years ago

latticeExtra defines a c() method for trellis objects; from what I understand however it would not be very suitable for ggplot2. Am I correct in thinking that any such combination of plots should only alter (at most) the plot dimensions, but not deal with removing axes, combining legends, axis labels, titles, etc.?

One feature that's been requested quite often is a multipage option. I don't know how hard that would be to implement for facetting in general. The grid.arrange() way is quite trivial and will be present in the next version of gridExtra (the basic idea is currently on SO http://stackoverflow.com/a/6687147/471093)

wch commented 12 years ago

Here are some features that I think would be useful.

AAB
AAC
DDD
AAAB

It would be nice to be able to specify the sizes as integers, like (3,1), as well as nonintegers, like (1,.33).

hadley commented 12 years ago

I like this idea - let's put it in 0.9.1 (when hopefully gtable will be its own package)

Hadley

(Sent by email in reply to kohske's opening post of Sun, Dec 25, 2011, 7:26 PM.)

kohske commented 12 years ago

@baptiste @wch @hadley

Thanks.

I want to be clear about what we need to consider. Here I call the size-sensitive arrangement table (a new feature) and the simple arrangement layout (which inherits the notion of grid.arrange).

layout is not difficult, except for the interface for specifying the layout. In fact, layout is no more than syntactic sugar for grid.layout and layout.pos.col/row in grid, so users could write these functions themselves.
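To make the "users could write this themselves" point concrete, here is a minimal sketch that uses only the grid package. The helpers place() and panel() are hypothetical names invented for this illustration, and labelled rectangles stand in for plots; none of this is taken from the branch.

```r
library(grid)

# Hypothetical helper: draw a grob in the given row/column span of the
# layout viewport that is currently pushed.
place <- function(grob, row, col) {
  pushViewport(viewport(layout.pos.row = row, layout.pos.col = col))
  grid.draw(grob)
  popViewport()
}

# Hypothetical stand-in for a plot: a filled, labelled rectangle.
panel <- function(label) {
  gTree(children = gList(
    rectGrob(gp = gpar(fill = "grey85")),
    textGrob(label)
  ))
}

pdf(NULL)  # null device, so the sketch runs without writing a file
grid.newpage()
lay <- grid.layout(nrow = 2, ncol = 3)
pushViewport(viewport(layout = lay))
place(panel("A"), row = 1, col = 1:2)  # spans two columns
place(panel("B"), row = 1, col = 3)
place(panel("C"), row = 2, col = 1:3)  # spans the full width
popViewport()
invisible(dev.off())
```

The row and col arguments here play the same role as the row/col lists in the interface discussed in this comment.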

Please find the visual tests in: https://github.com/kohske/ggplot2/blob/7bf05ba7c25d58d584430e4c7b2f62c6e889f48b/inst/tests/visual-plot-layout.r and docs in: https://github.com/kohske/ggplot2/commit/7bf05ba7c25d58d584430e4c7b2f62c6e889f48b#L3R66

layout never modifies the plots; it only arranges them on the page. It also supports plots that span several cells, as in @wch's comment.

Users can specify the dimensions (i.e., nrow, ncol, or both). https://github.com/kohske/ggplot2/blob/7bf05ba7c25d58d584430e4c7b2f62c6e889f48b/inst/tests/visual-plot-layout.r#L9

Alternatively, users can specify the layout itself. https://github.com/kohske/ggplot2/blob/7bf05ba7c25d58d584430e4c7b2f62c6e889f48b/inst/tests/visual-plot-layout.r#L24

In addition to layout dimension, users can specify widths/heights of each row/column. https://github.com/kohske/ggplot2/blob/7bf05ba7c25d58d584430e4c7b2f62c6e889f48b/inst/tests/visual-plot-layout.r#L32

I wonder which is the best interface to set layout. Here is the current implementation:

layout.ggplot(p[[1]], p[[2]], p[[3]], p[[4]], p[[5]], 
  layout = list(row = list(1, 1, 1, 2, 2), col = list(1, 2, 3, 1, 2)))

This is a wrapper for layout.pos.col in grid, but I think it is obviously an ugly interface.

Do you have any good ideas? And are there any other features that should be implemented?

Anyway, layout is less important than table.

I will discuss table later. Should I move this to the ggplot-dev list?

kohske commented 12 years ago

table is more complicated, and a number of things need to be considered carefully.

First, the interface for specifying the table layout. This is the same issue as with layout, so it would be better to make the interface consistent between table and layout.

In addition to the interface, there are two big issues:

1) Guides (legends), title, axes, etc., as @baptiste suggested.

In table, the dimensions (nrow/ncol) of the plots in each row/column must be consistent. If the plots are simple, such as qplot(1:3, 1:3), they are easy to combine.

But sometimes nrow/ncol are inconsistent. For example, if plot A has a left-side legend and plot B has no legend, the ncol of A is larger than that of B, so A and B cannot be rbind-ed (cbind is OK). The same goes for titles and axes. So what do you think is the best way to handle them?

a) Drop the guides (legends) and keep everything else. Since guides can be extracted as a single object, users can combine the plots and then put the guides wherever they like. Everything else is kept as is, but blank cells are added under an align-to-the-larger-plot rule. I like this way.

b) Raise an error and force users to supply plots with matching dimensions. I also like this way as an initial implementation.

Or is there any other possibility?

2) Facets. Faceting makes table much, much more complicated. Maybe we do not need to support size-sensitive combining of faceted plots. So the initial implementation (0.9.1), at least, will not support facets and will simply raise an error.

wch commented 12 years ago

Here's an idea for how to specify the arrangement for table and layout in a single, unified way. I don't know what the "normal" way to do this is -- this is just what comes to mind.

Suppose you could give it a matrix, like this:

layout <- matrix(c(1,1,3, 1,1,3, 2,2,2), ncol=3, byrow=TRUE)
#      [,1] [,2] [,3]
# [1,]    1    1    3
# [2,]    1    1    3
# [3,]    2    2    2

layout.ggplot(p[[1]], p[[2]], p[[3]], layout=layout) 

This means: use a 3x3 grid, put the first object p[[1]] in the upper-left 2x2, put the second object p[[2]] in the bottom 1x3, and put the third object p[[3]] in the upper-right 2x1. You would have to check that the numbers are all in rectangles.
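As a rough illustration of how such a matrix relates to the row/col span lists used earlier in the thread, the sketch below derives the row and column range of each plot number. The function name layout_spans is hypothetical and is not part of ggplot2 or the branch.

```r
# Hypothetical helper: for each plot number in a layout matrix, return the
# range of rows and columns it occupies (NA cells are treated as empty).
layout_spans <- function(mat) {
  ids <- sort(unique(mat[!is.na(mat)]))
  spans <- lapply(ids, function(id) {
    pos <- which(mat == id, arr.ind = TRUE)
    list(row = range(pos[, "row"]), col = range(pos[, "col"]))
  })
  names(spans) <- ids
  spans
}

lay <- matrix(c(1,1,3, 1,1,3, 2,2,2), ncol = 3, byrow = TRUE)
spans <- layout_spans(lay)
str(spans[["1"]])  # rows 1 to 2, columns 1 to 2
```

This only takes the bounding box of each number, so by itself it does not detect numbers that fail to form a rectangle.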

And to specify the relative size of rows and columns, you could add row and column names:

layout <- matrix(c(1,2, 3,NA), ncol=2, byrow=TRUE, dimnames=list(c(4,1), c(4,1)))
#   4  1
# 4 1  2
# 1 3 NA

layout.ggplot(p[[1]], p[[2]], p[[3]], layout=layout) 

The NA just means that that cell is empty. The row and column names determine the relative size of the cells. So the upper-left cell is 80% of the width and 80% of the height; the upper-right cell is 20% width, 80% height; and so on. Hopefully you could also use non-integer values.
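The arithmetic behind those percentages can be sketched in a few lines of base R. This assumes, as proposed here, that the row and column names carry the relative sizes; the variable names are illustrative only.

```r
lay <- matrix(c(1, 2, 3, NA), ncol = 2, byrow = TRUE,
              dimnames = list(c(4, 1), c(4, 1)))

# Row names give relative heights, column names give relative widths.
heights <- as.numeric(rownames(lay))
widths  <- as.numeric(colnames(lay))

heights / sum(heights)  # 0.8 0.2
widths / sum(widths)    # 0.8 0.2

# Non-integer sizes normalize the same way: only the ratios matter.
frac <- c(1, 0.25)
frac / sum(frac)        # 0.8 0.2
```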

This method is also a little syntactically awkward (because using matrix() is a little awkward), but there is an advantage: it allows the user to have a nice visual representation of what exactly they are creating.

baptiste commented 12 years ago

Another general question is whether one wants to draw directly on the page or, alternatively, return a grob. It could be nice if layout played well with other graphical objects (e.g. lattice). Certainly compatibility with ggsave is something that would be desirable for the ggplot2 user, and I think multipage as well.

grid.arrange supports most of these features, only multi-cell spanning isn't implemented as I couldn't find an elegant way of defining the layout.

kohske commented 12 years ago

@wch

Thanks. Actually that's the way of layout in base graphics. see ?layout

Perhaps,

layout <- matrix(c(1,2, 3,NA), ncol=2, byrow=TRUE, dimnames=list(c(4,1), c(4,1)))

should be

layout <- layout(c(1,2, 3,NA), ncol=2, byrow=TRUE, widths = c(4,1), heights = c(4,1))

The advantage is, of course, the visual representation, and also that some users are already familiar with this system.
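For readers who have not used it, this is roughly what the existing base graphics system looks like. It is a sketch of graphics::layout() itself, not of the proposed ggplot2 interface; note that base layout() takes a matrix and marks empty cells with 0 rather than NA.

```r
# Base graphics layout(): 0 (not NA) marks an empty cell.
m <- matrix(c(1, 2,
              3, 0), ncol = 2, byrow = TRUE)

pdf(NULL)  # null device, so nothing is written to disk
# layout() returns the number of figures it defines.
n <- layout(m, widths = c(4, 1), heights = c(4, 1))
layout.show(n)  # outline and number each figure region
invisible(dev.off())
```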

kohske commented 12 years ago

I pushed a testbed for combining multiple panels. I also added visual tests, so any feedback is appreciated.

@baptiste ggarrange does not support grob extraction yet, but it will in the future. It may also support multi-page options.

kohske commented 12 years ago

@baptiste I implemented grob extraction here: https://github.com/kohske/ggplot2/tree/feature/plot-layout Is it possible to combine it with gridExtra? Could you please run the example code?

wch commented 12 years ago

@kohske I just took a quick look at the code. There is a r directory and a R directory, which doesn't seem right.

kohske commented 12 years ago

@baptiste oops, fixed. thanks.

kohske commented 12 years ago

@wch Sorry, I got your name wrong. Thanks anyway. I'd be happy if you would review the branch.

baptiste commented 12 years ago

@kohske I ran the examples; it's great. One thought, have you considered something like a global title, on top of the page (and perhaps one for each side as well, just in case)?

ggarrange, from what I can see, could easily work with arbitrary grobs, in which case it would have reproduced the full functionality of grid.arrange (minus the latest addition of multiple pages and ggsave support). That's why I'm not sure what you mean by "combine with gridExtra", can you clarify please?

baptiste commented 12 years ago

I have a question, also: as those functions (presumably) become more and more self-contained and independent (moving towards gtable), perhaps it could be good to systemize the naming convention (single "g" for "grid"? or double "gg" for "grammar of graphics"). Grid/gridExtra have already defined grid.layout and grid.arrange (I'd be willing to remove the latter when ready), leaving the possibility:

garrange, glayout, whilst ggtable, however, would remain with ggplot2 (but the name is probably confusing, with the gtable package).

baptiste commented 12 years ago

oh, and another side-remark: I noticed you have an example with inset plot; annotation_custom is probably more flexible for this particular purpose [https://github.com/baptiste/ggplot2/blob/customgrob/R/annotation-custom.r#L29]

wch commented 12 years ago

@kohske I tried it out, and I like it! It works very well. A couple of comments:

I like the idea of having insets but the interface seems somewhat awkward. Am I right that it requires using a character matrix like this? It seems kind of inelegant, not that I have a better idea.

> lay <- gglayout(row = list(1, 2, 2, 3, 3:4, 4), col = list(1:3, 1:2, 3, 1, 2:3, 3))
> lay
     [,1] [,2] [,3]  
[1,] " 1" " 1" " 1"  
[2,] " 2" " 2" " 3"  
[3,] " 4" " 5" " 5"  
[4,] ""   " 5" " 5 6"

 Widths:   0.3333333 0.3333333 0.3333333 
 Heights:  0.25 0.25 0.25 0.25 
 Respect:  FALSE 

If I wanted to inset a graph and have it set away from the edge, it seems I would have to do something like this:

p <- lapply(1:2, function(i) ggplot(mtcars, aes(factor(cyl))) + geom_bar(fill = rainbow(10)[i]) + opts(title = paste(i)))

lay <- gglayout( matrix(
   c("1", "1", "1",
     "1", "1 2", "1",
     "1", "1", "1"), 3, byrow=TRUE),
   widths = c(3, 2, .5), heights = c(3, 2, .5))

ggarrange(plots = p[1:2], layout = lay)

This seems kind of complicated to me (but again I don't have a better idea right now).

Finally, I would suggest having some code to detect when the numbers aren't laid out in rectangles. In the example below, I think it should give a warning/error, but right now it doesn't:

lay <- gglayout( matrix(
   c(1, 2,
     2, 1), 2, byrow=TRUE))

ggarrange(plots = p[1:2], layout = lay)

kohske commented 12 years ago

@baptiste @wch thank you for the comments. Here are point-by-point replies.


ggarrange, from what I can see, could easily work with arbitrary grobs, in which case it would have reproduced the full functionality of grid.arrange (minus the latest addition of multiple pages and ggsave support). That's why I'm not sure what you mean by "combine with gridExtra", can you clarify please?

ggarrange returns a gtable object, and grid.arrange cannot work with a gtable directly, but it does work after conversion with gtable_gTree. So it will be easy to fix.

gg <- ggarrange(qplot(1:3, 1:3), qplot(1:3))
grid.arrange(gg, xyplot(1:10~1:10)) # does not work
grid.arrange(gtable_gTree(gg), xyplot(1:10~1:10)) # does work

One thought, have you considered something like a global title, on top of the page (and perhaps one for each side as well, just in case)?

ggarrange and ggtable will have a main argument.


As for the naming, g means grid and gg means part of ggplot2. So g* is available outside ggplot2, while gg* only works with ggplot2.

I named ggtable and ggarrange by analogy with ggsave. Please let me know if you have better names.


I noticed you have an example with inset plot; annotation_custom is probably more flexible for this particular purpose [https://github.com/baptiste/ggplot2/blob/customgrob/R/annotation-custom.r#L29]

Yes, the inset plot is an unplanned feature. After implementing this, I found that inset plots happened to be possible, so it is probably not well designed.


Am I right that it requires using a character matrix like this?

No. At the moment, gglayout has two ways, plus one extra, of setting the layout.

1) layout matrix, which is compatible with layout in graphics package:

m <- matrix(
  c(1, 1, 1,
    2, 2, 3,
    4, 5, 5,
    6, 5, 5), 4, byrow = TRUE)
lay <- gglayout(m)

2) the list of row/col spans, which is compatible with grid.layout and layout.pos.row in grid package:

lay <- gglayout(row = list(1, 2, 2, 3, 3:4, 4), col = list(1:3, 1:2, 3, 1, 2:3, 1))

3) The extra one is automatic generation of the layout matrix from nrow, ncol, dim, etc.

The inset plot is possible only with 2), the list of row/col spans.


I would suggest having some code to detect when the numbers aren't laid out in rectangles. In the example below, I think it should give a warning/error

Agreed. I'm looking for a good algorithm for detecting such incorrect layouts.


Now I'm using ggarrange and ggtable in my daily work, and I find them very useful. Further comments are welcome.

Thanks!!

baptiste commented 12 years ago

On 9 January 2012 15:20, kohske takahashi reply@reply.github.com wrote:

As for the naming, g means grid, gg means a part of ggplot2. So, g* is available outside ggplot2 while gg* only works with ggplot2.

I named ggtable and ggarrange by analogy with ggsave. Please let me know if you have better names.

I guess my question is whether ggarrange could also work with other grobs, or should it remain specific to ggplot? I'd be in favor of making it more general, if only to avoid having two very similar functions in different packages. You could name it arrangeGrob and I'd remove its cousin from gridExtra.

Making the current arrangeGrob compatible with ggarrange does not help if one wants to include, say, a tableGrob next to a ggplot.

Cheers,

b.



wch commented 12 years ago

@kohske

I would suggest having some code to detect when the numbers aren't laid out in rectangles. In the example below, I think it should give a warning/error

Agreed. I'm looking for a good algorithm for detecting such incorrect layouts.

I think this should do the trick. It could probably be cleaned up a little, though.


# Returns TRUE if all the x's in mat are arranged in a rectangle,
# FALSE otherwise
is_rect <- function(x, mat, ...) {

  numcol <- ncol(mat)
  numrow <- nrow(mat)

  # The column and row numbers for each location in the matrix
  colnums <- matrix( rep(1:numcol, numrow), ncol = numcol, byrow=TRUE)
  rownums <- matrix( rep(1:numrow, numcol), nrow = numrow, byrow=FALSE)

  # Find matrix locations that contain x
  xloc <- mat == x

  # Find min and max value of cols and rows with x.
  # This specifies a rectangle.
  rowmin <- min(rownums[xloc])
  rowmax <- max(rownums[xloc])
  colmin <- min(colnums[xloc])
  colmax <- max(colnums[xloc])

  # All values of xloc inside the rectangle should be TRUE.
  # All values of xloc outside the rectangle should be FALSE.
  # Check this by inverting the rectangle and checking that all == FALSE
  xloc[rowmin:rowmax, colmin:colmax] <- !xloc[rowmin:rowmax, colmin:colmax]

  if (any(xloc)) return(FALSE)
  else           return(TRUE)

}

# Returns TRUE if all values in mat are in rectangles, FALSE otherwise
is_all_rects <- function(mat) {

  # Get all the unique values in mat
  nums <- unique(as.vector(mat))

  # (Not totally sure this uses vapply correctly...)
  goodrects <- vapply(nums, is_rect, TRUE, mat)

  if (all(goodrects)) {
    return(TRUE)
  } else {
    # stop() raises an error, so nothing after it would run
    stop(paste("These numbers are not in rectangles: ",
           paste(nums[!goodrects], collapse=", ")))
  }

}

These function names are probably not the best for the ggplot2 namespace. Also, it won't work for all character matrices -- it could fail for cases where there's weird stuff like " 5 6".

Some test cases:

mat <- matrix(c(
  1, 1, 2,
  1, 1, 2,
  3, 3, 2
  ), 3, byrow=TRUE)
is_all_rects(mat)
# [1] TRUE

mat2 <- matrix(c(
  1, 1, 2,
  1, 1, 2,
  3, 3, 1
  ), 3, byrow=TRUE)
is_all_rects(mat2)
# Error in is_all_rects(mat2) : These numbers are not in rectangles:  1

mat3 <- matrix(c(
  1, 1, 2,
  1, 1, 2,
  1, 3, 3
  ), 3, byrow=TRUE)
is_all_rects(mat3)
# Error in is_all_rects(mat3) : These numbers are not in rectangles:  1

mat4 <- matrix(c(
  1, 1, 2,
  1, 1, 3,
  3, 1, 3
  ), 3, byrow=TRUE)
is_all_rects(mat4)
# Error in is_all_rects(mat4) : These numbers are not in rectangles:  1, 3

kohske commented 12 years ago

@baptiste

I guess my question is whether ggarrange could also work with other grobs, or should it remain specific to ggplot? I'd be in favor of making it more general, if only to avoid having two very similar functions in different packages. You could name it arrangeGrob and I'd remove its cousin from gridExtra.

I'm not sure how general ggarrange should be. Probably including some grobs is useful like this:

ggarrange(
  qplot(iris[,1], iris[,2]), 
  tableGrob(iris[1:6, 1:2], gp=gpar(fontsize=8)))

and output here: https://skitch.com/e-kohske/gadjt/2012-01-09-13.50.38

But in my view ggarrange will not take care of other grid-based graphics such as lattice. So I think arrangeGrob in gridExtra will still be useful as the integrator of grid-based graphics.

How do you see it, @hadley?

kohske commented 12 years ago

@wch Thanks. I was looking for a general algorithm (i.e., one using basic matrix operations) but could not find one. I slightly modified your code and will merge it.

is_rect <- function(i, mat) {
    m0 <- array(FALSE, dim(mat))
    is <- which(mat == i, arr.ind=T)
    m0[seq(min(is[, 1]), max(is[, 1])), seq(min(is[, 2]), max(is[, 2]))] <- TRUE
    is <- which(m0)
    all(mat[is] == i) && all(mat[-is] != i)
}

Also, it won't work for all character matrices -- it could fail for cases where there's weird stuff like " 5 6".

There is no interface for setting character matrices. The character matrix is just for display, implemented in print.gglayout.

baptiste commented 12 years ago

How about testing for the number of elements against the number of elements for the maximum block?

is_rect <- function(i, mat) {
  is <- which(mat == i, arr.ind=T)
  NROW(is) >= prod(diff(apply(is, 2, range)) + c(1,1))
}

kohske commented 12 years ago

@baptiste why >=? Should this be ==?

baptiste commented 12 years ago

@kohske yes, == sounds right, I didn't think very much last night..
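With == as settled in this exchange, the counting test can be sketched as follows. The name is_rect_count is hypothetical, chosen to avoid clashing with the versions posted earlier, and the matrices are taken from @wch's test cases.

```r
# Hypothetical variant of the counting test: the cells holding i form a
# filled rectangle exactly when their count equals the bounding-box area.
is_rect_count <- function(i, mat) {
  pos <- which(mat == i, arr.ind = TRUE)
  extent <- apply(pos, 2, function(v) diff(range(v)) + 1)
  nrow(pos) == prod(extent)
}

mat <- matrix(c(
  1, 1, 2,
  1, 1, 2,
  3, 3, 2
  ), 3, byrow = TRUE)

mat2 <- matrix(c(
  1, 1, 2,
  1, 1, 2,
  3, 3, 1
  ), 3, byrow = TRUE)

ok  <- vapply(c(1, 2, 3), is_rect_count, logical(1), mat = mat)
bad <- is_rect_count(1, mat2)  # 5 cells in a 3x3 bounding box
```

Here every number in mat passes, while the 1s in mat2 fail because five cells cannot fill their nine-cell bounding box.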

baptiste commented 12 years ago

... unless you want to use this case to specify "inset" plots, that is to say enforce connected blocks (no gap), but not necessarily simply connected (can have holes).

hadley commented 12 years ago

Moving to 0.9.2 since I'm now reserving 0.9.1 for bug fixes and v. small features.

hadley commented 12 years ago

I now wonder if this shouldn't be in its own package.

kohske commented 12 years ago

Yes, I may write it after the port to the gtable package.