Feature request: allow access to the trained scales from the `layout` object

paleolimbot commented 5 years ago

This is a fairly easy (I think) feature to add that would help me implement lazy reading of huge raster files (a stars proxy object), such that they can be downsampled appropriately. To do this, I would need access to the trained scales (for the fill and alpha aesthetics at draw time. Right now there is no way to access the trained scales except limited information about them in the panel_params and coordinates objects. I think that a reference to the ScalesList object could be included as an element of the layout object without any consequences? The Geom has access to the layout at draw time.

It is a completely valid point that this would be pushing ggplot2 where it perhaps should not go, coercing it to do things that it was not meant to do. I would argue that it is necessary if spatial rasters are ever going to be plotted in ggplot by those of us who don't have an intimate knowledge of downsampling (and maybe they shouldn't be). There is a brief discussion on the implementation of this at paleolimbot/ggspatial#17 . A reprex is included below.

library(ggplot2)
library(grid)

# in real life, this would be a reference to a large (scary) raster file
big_scary_raster <- tibble::tibble(raster = list(matrix(1:9, nrow = 3)))


StatMatrixList <- ggproto(
  "StatMatrixList",
  Stat,

  required_aes = "raster",
  default_aes = ggplot2::aes(fill = stat(z)),

  compute_layer = function(self, data, params, layout) {
    data$raster <- lapply(data$raster, function(x) {
      df <- reshape2::melt(x)
      names(df) <- c("x", "y", "z")
      df
    })

    tidyr::unnest(data, .data$raster)
  }
)

ggplot(big_scary_raster, aes(raster = raster)) +
  geom_raster(stat = StatMatrixList, hjust = 0, vjust = 0)


StatLazyMatrixList <- ggproto(
  "StatLazyMatrixList",
  StatMatrixList,

  compute_layer = function(self, data, params, layout) {
    # only return limits in the stat (these are usually cached in the raster file,
    # so the raster doesn't need to be loaded). Scales get trained based on the
    # result of this function.
    data$limits <- lapply(data$raster, function(raster) {
      tibble::tibble(
        x = c(0, ncol(raster)),
        y = c(0, nrow(raster)),
        z = range(raster)
      )
    })

    tidyr::unnest(data, .data$limits, .drop = FALSE)
  }
)

GeomLazyRaster <- ggproto(
  "GeomLazyRaster",
  Geom,
  required_aesthetics = "raster",

  default_aes = ggplot2::aes(alpha = "__default_alpha__", fill = "__default_fill__"),

  handle_na = function(data, params) {
    data
  },

  draw_panel = function(data, panel_params, coordinates) {
    # this is a super crazy hack
    # but there is no other way to get scale objects from the draw function at build time (?)
    scales <- NULL
    for(i in 1:20) {
      env <- parent.frame(i)
      if("plot" %in% names(env) && "scales" %in% names(env$plot) && inherits(env$plot$scales, "ScalesList")) {
        scales <- env$plot$scales
        break
      }
    }
    if(is.null(scales)) stop("@paleolimbot's hack to get the ScalesList from Geom$draw_panel() has failed :'(")
    fill_scale <- scales$get_scales("fill")
    alpha_scale <- scales$get_scales("alpha")

    if(all(data$alpha == "__default_alpha__")) {
      # default
      alpha <- function(x) 1
    } else if(length(unique(data$alpha)) == 1) {
      # set (or mapped but constant)
      alpha <- function(x) unique(data$alpha)
    } else if(!is.null(alpha_scale)) {
      # mapped
      alpha <- alpha_scale$map
    } else {
      stop("Don't know how to compute 'alpha'")
    }

    if(all(data$fill == "__default_fill__")) {
      # default
      fill <- function(x) 1
    } else if(length(unique(data$fill)) == 1) {
      # set (or mapped but constant)
      fill <- function(x) unique(data$fill)
    } else if(!is.null(fill_scale)) {
      # mapped
      fill <- fill_scale$map
    } else {
      stop("Don't know how to compute 'fill'")
    }

    gTree(
      raster = data$raster[[1]],
      fill = fill,
      alpha = alpha,
      coordinates = coordinates,
      panel_params = panel_params,
      cl = "lazy_raster_grob"
    )
  }
)

geom_lazy_raster <- function(mapping = NULL, data = NULL, stat = StatLazyMatrixList,
                             ..., inherit.aes = TRUE, show.legend = NA) {
  layer(
    geom = GeomLazyRaster,
    stat = stat,
    data = data,
    mapping = mapping,
    position = "identity",
    params = list(...),
    inherit.aes = inherit.aes,
    show.legend = show.legend
  )
}

makeContext.lazy_raster_grob <- function(x) {
  # here it's possible to determine height and width in inches
  # getting DPI from the graphics device may not be possible,
  # but can always fall back on a user-specified minimum

  # projection + resampling would happen here

  # apply the aesthetics
  colors <- x$fill(x$raster)
  alpha <- x$alpha(x$raster)
  colors <- paste0(colors, as.character.hexmode(scales::rescale(alpha, from = c(0, 1), to = c(0, 255))))
  dim(colors) <- dim(x$raster)

  # map the coordinates
  corners <- data.frame(x = c(0, ncol(x$raster)), y = c(0, nrow(x$raster)))
  corners_trans <- x$coordinates$transform(corners, x$panel_params)
  x_rng <- range(corners_trans$x, na.rm = TRUE)
  y_rng <- range(corners_trans$y, na.rm = TRUE)

  setChildren(x, gList(rasterGrob(
    # there is an axis irregularity between what we think of as rows
    # and what grid thinks of as rows
    aperm(colors, c(2, 1))[nrow(colors):1,],
    x = x_rng[1], y = y_rng[1],
    height = diff(y_rng), width = diff(x_rng),
    default.units = "native",
    interpolate = FALSE,
    hjust = 0,
    vjust = 0
  )))
}

ggplot(big_scary_raster, aes(raster = raster)) +
  geom_lazy_raster()

^{Created on 2019-02-03 by the reprex package (v0.2.1)}

clauswilke commented 5 years ago

I think this feature would also help with generating filled contours at draw time from the new binning scale @thomasp85 is implementing. See #3044 and #3096.

thomasp85 commented 5 years ago

There are definitely enough meaningful examples to make us consider how to support it. I don’t personally think we should make the ScalesList object available as it is not currently an exported class we have committed to keeping stable. Making it available directly would limit our possibilities in the future all for the sake of a few (albeit important) needs...

thomasp85 commented 5 years ago

An approach I would like would be some sort of method in the geom for acquiring specific scales - eg self$get_scale("fill")

paleolimbot commented 5 years ago

How about as a method of the layout (i.e. layout$get_scale("fill"))? I don't think the geoms ever carry any stateful info, state is passed through in the layout or panel_params. I could work up a PR implementing layout$get_scale("fill") because I think it would be minor...I would be happy to call Geom$get_scale() but I don't know how to make it happen.

clauswilke commented 5 years ago

I think whatever solution we come up with requires careful thought so we don't box ourselves in for future development. We've talked a lot about layer-specific scales (e.g., different color scales for different layers), and if at all possible the approach taken now should be future-proof for that. (Even if we don't really know yet how to set up layer-specific scales.)

This reasoning would suggest that it would be best if the scales could be made available through the Layer object, or directly through the Geom. I'm not immediately sure how to do this, but I think pondering a little longer is the right strategy.

paleolimbot commented 5 years ago

That's a great point...thanks to you both for considering this and let me know if there is anything I can do!

clauswilke commented 5 years ago

If you want to dig around some more in the ggplot2 code to develop potential strategies of how we can get scales info into the geom that would be helpful. Ideally, before we start coding, we'd have a list of potential approaches, each with pros and cons, and then we can make an informed decision about how to proceed.

paleolimbot commented 5 years ago

👍 will do!

paleolimbot commented 5 years ago

I tried a few things! I'm sure there are other ways to go about this...

Make the plot/Layer/Geom stack "stateful" (i.e., instances contain backreferences to the parent object, and the Layer additionally contains a reference to the Layout). This allows Layer$get_scale(which, panel), or Geom$layer$get_scale(which, panel). Of course, this opens up a lot of the internals to possible abuse by subclasses because they would have access to the whole object through the backreferences. I've implemented a toy version of this here: https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:stateful-layer

library(ggplot2)

GeomScaleInfo <- ggproto(
  "GeomScaleInfo", Geom,
  required_aes = "x",
  draw_panel = function(self, data, panel_params, coord) {

    x_scale <- self$layer$get_scale("x", 1L)
    col_scale <- self$layer$get_scale("colour")

    x_range <- x_scale$range$range
    col_limits <- col_scale$range$range

    if(is.null(x_range)) {
      text <- "NULL x-range"
    } else {
      text <- sprintf(
        "X-scale: %s-%s\ncol-scale: %s items", 
        x_range[1], x_range[2], length(col_limits)
      )
    }

    grid::textGrob(text, x = 1, y = 1, hjust = 1.1, vjust = 1.5)
  }
)

geom_scale_info <- function() {
  layer(
    geom = GeomScaleInfo, stat = "identity", data = data.frame(x = 1), mapping = aes(x = x), 
    position = "identity",
    params = list(), inherit.aes = FALSE, show.legend = NA
  )
}

ggplot(mpg, aes(displ, hwy, col = class)) + geom_point() + geom_scale_info()

^{Created on 2019-02-13 by the reprex package (v0.2.1)}

Make the Layout object have access to the non-position scales, in addition to the position scales (currently it contains the only valid reference to the trained position scales, but the non-position scales are in the original ScalesList, which also contains some confusing references to the untrained original position scales). This is more in the spirit of the current implementation, but definitely precludes anything that requires a reference to the layer (because even the Geom doesn't know anything about the layer in the current code). A toy version of that is here: https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:smarter-layout

library(ggplot2)

GeomScaleInfo <- ggproto(
  "GeomScaleInfo", Geom,
  required_aes = "x",

  # this approach requires overriding draw_layer() to pass on the layout object
  draw_layer = function(self, data, params, layout, coord) {
    if (ggplot2:::empty(data)) {
      n <- if (is.factor(data$PANEL)) nlevels(data$PANEL) else 1L
      return(rep(list(zeroGrob()), n))
    }

    # Trim off extra parameters
    params <- params[intersect(names(params), self$parameters())]

    args <- c(list(quote(data), quote(panel_params), quote(coord), quote(layout)), params)
    lapply(split(data, data$PANEL), function(data) {
      if (empty(data)) return(zeroGrob())

      panel_params <- layout$panel_params[[data$PANEL[1]]]
      do.call(self$draw_panel, args)
    })
  },

  draw_panel = function(self, data, panel_params, coord, layout) {

    x_scale <- layout$get_scale("x", 1L)
    col_scale <- layout$get_scale("colour")

    x_range <- x_scale$range$range
    col_limits <- col_scale$range$range

    if(is.null(x_range)) {
      text <- "NULL x-range"
    } else {
      text <- sprintf(
        "X-scale: %s-%s\ncol-scale: %s items", 
        x_range[1], x_range[2], length(col_limits)
      )
    }

    grid::textGrob(text, x = 1, y = 1, hjust = 1.1, vjust = 1.5)
  }
)

geom_scale_info <- function() {
  layer(
    geom = GeomScaleInfo, stat = "identity", data = data.frame(x = 1), mapping = aes(x = x), 
    position = "identity",
    params = list(), inherit.aes = FALSE, show.legend = NA
  )
}

ggplot(mpg, aes(displ, hwy, col = class)) + geom_point() + geom_scale_info()

^{Created on 2019-02-13 by the reprex package (v0.2.1)}

clauswilke commented 5 years ago

The first option seems nicer and more general to me. Not sure if there are any major downsides we're overlooking, though. Usually, a main design principle of software engineering is to avoid circular dependencies, and it seems this introduces one, since layer knows of geom and geom will know of layer after the change.

thomasp85 commented 5 years ago

A general rule for most ggproto objects is that they should be stateless factories, so I would prefer a setup that keeps this

paleolimbot commented 5 years ago

Layer info could be included in the data (similar to how the data contains a PANEL column, it could contain a LAYER column). Then one could get a layer-specific scale using something like Layout$get_scale("x", data$PANEL[1], data$LAYER[1]) from the Geom draw methods. Without making the whole stack stateful, I can't think of a way to get Geom$get_scale() to work.

clauswilke commented 5 years ago

There's an additional complication: if we change the arguments of draw_layer() etc. then all the third-party geoms that have been released will break.

Is it possible to attach this information to either panel_params or coord? It doesn't belong into data.

paleolimbot commented 5 years ago

I don't think the arguments would need to be changed to implement Layout$get_scale("x", data$PANEL[1], data$LAYER[1])...the rare geometries that need to access the Layout would just need to override Geom$draw_layer() rather than Geom$draw_panel() or Geom$draw_group().

There's probably some other ways to get the layer ID into the draw method...the data seems like the least breaking way to do that, but it could also be set as a property of the Layer or Geom at build time.

thomasp85 commented 5 years ago

“Just” overwriting draw_layer is not that simple as it needs to correctly split up and assemble everything efficiently. Pretty sure we could do this by adding a getter function to the param object we pass into the draw method

clauswilke commented 5 years ago

There's one other strategy for handing down information that we haven't considered yet: We can add new member functions that provide a geom or layer with whatever information it needs later, so that it can store it. This is less elegant than expanding the number of arguments in draw_layer()/draw_panel() but it has the advantage that it will work automatically with all existing extension packages.

thomasp85 commented 5 years ago

Wouldn’t that still make the geoms stateful?

paleolimbot commented 5 years ago

Yes, that's how I implemented the stateful version of the Plot/Layer/Geom stack (a build_init(plot, layout) method that gets called from build_plot()). https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:stateful-layer

clauswilke commented 5 years ago

@thomasp85 Yes, you're right, that won't work.

I'm coming around to @paleolimbot's perspective that Layout may be the right place for this info. The panel_params object that is provided to draw_layer() etc. is probably misnamed, because it holds the geoms' own parameters. A better name would be geom_params, which immediately makes clear that this is not a place for the ggplot2 infrastructure to add generic items. The Layout, on the other hand, already holds all sorts of layer-specific info, such as the coordinate system, facets, position scales, etc: https://github.com/tidyverse/ggplot2/blob/26cd1075006da8f12f30ace7fbf9883c595a38e3/R/layout.R#L16-L31 It could hold information about other scales as well.

I also think that requiring geoms to overwrite draw_layer() to make use of this is fine. @paleolimbot wrote the key points:

the rare geometries that need to access the Layout would just need to override Geom$draw_layer() rather than Geom$draw_panel() or Geom$draw_group().

Most geoms wouldn't need this info and thus wouldn't have to change in any way. Any geom that needs this info would have to reimplement draw_layer(), and that would include properly splitting things up and assembling them effectively. I think anybody who wants to write a special geom that can interact in a smart way with scales can also be expected to be able to reimplement draw_layer() correctly.

The alternative would be to add a new argument to draw_layer() / draw_panel() etc. that is layer-specific, e.g. layer_params. This would probably be the best change in terms of preparing us for future developments, but it would also break all existing extension packages.

paleolimbot commented 5 years ago

It could be that layer_params gets passed to the draw_layer() method, which is more rarely re-implemented than draw_panel() or draw_group().

clauswilke commented 5 years ago

Ah, maybe that's a workable idea. I'm still worried about the Layout solution because the layer parameters could in principle be different for different layers within the same panel, but Layout is the same for all layers.

clauswilke commented 5 years ago

Is there an easy way to find out how many extension packages have redefined draw_layer()?

paleolimbot commented 5 years ago

Nothing that causes any tests to fail...I'll run a revdepcheck as well.

paleolimbot commented 5 years ago

I forgot that revdepcheck for ggplot2 incapacitates my computer for a day...based on the preliminary results it seems unlikely it's common to override draw_layer() (no new errors related to changing the signature). Also, googling the signature of draw_layer vs googling the signature of draw_panel and draw_group seems to indicate that nobody has ever done it in a place that is google-indexed.

paleolimbot commented 5 years ago

How about passing the layer through the draw_layer() method? I can't think of anything to put in a layer_params object other than a reference to the layer and maybe the layer index.

https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:smarter-layout

library(ggplot2)

GeomScaleInfo <- ggproto(
  "GeomScaleInfo", Geom,
  required_aes = "x",

  # this approach requires overriding draw_layer() to pass on the layout and layer objects
  draw_layer = function(self, data, params, layout, coord, layer) {
    if (ggplot2:::empty(data)) {
      n <- if (is.factor(data$PANEL)) nlevels(data$PANEL) else 1L
      return(rep(list(zeroGrob()), n))
    }

    # Trim off extra parameters
    params <- params[intersect(names(params), self$parameters())]

    args <- c(list(quote(data), quote(panel_params), quote(coord), quote(layout), quote(layer)), params)
    lapply(split(data, data$PANEL), function(data) {
      if (empty(data)) return(zeroGrob())

      panel_params <- layout$panel_params[[data$PANEL[1]]]
      do.call(self$draw_panel, args)
    })
  },

  draw_panel = function(self, data, panel_params, coord, layout, layer) {

    x_scale <- layout$get_scale("x", data$PANEL[1], layer)
    col_scale <- layout$get_scale("colour", data$PANEL[1], layer)

    x_range <- x_scale$range$range
    col_limits <- col_scale$range$range

    text <- sprintf(
      "X-scale: %s-%s\ncol-scale: %s items", 
      x_range[1], x_range[2], length(col_limits)
    )

    grid::textGrob(text, x = 1, y = 1, hjust = 1.1, vjust = 1.5)
  }
)

geom_scale_info <- function() {
  layer(
    geom = GeomScaleInfo, stat = "identity", data = data.frame(x = 1), mapping = aes(x = x), 
    position = "identity",
    params = list(), inherit.aes = FALSE, show.legend = NA
  )
}

ggplot(mpg, aes(displ, hwy, col = class)) + geom_point() + geom_scale_info()

^{Created on 2019-02-19 by the reprex package (v0.2.1)}

thomasp85 commented 5 years ago

I've lost track of what you want with layer? wasn't the intend to access scales within the draw_* methods, which you have handled by adding the get_scale method to layout

paleolimbot commented 5 years ago

I don't need the layer at all, unless the scale depends on it. @clauswilke seemed to indicate that layer-specific scales might occur in the future. All I need to implement lazy rasters is layout$get_scale("fill").

clauswilke commented 5 years ago

My concern was that get_scale() may not belong into the layout, since the layout doesn't know about layers. The best way we've so far come up with to implement layer-specific scales is to trick the geom into thinking it works with one aesthetic when it actually works with another. Something like layout$get_scale("fill") would circumvent that, because the layer wouldn't be able to intercept the call and know that fill here really should be fill2.

paleolimbot commented 5 years ago

There are good reasons for this in the raster context as well...I circumvent aesthetics/scales for RGB(A) rasters, but I am essentially mapping band1 to red, band2 to green, band3 to blue, and band4 to alpha, and those scales are not relevant to other layers.

I can also think of some uses for scales that depend on the panel. The position scales are currently implemented this way, and I could see how it may make sense for another aesthetic to have free scales as well with multiple panels, even if that's a long way off.

So, I think layout$get_scale("fill", panel, layer) is reasonable, even if the layer is ignored until layer-specific scales are figured out, and panel is ignored for non-position scales.

I could also see how working it into panel_params is reasonable (I think one of you suggested this already), as it currently contains the limits for the position scales (for each panel, for each layer), and is already passed to the draw method.

thomasp85 commented 5 years ago

I may have suggested it above at some point, but I don’t really think it belongs in something related to panels. Honestly I also don’t think it belong in Layout as the purpose of that object is orchestrating the positioning of stuff (mainly). It might pragmatically be the best place to put it, but I’m not happy about it 🙂

clauswilke commented 5 years ago

It needs to go through the layer. I'm certain about it. The question is just how to do it cleanly.

paleolimbot commented 5 years ago

The Layout contains references to the x and y scales, even if the scales are the same object for multiple panels. Is the fill scale not a scale that refers to the same object for multiple panels?

thomasp85 commented 5 years ago

Positional scales are special

thomasp85 commented 5 years ago

I really don't like passing in the layer object as it makes it practically a public class, and people could begin doing all sorts of stuff that we would later break... If we are considering adding arguments to draw_panels() why not just pass in a get_scale() function? It could be layer aware and all that...

clauswilke commented 5 years ago

Thomas, I agree with your concerns about passing in layer, and I don't think we should do so. However, I also don't think we should just pass in a raw function, because that's not very extensible. I propose that we pass in a list layer_params, and the list can then contain the get_scale() function (and maybe the number of the layer). This keeps things extensible. If we need to have access to other layer properties in the future, we can just add them to the layer_params.

paleolimbot commented 5 years ago

I can get on board with that. The scale depends on the aesthetic, the layer, and the panel...layers don't know about panels, and panels don't know about layers, so I get how it's awkward any way it goes.

I like how layer_params looks in Geom$draw_layer(), although without the Layer knowing what index it is (keeping it stateless), without the Layout knowing anything about layers, and without the Layout knowing anything about non-position scales, implementing it is awkward and requires adding another item to the built plot (maybe there are other ways to do this that are less awkward).

https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:layer-params

clauswilke commented 5 years ago

I'm not sure Layer needs to remain stateless. By my understanding, it already carries state, namely the geom, the stat, and their parameters. See also here: https://github.com/tidyverse/ggplot2/blob/033fb521a084447c15178c591ab96a158f3b1e1e/R/layer.r#L335

Conceivably this line could read:

self$geom$draw_layer(data, self$geom_params, layout, layout$coord, self$layer_params)

and you would just need a hook that generates the layer params for each layer when the plot is being built.

paleolimbot commented 5 years ago

It's much cleaner with the layout_params living in the Layer, although Thomas is right that none of the other ggproto objects work that way. I don't know enough about the history to have any opinion either way, I trust you both on that front.

https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:layer-layer-params

clauswilke commented 5 years ago

I'll let Thomas speak for himself, but I think the main difference is that layers are explicitly created as stateful instances: https://github.com/tidyverse/ggplot2/blob/033fb521a084447c15178c591ab96a158f3b1e1e/R/layer.r#L135-L146

By contrast, geoms are not, and they are shared among all layers. You'd have to completely rewrite how geoms are created to make them stateful. By contrast, for layers it's already designed that way.

ggproto itself doesn't enforce one mechanism or the other, both are possible.

paleolimbot commented 5 years ago

That's a good point. Sounds good to me...are there other things that should get solved before a PR? Should I do a PR?

clauswilke commented 5 years ago

I think we should get @hadley's Ok on the proposed change.

hadley commented 5 years ago

Could someone summarise the discussion for me please?

clauswilke commented 5 years ago

@hadley: We (@paleolimbot, @thomasp85, @clauswilke) all think that there is a need for some geoms to have access to trained scales, so the geoms can do certain data transformations at draw time. One example is downsampling of larger raster objects at draw time. Another example is turning raster images into contours at draw time. The main question is how to implement this so it is the least disruptive to existing code and the most general for future development.

We have settled on extending the argument list of the draw_layer() function of Geom. It appears that currently no published extension package overrides this function, so this is a safe modification. (@paleolimbot ran a revdep check.) The question then is what to hand to the draw_layer() function. We have decided on handing it a layer_params object that currently will hold the index of the layer and a function that has access to the scales. In the future, this object could hold other layer-specific parameters as well.

The total required changes in the ggplot2 code base can be seen here: https://github.com/tidyverse/ggplot2/compare/master...paleolimbot:layer-layer-params They are rather limited. The main addition is a build_init() function to Layer. This function is called during the build process to set up the layer_params object for each layer.

The main question that is remaining from my perspective is whether build_init() is named correctly (maybe it should be called setup_layer_params()) and when in the build process it should be called.

hadley commented 5 years ago

Thanks for the summary. Stream of consciousness from reading your summary + the code:

The overall motivation seems reasonable to me. But I wonder if we could maybe start simpler and just expose the x and y scales? That would allow @paleolimbot to solve the motivating problem, and seems reasonable given that it's only the x and y scales that might vary on a per-panel basis.
Should we add ... to draw_layer() so that it becomes possible to extend further in the future?
The layer_params object feels weird to me. Why is this the only place we need it? Why it is a list and not a proper (R6? ggproto?) object? Why would the layer need to know what its index is? (And what is the index? Just the order that it's drawn in? Why is that important?) Is it really returning parameters for a layer when in many cases you'll need to provide the panel id too?
I have no feedback on the names - I don't remember how this code works well enough.

Overall, the API doesn't feel right to me, but I can't suggest how to make it better because I don't remember the constraints here. It's possible that this is already the least-worst API, and if you feel strongly that it's correct, you shouldn't let my opinion block you.

paleolimbot commented 5 years ago

Thanks @hadley! A brief clarification on the motivating problem...the X and Y scales are available at draw time in the layout that gets passed to the draw_layer() method, but to lazily load rasters, I need access to the trained non-position scales (fill and alpha) at draw time, which are not available in the layout object.

There was concern that non-position scales do not belong in the layout, and that at some point in the future, scales may depend on the layer in addition to the panel ID. This is why layers are involved in the latest iteration.

clauswilke commented 5 years ago

A couple of more thoughts from me, in no particular order:

The index is probably not needed. It was required in an earlier version of the code.
layer_params was modeled after the various other *_params objects used in ggplot2, which all generally are lists (e.g. geom_params, stat_params, panel_params, etc). However, here, layer_params doesn't actually hold parameters, so maybe that's a misnomer.
We could instead hand in a get_scale function, as discussed in an earlier iteration, and add the ... argument for future use if needed.
Hadley, you asked about why this is the only place where we need layer_params. Actually, we would need it in other draw functions also, such as draw_group() etc., but that would break all existing extension packages, so we decided to not hand it down in the default geom and instead led extension developers handle this how they see fit.

Dewey, if we go the route of get_scale() function + ..., you could probably store the get_scale() function in the layer in the same way you currently store the layer params, so you wouldn't have to go back to the earlier awkward code that required extensive indexing to feed things through.

yutannihilation commented 5 years ago

Can we first implement the per-layer scale that belong to Layer, and then let the users to choose whether the scale is trained with the whole data or the layer data? If it is trained with the whole data, the per-layer scale would be the same as the plot scale.

hadley commented 5 years ago

It sounds like you've considered all the issues I've bought up, so I'm happy to return the reins back to y'all. (But feel free to ping me if you want more advice, regardless of my limit memory of this code)

paleolimbot commented 5 years ago

@clauswilke : Passing just a get_scale() function into draw_layer() seems thin to me, although I suppose a layer_params that only has a get_scale() function is about the same. I think that @yutannihilation is right that solving Layer-specific scales would be helpful to this problem.

In particular, it may better define the role of the Layer to the point where it could get passed to the draw_layer() method? I'm not sure this means it has to be "public"...it would be one level higher than anybody has ever overridden, plus the layout is exposed to the draw method as well, and has a similar level of definition as the layer (I think). For my use case, this would allow something like hillshades or RGB(A) to be Stats that are computed at draw time rather than Geoms that are basically a stat + a geom.

Alternatively, define the layer_params as everything that is computed by the Layer + plot + layout but not stored there (aesthetics, initial data, final data, scales) and make that a well-defined public class?

clauswilke commented 5 years ago

I don't think we should export Layer at this point, because it may require substantial revisions in the future. Thus, we can't hand the layer to the geom. (And also, it wouldn't be a clean design, since it creates a circular dependency.)

Adding more things to layer_params, in case they may be useful in the future, is fine by me. Some of that information could be captured in this function: https://github.com/tidyverse/ggplot2/blob/033fb521a084447c15178c591ab96a158f3b1e1e/R/layer.r#L204-L206

I'm not sure of the performance implications if every layer stores the raw data, though. Are these all just references that don't cost anything?

tidyverse / ggplot2

Feature request: allow access to the trained scales from the `layout` object #3116