paleolimbot / ggspatial

Enhancing spatial visualization in ggplot2
https://paleolimbot.github.io/ggspatial
370 stars 35 forks source link

Add documentation for downsampling of rasters to speed up raster rendering #17

Open paleolimbot opened 6 years ago

paleolimbot commented 6 years ago

Just extending discussion from #9 to be its own issue.

The idea is to delay any action involving the original raster object until there is a known height, width, and target resolution, so that large rasters can be passed to annotation_spatial.Raster() without fear of taking forever to render.

Question for @mdsumner: What is the series of events that usually takes place during this operation? I had been thinking something like:

target_res <- calculate_target_res(bounds, height, width, dpi)

st_make_grid(plot_bounds, cellsize = target_res) %>% 
  st_transform(raster_crs) %>% 
  raster::extract(raster, .) %>%
  make_a_raster_grob()

...but maybe it's a bad idea to reverse project a grid from the coord_sf() CRS to the raster CRS and resample that way? The other option would be to raster::projectRaster() %>% raster::resample()? Or maybe you've solved this in the hypertidy/lazyraster package?

Would also be nice to apply this to a rasterGrob() somehow, since these also take forever to render?

mdsumner commented 6 years ago

Definitely don't use make grid, there's no need. It is outlined in lazyraster but I can't expand on this atm. I'll flesh this out some time soon, but using raster functions will give enough to prototype, see what happens when raster applies the maxpixels argument, in various places. That needs to include efficient crop top, which needs work.

paleolimbot commented 6 years ago

Cool...I will set up the Stat/Geom/grob/makeContent() such that it works, and will await your sound guidance on how to make it better.

mdsumner commented 6 years ago

Great! Apologies I can't look now, very interested in this

paleolimbot commented 6 years ago

Whenever you're back, the framework is all set. Currently not faster than a regular rasterGrob(), but would probably be if it were resampling a much larger image and used with annotation_spatial(). Current approach just uses projectRaster() with an output template based on the extent of the panel.

library(ggspatial)
#> Loading required package: ggplot2
#> Loading required package: sf
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
load_longlake_data()

system.time(
  print(
    ggplot() +
      layer_spatial(longlake_osm, lazy = FALSE) +
      layer_spatial(longlake_depthdf)
  )
)

#>    user  system elapsed 
#>   2.899   0.308   3.430

system.time(
  print(
    ggplot() +
      layer_spatial(longlake_osm, lazy = TRUE) +
      layer_spatial(longlake_depthdf)
  )
)
#> 5.48175224540989 x 4.65488578335662 in (822.262836811483 x 698.232867503493 px) dpi: 150

#>    user  system elapsed 
#>   3.372   0.507   4.104
paleolimbot commented 5 years ago

@edzer ...this is what I was talking about last night. Implemented here: https://github.com/paleolimbot/ggspatial/blob/master/R/layer-spatial-raster.R#L246 , and here: https://github.com/paleolimbot/ggspatial/blob/master/R/layer-spatial-raster.R#L219

paleolimbot commented 5 years ago

Current problem: creating an appropriate destination template for stars::st_warp() (for my own future reference, or if @edzer has a solution that will make this work)

longlake_osm <- stars::read_stars(system.file("longlake/longlake.tif", package = "ggspatial"), proxy = TRUE)
panel_params <- list(
  x_range = c(2330056.03273654, 2332693.06989841), 
  y_range = c(229111.707417625, 231625.212947026)
)

new_dim <- c(x = 500, y = 400, band = 3)

template_raster <- stars::st_as_stars(
  dimensions = stars::st_dimensions(
    x = seq(panel_params$x_range[1], panel_params$x_range[2], length.out = new_dim[1]),
    y = seq(panel_params$y_range[1], panel_params$y_range[2], length.out = new_dim[2]),
    band = seq_len(new_dim[3])
  )
)

plot(longlake_osm)

warped <- stars::st_warp(longlake_osm, dest = template_raster, options = options)
#> Error in CPL_proj_direct(as.character(c(from[1], to[1])), as.matrix(pts)): no arguments in initialization list
plot(warped)
#> Error in plot(warped): object 'warped' not found

Created on 2019-01-19 by the reprex package (v0.2.1)

edzer commented 5 years ago

Thanks! For rgb data you can do

plot(longlake_osm, rgb = 1:3, maxColorValue=255)

to plot an rgb image.

What you're trying doesn't work because template_raster doesn't have a crs; if I set it to the crs of longlake_osm the two do not overlap; that gives an error, which it shouldn't, but is probably not what you want. The following two work:

(stars::st_warp(longlake_osm, longlake_osm))
(warped <- stars::st_warp(longlake_osm, crs = st_crs(4326)))

Let me know (e.g. here or in a stars issue) if you run into things that you do want, but do not work.

paleolimbot commented 5 years ago

I'm back on this, and thanks to help by @edzer and @thomasp85 at the Tidyverse Dev Day, I'm a bit closer to having this done, but I am stuck on how to get a grob width in pixels at draw time. @thomasp85, if you have a moment, is there a way to do this? grDevices::dev.size("px") seems to always return a value that gives a dpi of 72, even though grDevices::dev.size("in") gives the correct size in inches.

I need an approximate DPI so I can downsample a potentially large raster (like is done in the stars package plot method). I could ask the user for this as a layer parameter, but it seems like there should be a way to query this?

This is a toy reprex that I think gets at what I'm trying to do:


new_custom_grob <- function() {
  grid::gTree(
    cl = "custom_grob"
  )
}

makeContext.custom_grob <- function(x) {

  width_in <- grid::convertWidth(grid::unit(1, "npc"), "in", valueOnly = TRUE)
  height_in <- grid::convertHeight(grid::unit(1, "npc"), "in", valueOnly = TRUE)

  # this DPI is not correct (`dev.size("px")`` returns incorrect value such that dpi is always 72)
  dpi <- grDevices::dev.size("px") / grDevices::dev.size("in")

  dpi_text <- sprintf(
    "%s x %s in: dpi: %sx%s", 
    width_in, height_in, dpi[1], dpi[2]
  )

  # add content to the gTree
  grid::setChildren(
    x,
    grid::gList(
      grid::textGrob(dpi_text)
    )
  )
}

grid.custom_grob <- function() {
  grid::grid.newpage()
  grid::grid.draw(new_custom_grob())
}

withr::with_png("test1.png", grid.custom_grob(), height = 300, width = 400, res = 100)
grid::grid.raster(png::readPNG("test1.png"))


withr::with_png("test2.png", grid.custom_grob(), height = 300, width = 400, res = 200)
grid::grid.newpage()
grid::grid.raster(png::readPNG("test2.png"))

Created on 2019-01-29 by the reprex package (v0.2.1)

paleolimbot commented 5 years ago

It looks like the makeContext() method gets called 3 times in the RStudio graphics device: once during the "printing", then twice. The time during the printing doesn't have the same size as the other two times, which are called from the graphics device. Other devices appear only to call makeContext() once. Caching the plot reduces the number of times the raster has to get resampled to 2 (in RStudio)...this is possible by specifying an element of the gTree as an environment.

Log output from some sleuthing:

***Checking for cached raster in x$cache[['raster_925x786']]
vapply(rlang::ctxt_stack(), function(x) as.character(x)[3], character(1))
makeContent.geom_spatial_raster_lazy(x)
makeContent(x)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(x$children[[i]], recording = FALSE)
grid.draw(x$children[[i]], recording = FALSE)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(x$children[[i]], recording = FALSE)
grid.draw(x$children[[i]], recording = FALSE)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(gtable)
grid.draw(gtable)
print.ggplot(ggplot() + layer_spatial(longlake_osm, lazy = TRUE) + layer_spatial(longlake_depthdf) + labs(caption = "Should have a little grey area around the sides, roughly N-S projection"))
print(ggplot() + layer_spatial(longlake_osm, lazy = TRUE) + layer_spatial(longlake_depthdf) + labs(caption = "Should have a little grey area around the sides, roughly N-S projection"))
print(ggplot() + layer_spatial(longlake_osm, lazy = TRUE) + layer_spatial(longlake_depthdf) + labs(caption = "Should have a little grey area around the sides, roughly N-S projection"))
NULL
Calling raster_grob_from_raster()

***Checking for cached raster in x$cache[['raster_670x569']]
vapply(rlang::ctxt_stack(), function(x) as.character(x)[3], character(1))
makeContent.geom_spatial_raster_lazy(x)
makeContent(x)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(x$children[[i]], recording = FALSE)
grid.draw(x$children[[i]], recording = FALSE)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(x$children[[i]], recording = FALSE)
grid.draw(x$children[[i]], recording = FALSE)
drawGTree(x)
doTryCatch(return(expr), name, parentenv, handler)
tryCatchOne(expr, names, parentenv, handlers[[1]])
tryCatchList(expr, classes, parentenv, handlers)
tryCatch(.Call("rs_GEcopyDisplayList", fromDevice), error = function(e) warning(e))
(function (fromDevice) 
{
    tryCatch(.Call("rs_GEcopyDisplayList", fromDevice), error = function(e) warning(e))
})(1)
NULL
Calling raster_grob_from_raster()

***Checking for cached raster in x$cache[['raster_670x569']]
vapply(rlang::ctxt_stack(), function(x) as.character(x)[3], character(1))
makeContent.geom_spatial_raster_lazy(x)
makeContent(x)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(x$children[[i]], recording = FALSE)
grid.draw(x$children[[i]], recording = FALSE)
drawGTree(x)
recordGraphics(drawGTree(x), list(x = x), getNamespace("grid"))
grid.draw.gTree(x$children[[i]], recording = FALSE)
grid.draw(x$children[[i]], recording = FALSE)
drawGTree(x)
doTryCatch(return(expr), name, parentenv, handler)
tryCatchOne(expr, names, parentenv, handlers[[1]])
tryCatchList(expr, classes, parentenv, handlers)
tryCatch(.Call("rs_GEcopyDisplayList", fromDevice), error = function(e) warning(e))
(function (fromDevice) 
{
    tryCatch(.Call("rs_GEcopyDisplayList", fromDevice), error = function(e) warning(e))
})(1)
NULL
Using cached raster
paleolimbot commented 5 years ago

Cooking with fire! It turns out it is very important when resampling not to create a rasterGrob() with more pixels than is needed, since grid's rendering of the rasterGrob takes a bit.

This involves capping the maximum number of pixels in the x and y direction in the final projection space before draw time. This also makes more effective use of the draw cache, since the same size rasterGrob might be relevant as the plot size increases.

The draw-time resampling (for rgb, raster package) is now faster than the build-time resampling!

library(ggspatial)
#> Loading required package: ggplot2
load_longlake_data(which = c("longlake_osm", "longlake_depthdf"), raster_format = "raster")

system.time(
  print(
    ggplot() +
      layer_spatial(longlake_osm, lazy = FALSE) +
      layer_spatial(longlake_depthdf)
  )
)

#>    user  system elapsed 
#>   1.914   0.328   2.477

system.time(
  print(
    ggplot() +
      layer_spatial(longlake_osm, lazy = TRUE) +
      layer_spatial(longlake_depthdf)
  )
)

#>    user  system elapsed 
#>   1.717   0.301   2.250

Created on 2019-02-03 by the reprex package (v0.2.1)

paleolimbot commented 5 years ago

For when I start in on this next weekend, this is what needs to happen in order to lazily map aesthetics at the draw stage. It is...madness, and not technically currently possible (ggplot2 would need to allow access to the ScalesList from the layout, which is not unreasonable, but right now I'm scanning the call stack for the plot object, which is a crazy hack).

library(ggplot2)
library(grid)

# in real life, this would be a reference to a large (scary) raster file
big_scary_raster <- tibble::tibble(raster = list(matrix(1:9, nrow = 3)))

StatMatrixList <- ggproto(
  "StatMatrixList",
  Stat,

  required_aes = "raster",
  default_aes = ggplot2::aes(fill = stat(z)),

  compute_layer = function(self, data, params, layout) {
    data$raster <- lapply(data$raster, function(x) {
      df <- reshape2::melt(x)
      names(df) <- c("x", "y", "z")
      df
    })

    tidyr::unnest(data, .data$raster)
  }
)

ggplot(big_scary_raster, aes(raster = raster)) +
  geom_raster(stat = StatMatrixList, hjust = 0, vjust = 0)


StatLazyMatrixList <- ggproto(
  "StatLazyMatrixList",
  StatMatrixList,

  compute_layer = function(self, data, params, layout) {
    # only return limits in the stat (these are usually cached in the raster file,
    # so the raster doesn't need to be loaded). Scales get trained based on the
    # result of this function.
    data$limits <- lapply(data$raster, function(raster) {
      tibble::tibble(
        x = c(0, ncol(raster)),
        y = c(0, nrow(raster)),
        z = range(raster)
      )
    })

    tidyr::unnest(data, .data$limits, .drop = FALSE)
  }
)

GeomLazyRaster <- ggproto(
  "GeomLazyRaster",
  Geom,
  required_aesthetics = "raster",

  default_aes = ggplot2::aes(alpha = "__default_alpha__", fill = "__default_fill__"),

  handle_na = function(data, params) {
    data
  },

  draw_panel = function(data, panel_params, coordinates) {
    # this is a super crazy hack
    # but there is no other way to scale objects from the draw function at build time (?)
    scales <- NULL
    for(i in 1:20) {
      env <- parent.frame(i)
      if("plot" %in% names(env) && "scales" %in% names(env$plot) && inherits(env$plot$scales, "ScalesList")) {
        scales <- env$plot$scales
        break
      }
    }
    if(is.null(scales)) stop("@paleolimbot's hack to get the ScalesList from Geom$draw_panel() has failed :'(")
    fill_scale <- scales$get_scales("fill")
    alpha_scale <- scales$get_scales("alpha")

    if(all(data$alpha == "__default_alpha__")) {
      # default
      alpha <- function(x) 1
    } else if(length(unique(data$alpha)) == 1) {
      # set (or mapped but constant)
      alpha <- function(x) unique(data$alpha)
    } else if(!is.null(alpha_scale)) {
      # mapped
      alpha <- alpha_scale$map
    } else {
      stop("Don't know how to compute 'alpha'")
    }

    if(all(data$fill == "__default_fill__")) {
      # default
      fill <- function(x) 1
    } else if(length(unique(data$fill)) == 1) {
      # set (or mapped but constant)
      fill <- function(x) unique(data$fill)
    } else if(!is.null(fill_scale)) {
      # mapped
      fill <- fill_scale$map
    } else {
      stop("Don't know how to compute 'fill'")
    }

    gTree(
      raster = data$raster[[1]],
      fill = fill,
      alpha = alpha,
      coordinates = coordinates,
      panel_params = panel_params,
      cl = "lazy_raster_grob"
    )
  }
)

geom_lazy_raster <- function(mapping = NULL, data = NULL, stat = StatLazyMatrixList,
                             ..., inherit.aes = TRUE, show.legend = NA) {
  layer(
    geom = GeomLazyRaster,
    stat = stat,
    data = data,
    mapping = mapping,
    position = "identity",
    params = list(...),
    inherit.aes = inherit.aes,
    show.legend = show.legend
  )
}

makeContext.lazy_raster_grob <- function(x) {
  # here it's possible to determine height and width in inches
  # getting DPI from the graphics device may not be possible,
  # but can always fall back on a user-specified minimum

  # projection + resampling would happen here

  # apply the aesthetics
  colors <- x$fill(x$raster)
  alpha <- x$alpha(x$raster)
  colors <- paste0(colors, as.character.hexmode(scales::rescale(alpha, from = c(0, 1), to = c(0, 255))))
  dim(colors) <- dim(x$raster)

  # map the coordinates
  corners <- data.frame(x = c(0, ncol(x$raster)), y = c(0, nrow(x$raster)))
  corners_trans <- x$coordinates$transform(corners, x$panel_params)
  x_rng <- range(corners_trans$x, na.rm = TRUE)
  y_rng <- range(corners_trans$y, na.rm = TRUE)

  setChildren(x, gList(rasterGrob(
    # there is an axis irregularity between what we think of as rows
    # and what grid thinks of as rows
    aperm(colors, c(2, 1))[nrow(colors):1,],
    x = x_rng[1], y = y_rng[1],
    height = diff(y_rng), width = diff(x_rng),
    default.units = "native",
    interpolate = FALSE,
    hjust = 0,
    vjust = 0
  )))
}

ggplot(big_scary_raster, aes(raster = raster)) +
  geom_lazy_raster()

Created on 2019-02-03 by the reprex package (v0.2.1)

mdsumner commented 5 years ago

I have finally got my head around this and have made analogous layer_spatial objects for lazyraster and silicate SC0, thanks for laying the trail I now see how this gg stuff works!

I'll come back around with some more details on this issue and my own explorations. The key thing that I've never articulated well before is that warping a new raster is fine as an approach (and is now rippingly efficient via sf/stars), but I've also wanted the ability to treat rasters as a mesh, with both resampling and reprojection available without re-forming the grid, and without turning pixels into polygons en masse.

I have lazyraster and silicate branches, not intended as PRs just parking what I've learnt.

paleolimbot commented 5 years ago

This is very cool...I'd like to get back to working on the "rasters in ggplot2" problem soon. I think it could be simplified by each raster implementation having its own method for (say) make_into_a_grob(x, target_crs, target_extent, fill_scale, height = NA, width = NA) (much like *sf has st_as_grob()). I think this would let you treat things as a mesh since you could return a polygonGrob() rather than a rasterGrob(). I suppose I'd also need st_bbox() to be implemented for the raster class so that the position scales get trained properly.

edzer commented 5 years ago

Will you again be at the tidyverse dev day in July, @paleolimbot ?

paleolimbot commented 5 years ago

Unfortunately I won't! But I will be prepping a lot of ggplot2 issues for the event, and possibly reviewing and merging all the minor PRs. Currently I'm trying to make headway on customizing position guides, but after that I should have some time to focus on the spatial bits.

paleolimbot commented 4 years ago

This issue will have to be on hold until tidyverse/ggplot2#3385 and tidyverse/ggplot2#3116 are resolved (until they are, lazy rendering is going to be slow and will use some hack that will break in future ggplot2 versions). Since this won't happen for the forthcoming Jan/Feb ggplot2 release (3.3.0), I'm going to remove lazy rendering from ggspatial until it can be done properly. In lieu of lazy rendering, I'll add some documentation on downsampling before piping to ggplot2.

mdsumner commented 1 year ago

I forgot about all this great info, I've got the warper under control now and will revisit. gg needs to expose this as a basic facility, extent, dimension, and crs (when possible) so a source can just be queried directly 🙏

Also fwiw we'll be able to avoid some churn from native imagery to a colour matrix with https://github.com/OSGeo/gdal/pull/7020