Good point! `geom_raster` would be the first candidate to look at; I also remember that @cpsievert worked on an (rOpenSci?) package for ggplot'ting of rasters: what was the motivation for that?
We would like to keep the CRS & degree axes handling integrated in `geom_sf`, and seamless integration of `geom_sf` and `geom_stars`, @hadley. We could start here by providing `st_as_grob` methods, as in sf/R/grid.R?
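For context, `st_as_grob` already exists in sf; a minimal sketch of what it does, converting a single geometry into a grid grob (the viewport scales here are just for illustration):

```r
library(sf)
library(grid)
## st_as_grob() converts one sfg geometry into a grid grob
g <- st_as_grob(st_point(c(0.5, 0.5)))
grid.newpage()
pushViewport(viewport(xscale = c(0, 1), yscale = c(0, 1)))
grid.draw(g)
```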
What one usually does when plotting raster objects with ggplot2 is turning them into a data.frame and using `geom_raster` (which is what `rasterVis::gplot()` does). However, this has some limitations for non-regular grids and, more importantly for stars, for projected coordinates, which is something that `geom_sf` handles perfectly.
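For concreteness, a minimal sketch of that usual workflow with a stars object, where `as.data.frame()` plays the conversion role that `rasterVis::gplot()` plays for raster objects:

```r
library(stars)
library(ggplot2)
## single band of the example Landsat image shipped with stars
x <- adrop(read_stars(system.file("tif/L7_ETMs.tif", package = "stars"))[, , , 1])
df <- as.data.frame(x)  # long table with columns x, y and the pixel values
ggplot(df) + geom_raster(aes(x, y, fill = L7_ETMs.tif))
```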
Currently, I think that the best way to plot projected gridded data with ggplot is to transform the data to sf polygons (e.g. with `spex::polygonize()`) and use `geom_sf()`. This is of course extremely slow and inefficient, but it allows visualizing datasets with different projections on the same grid without interpolation, something that is very useful (to me, at least). It would be nice if `geom_stars` could do something like this in a more efficient way.
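A sketch of that polygonize-then-plot workflow, reusing `x` from above; `sf::st_as_sf()` is used here for the cell-to-polygon step that `spex::polygonize()` also performs:

```r
pols <- st_as_sf(x, as_points = FALSE)  # one polygon per grid cell
## the attribute column name follows the file name here
ggplot() + geom_sf(data = pols, aes(fill = L7_ETMs.tif), col = NA)
```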
Have a look at this; grid is actually very good performance-wise.
> I also remember that @cpsievert worked on an (rOpenSci?) package for ggplot'ting of rasters: what was the motivation for that?
https://github.com/ropensci/plotdap#readme covers the use cases pretty succinctly.
Currently the raster support converts rasters to sf polygons 😬 https://github.com/ropensci/plotdap/blob/49b38ffe0a61cd93c936f0296334e46208efed4e/R/plotdap.R#L317-L319
@cpsievert Note that there are MUCH faster polygonizing methods, see #13.
I would focus on using a quadmesh version of the raster, possibly with a conversion of that to triangles, then use ggplot. No need for raster reprojection: you can reproject the quads or triangles, with a lot less handling of redundant coordinates.
```r
library(sf)
library(dplyr)  # for the pipe used below

## create a dummy raster for plotting with ggplot2
example(st_read)
r <- fasterize::fasterize(nc,
  raster::raster(raster::extent(sf::st_bbox(nc)[c(1, 3, 2, 4)]),
                 nrows = 300, ncols = 600,
                 crs = st_crs(nc)$proj4string),
  field = "BIR74")

## dense mesh form: every unique corner coordinate in vb, indexed by ib per rgl idioms
qm <- quadmesh::quadmesh(r, z = r)
gtab <- tibble::tibble(x = qm$vb[1, qm$ib], y = qm$vb[2, qm$ib],
                       quad = rep(seq_len(ncol(qm$ib)), each = 4),
                       BIR74 = qm$vb[3, qm$ib]) %>%
  dplyr::filter(!is.na(BIR74))

## transform the expanded set; this could be done before or after expansion/filtering NA
gtab[c("easting", "northing")] <-
  proj4::ptransform(as.matrix(gtab[c("x", "y")]) * pi/180,
                    sf::st_crs(nc)$proj4string,
                    "+init=epsg:2163")

library(ggplot2)
ggplot(gtab, aes(x = easting, y = northing, group = quad, fill = BIR74)) +
  geom_polygon()
```
What ggplot2 is missing is idioms to keep those indexes separate: what is essentially two tables, one of primitives that index into the other, the corner coordinates. (The feature table is really a third table, and most map extensions to ggplot2 just handle that delayed join in different ways.) I'd delay creating that long gtab table for as late as possible, since that's when the mesh expands; many operations can be done on the coordinates or primitive indexes before that.
I'm currently still following the stars -> polygonize -> sf -> ggplot workflow, but it is extremely slow for large datasets. ggplot is not as fast as base graphics in general, but in this case we are talking orders of magnitude. Performance might be a major limitation if we intend to plot all the grid cells separately as `geom_polygon`s.
Now that we have a very nice `st_xy2sfc`, maybe that can act as a bridge in the stars -> sf -> ggplot workflow? I'm not sure if direct stars -> ggplot plotting is doable; going through sf is great but can be slow.
See vignette 3. The `sf` path is taken by `geom_stars` in case of a curvilinear grid; building upon the example now in the first vignette, we get
```r
library(ggplot2)
ggplot() + geom_stars(data = prec_slice)
```
Oh wow, missed that. That's really, really awesome!
I'm playing around with this, look at these two plots:
```r
library(ggplot2)
library(stars)
m = matrix(1:20, nrow = 5, ncol = 4)
x = c(0, 0.5, 1, 2, 4, 5)   # 6 numbers: boundaries!
y = c(0.3, 0.5, 1, 2, 2.2)  # 5 numbers: boundaries!
(r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y)))
st_crs(r) = 4326
ggplot() + geom_stars(data = r)
ggplot() + geom_stars(data = st_xy2sfc(r, as_points = FALSE))
```
The first one does not seem to be recognised as EPSG:4326, the second does. Is this intended behaviour?
Well, yes: you can't expect miracles to happen overnight. The first one spawns into `geom_raster` or `geom_rect` (the latter in this case), the last into `geom_sf`. Only `geom_sf` knows about CRS.
You (we) could decide to always go through `geom_sf`, but this will be very slow for somewhat larger rasters. Optimizing the number of pixels to plot is still out of reach, it seems, although I've brought it up with several people at the RStudio developer day in Austin last month.
From my perspective, miracles are already happening overnight! ;) I understand, maybe this could be clarified a bit in the vignette?
I think this issue is now solved; if you prefer, I can open a new issue with the above, or we can just keep it here, as you wish.
Re speed: other than downsampling, I'm not sure there's much we can do. Personally, for publication plots this is not an issue; I don't mind waiting 5 or 10 minutes for a figure to be created. But yes, it can be annoying when testing and refining code.
I agree; let's leave this open until we've got the `st_xy2sfc` in `geom_stars` trick documented somewhere.
It's less about the number of pixels and more about having to specify the x, y, and pixel value for each cell explicitly, right? `rasterGrob` will plot a matrix very efficiently; it's more that `geom_raster` needs an x, y, z table to infer the offset and delta from. Fixing that requires a ggplot2 extension AFAICT: it should pass the matrix and transform through without tabular expansion. Did you discuss that aspect? I feel like that detail has never been properly aired.
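To illustrate the `rasterGrob` point, a sketch: grid draws a matrix of colours directly, with no long x/y/z table, but the value-to-colour mapping (and hence the legend) is then entirely up to you:

```r
library(grid)
m <- matrix(runif(100 * 100), 100, 100)
cols <- hcl.colors(256)[cut(m, 256)]  # hand-rolled value -> colour mapping
grid.newpage()
grid.draw(rasterGrob(matrix(cols, nrow(m)), interpolate = FALSE))
```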
I think it should be fast enough for anything that does not need to go through `sf`; ggplot plots matrices just fine, and even `geom_raster` is not too slow, but it can be atrocious to plot thousands of polygons.
Fast enough... until the moment you try to plot 10 x 10 subplots, each with a Sentinel-2 scene of 10000 x 10000 pixels. Right now downsampling is manual; crude device size indications are needed for the case someone forgets.
I also thought `rasterGrob` had low-level resampling? `rasterImage` does.
Personally that's a very rare case for me (we are talking about 1e10 pixels), but of course YMMV. I understand the wanting to replicate a feature that `stars::plot` already has, though.
I guess ggplot2 uses `rasterGrob` inside `geom_raster`, but when doing that, how do you make sure the colors match the legend? Writing ggplot2 geoms seems like you'd have to learn a new language first.
One issue now is that `downsample` is currently ignored if `sf = TRUE`, as far as I can tell.
Have you tried? It should: https://github.com/r-spatial/stars/blob/master/R/tidyverse.R#L125-L126
I have:
```r
library(stars)
library(ggplot2)
library(dplyr)
x = system.file("tif/L7_ETMs.tif", package = "stars") %>%
  read_stars() %>%
  slice(x, 1:50) %>%
  slice(y, 1:50)
gg1 = ggplot() + geom_stars(data = x, downsample = 5, sf = TRUE) + facet_wrap(~band)
gg2 = ggplot() + geom_stars(data = x, downsample = 5, sf = FALSE) + facet_wrap(~band)
```
... and it now works. I copied the commands from the previous R session, where it did not work. I am baffled, but better this way.
One thing to note re downsampling is that it works on any dimension (not just `x` and `y`) and can be specified on a per-dimension basis with an integer vector the same length as the number of dimensions: e.g. `c(1, 5, 1)` works for the above 3D object.
This could be clarified in the docs: for example, the `st_as_stars` docs only mention "if length 2, specifies downsampling rate in x and y". Maybe it would be worth exposing `stars:::st_downsample` directly, and linking to its documentation?
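For illustration, with the 3-dimensional `x` from the snippet above:

```r
## keep every value in x and band, but only every 5th value in y
ggplot() + geom_stars(data = x, downsample = c(1, 5, 1)) + facet_wrap(~band)
```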
I'm curious: is the `geom_stars()` function supposed to work with proxy objects? The base `plot` function seems to work fine, but an equivalent ggplot call returns an error and hangs my system:
```r
risk = read_stars(risk_tif, proxy = TRUE)
ggplot() + geom_stars(data = risk)
#> Error: Argument 3 must be length 84812236, not 1
```
Thanks! With this patch it should work if you specify a downsampling rate, as in `ggplot() + geom_stars(data = risk, downsample = 20)` to plot every 20th row and column. As opposed to base plot, in ggplot we still don't know how many pixels need to be filled by the plot. @paleolimbot @thomasp85 @mdsumner
A potentially more user-friendly approach (as in: it would not hang your system) would be to make some guess of the upper limit of pixels on the device (2000 x 2000?) and use that to set a lower limit on the downsampling rate, along with emitting a message or warning.
Great! I had meant to ask, though: why downsample? It seems asking for the window size is more useful; "downsample" can be defined in those terms, but the GDAL RasterIO engine is clearly source-dim / output-dim in terms of interface.
From following @paleolimbot's lead in ggspatial it seems this is not easy, but I think it should be something we could ask of ggplot2. Sure, the actual device may not be instantiated, but doesn't that just mean the invalidation model is pulling data too early? Does invalidation allow for a fresh update if the device is resized?
But we can get `dev.size("px")` and divide that up by facet columns and rows as a rough heuristic? Frankly, I don't know what the right way is; these are just thoughts on something that's puzzled me for a long time.
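A rough sketch of such a heuristic; `guess_downsample` is a made-up name, the even facet split is a guess, and (as noted below) `dev.size()` inside ggplot2 is unreliable, so treat this as a thought experiment:

```r
## guess a downsample factor from the current device size in pixels
guess_downsample <- function(nx, ny, n_facets = 1) {
  px <- dev.size("px") / ceiling(sqrt(n_facets))  # crude per-facet pixel budget
  max(1, ceiling(max(nx / px[1], ny / px[2])))
}
guess_downsample(10000, 10000, n_facets = 4)  # e.g. ~29 on a 700x700 px device
```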
One issue may be that ggplot creates objects that can be saved, after which you open a very large pdf device and then print it. I guess the `dev.size()` call would not reflect that large pdf one?
Definitely, that's what I mean by invalidation; though now I'm confused about when the data pull occurs, so I might be totally off.
I think downsampling early is the way to go, although specifying pixel dimensions is probably better since it would be reusable between rasters of different sizes. I have personally never been able to get `dev.size()` to return the correct pixel dimensions for any graphics device within ggplot2, and so I think that lazy rendering is a battle for another day.
@mdsumner it happens in `geom_stars()`, and that means upon creation, not printing (plotting); after adding a print line in `geom_stars()`:
```r
> library(ggplot2)
> g = ggplot() + geom_stars(data = p, downsample = 100) + facet_wrap(~band)
[1] "downsampled!!"
# set up massive plot device here
> print(g) # too low resolution now?
# done
```
Ah right, well that just means a standard default then: a basic heuristic of device dims, distributed over facets? I hadn't fully grasped this before; it's helpful.
Facets are the other issue: can we get the number of facets when we are in `geom_stars()`?
If you override `draw_layer()` instead of `draw_panel()` or `draw_group()`, then yes! That won't tell you rows or cols, though (and it's probably not worth it to try to find them).
Rows/cols are not as important as the total number. Can you point us to an example where this is done?
Copy and paste this: https://github.com/tidyverse/ggplot2/blob/master/R/geom-.r#L74-L90 ...and keep your other methods as-is. You can use `length(unique(data$PANEL))` within `Geom$draw_layer()` to get the number of facets. You can add things to `args` and they will get passed down to your other draw methods.
When using `geom_stars(..., sf = FALSE)` the axes keep the same units as the data. Instead, when using `geom_sf` or `geom_stars(..., sf = TRUE)`, the axes and the graticule are shown in (I think) EPSG:4326, which is nice when using spatial data. However, of course, it is also much slower.
Is there a technical reason behind this? As a user of primarily spatial data, I would like the graticule and the axis labels to always be in degrees if the `stars` object has a valid CRS. LMK if I should open a new issue for this.
EDIT: MRE
```r
system.file("tif/L7_ETMs.tif", package = "stars") %>% read_stars() -> x
ggplot() + geom_stars(data = x[,,,1], sf = TRUE)
ggplot() + geom_stars(data = x[,,,1])
```
You get an idea of what is involved in doing this by studying the ggplot2 files that have `-sf` in them here. ggplot2 essentially uses its own programming language, which I haven't mastered so far.
@edzer I'm afraid I gave you a more complex solution than was necessary. If you haven't discovered this already, `data$PANEL` is always a `factor()`, so you can examine the `levels()` to get the number of panels (without overriding `Geom$draw_layer()`):
```r
library(ggplot2)

GeomPoint2 <- ggproto(
  "GeomPoint2", GeomPoint,
  draw_panel = function(self, data, panel_params, coord, na.rm = FALSE) {
    message("There are ", length(levels(data$PANEL)), " panel(s) in this plot")
    ggproto_parent(GeomPoint, self)$draw_panel(data, panel_params, coord, na.rm)
  }
)

geom_point2 <- function() {
  layer(geom = GeomPoint2, stat = "identity", position = "identity")
}

p <- ggplot(mpg, aes(cty, hwy)) + geom_point2()
p
#> There are 1 panel(s) in this plot
p + facet_wrap(vars(class))
#> There are 7 panel(s) in this plot
#> There are 7 panel(s) in this plot
#> There are 7 panel(s) in this plot
#> There are 7 panel(s) in this plot
#> There are 7 panel(s) in this plot
#> There are 7 panel(s) in this plot
#> There are 7 panel(s) in this plot
```
Created on 2019-06-27 by the reprex package (v0.2.1)
Thanks, I see, but how do I get access to that number of panels inside `geom_point2()`?
You can't. Is there a reason you need that information at spec time?
Then we could downsample the rasters before plotting.
The "ggplot way" is to provide all the information necessary to do calculations in the constructor, then actually do the calculations as late as possible. Is there an exported downsampling function for stars objects already? I wonder if it would be cleaner to have something like sf_obj %>% downsample_to_px(500)
, then pass that object into geom_stars()
(as you've currently written it) instead of adding an extra argument to geom_stars()
.
To have this really work with ggplot, the input to `geom_stars()` needs to be a data frame with a list-column of stars objects (like `sfc`, but for rasters), and there should be some kind of constructor that can turn a dimension (like a time dimension) into rows (so that you can facet by it). Each stars object in the list should be something that can be unambiguously turned into a grob when it gets passed as the data argument to a ggplot (i.e., only x, y, and band dimensions). I think PostGIS has a data structure something like this for rasters in databases, so there's some precedent there, even if it's a bit un-stars-like.
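Such an input could look roughly like this (a sketch; `x` is the 6-band L7 example object read earlier, and the column names are made up):

```r
d <- data.frame(band = 1:6)
d$raster <- lapply(1:6, function(b) adrop(x[, , , b]))  # list-column of 2-D stars
```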
If it's possible to make some kind of `sfc` for stars objects (with `st_transform()`, `st_bbox()`, and `st_as_grob()` methods), I'd be happy to write the `Geom` and `Stat` to handle them.
Sounds great! We have `st_transform` and `st_bbox` in place. I can see this working, although the `st_as_grob` would still like to know the dimensions of the plotting area, in pixels, so that it can for instance downsample a 10,000 x 10,000 Sentinel-2 image to a 200 x 200 pixel area before assigning colors, rather than creating the full grob, which takes massive time and memory.
Also, for the `st_as_grob` I'd need to know the color scale and the way pixel values are to be mapped into colors. Are there examples of what this looks like? Are there also functions passed on for this mapping?
I think that resampling is a separate task, and from my end it would be helpful to have this as a separate function (preferably a function that can do `max_dims = c(200, 200)` or something). I might have to call this twice: once to get the data range (where I could probably do 100 x 100 to estimate the min/max) and once before passing to `st_as_grob()`.
In my dreams, the resampling happens during `st_transform()` (or maybe another function that also takes the destination CRS bounds), but I think that is a battle for another day.
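A possible shape for such a function; entirely hypothetical (name and semantics included), doing only integer-stride downsampling in x and y:

```r
downsample_to <- function(x, max_dims = c(200, 200)) {
  d <- dim(x)[c("x", "y")]
  n <- max(1, ceiling(max(d / max_dims)))  # common stride for both dimensions
  if (n == 1) x else x[, seq(1, d[["x"]], by = n), seq(1, d[["y"]], by = n)]
}
```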
For colors, it's probably best to take a (vectorized) function that takes continuous values in and gives colours out, so that you don't have to explicitly depend on ggplot2 scales. Then on my end I'll do something like this:
```r
library(ggplot2)
scale <- scale_fill_viridis_c(limits = c(10, 20))
mapper <- scale$map
mapper(9:21)
#> [1] "grey50"  "#440154" "#46286D" "#414487" "#3B5E8B" "#2A788E" "#2B9089"
#> [8] "#22A884" "#5ABC6D" "#7AD151" "#C0DD40" "#FDE725" "grey50"
```
...and you can have something simpler for testing.
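The "something simpler for testing" could be, for instance, a hand-rolled mapper with the same contract, i.e. a vectorized function taking values in and returning colours (a sketch):

```r
mapper2 <- function(x, pal = hcl.colors(10), limits = range(x, na.rm = TRUE)) {
  brk <- seq(limits[1], limits[2], length.out = length(pal) + 1)
  pal[findInterval(x, brk, all.inside = TRUE)]
}
mapper2(9:21)
```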
Actually, from reading your code, it seems like `min_dims` or `suggested_dims` might be more appropriate than `max_dims`. As long as the user can request no downsampling somehow.
`st_warp` does the resampling already; it's gg's job to provide the `dest` argument (the extent, crs and dimensions of the target panel), as is done here: https://github.com/paleolimbot/ggspatial/blob/master/R/layer-spatial-stars.R#L308 That could work even if source and target crs haven't changed, so there's no need for special cases.
I'm confused why that part is still being discussed, am I missing something? I got close enough to understanding the gg model, but I can't spend much time on it atm, sadly. I really want to!
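For the record, a sketch of that `st_warp()` + `dest` idea, building a coarse target grid from the source bbox (200 x 200 as in the example discussed above; `x` is assumed to be a regular stars object):

```r
dest <- st_as_stars(st_bbox(x), nx = 200, ny = 200)  # target grid template
x_small <- st_warp(x, dest)                          # resample into the template
```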
If `st_as_grob()` takes a `dest` template argument (and somewhere there is a function that constructs a raster template), I could certainly get on board with that! I would just prefer to have the details of raster resampling, cropping, and reprojection abstracted away from the ggplot code (and maintained by people who know what they are doing, which is to say, not me).
This might be helpful for understanding the gg model: https://github.com/paleolimbot/ggdebug#how-it-all-comes-together
Hi there, I apologize in advance if this is double-posting, but something similar was discussed in this thread a while ago.
When I use `geom_stars`, I combine it with a `geom_sf` of the bounding box of the stars object. This makes sure that the CRS is recognized; however, it changes the axis labels (which is OK for me). Here is the difference:
```r
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(ggplot2)
system.file("tif/L7_ETMs.tif", package = "stars") %>% read_stars() -> x
ggplot() + geom_stars(data = x)
ggplot() + geom_stars(data = x) +
  geom_sf(data = st_as_sfc(st_bbox(x)), col = NA, fill = NA)
```
Now, I tried to wrap the `geom_stars` and the `geom_sf` into a new geom, say `geom_stars2`... but I failed horribly. Is there an easy way to wrap these two geoms? Sure, this is a ggplot task, but I (and maybe others) would only need it for this issue.
Created on 2021-07-05 by the reprex package (v1.0.0)
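One easy way to wrap the two without writing a new Geom might be a function that returns a list of layers, which ggplot2 adds one by one; a sketch, with `geom_stars2` being just the made-up name from the comment above:

```r
geom_stars2 <- function(data, ...) {
  list(
    geom_stars(data = data, ...),
    geom_sf(data = st_as_sfc(st_bbox(data)), col = NA, fill = NA)
  )
}
ggplot() + geom_stars2(data = x)
```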
A while back sf implemented a `geom_sf`, which provides a ggplot2 plotting interface for `sf` objects. And it is awesome. stars imho should also provide something similar.