Open thomasp85 opened 6 years ago
Sure
@edzer before I begin poking at this: Was there technical reason for putting the transform this early or did it just seem right?
@thomasp85 I was not involved in where this happens, and know too little about ggplot2 internals to say something intelligible about it.
This may have implications for setup_panel_params()
and range()
(not currently implemented in coord_sf()
, but see PR #2832).
Also, currently (after application of PR #2832), geom_hline()
etc. only work as expected in longlat coordinates. When we transform (either the dataset or via coord_sf()
), the line falls in the wrong location and doesn't follow the non-linear coordinate system. This is different from the behavior of coord_map()
. Ideally, the transform()
function in coord_sf()
would recognize that the data geom_hline()
provides are not transformed and would transform them accordingly.
library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
ggplot(nc) + geom_sf() + geom_hline(yintercept = 35)
ggplot(nc) + geom_sf() + geom_hline(yintercept = 35) + coord_sf(crs = 5070)
ggplot(nc) + geom_sf() + geom_hline(yintercept = 1500000) + coord_sf(crs = 5070)
Created on 2018-08-16 by the reprex package (v0.2.0).
@thomasp85 I think the implementation problem is going to be that by the time the data reaches the transform()
function the crs info has been stripped. Not sure if it is possible to keep it around. It would be really good if it was possible to do so.
library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
nc_layer <- layer_data(ggplot(nc) + geom_sf())
# original data is of class "sf", has crs info
inherits(nc, "sf")
#> [1] TRUE
# layer data generated by ggplot is not of class "sf", doesn't have crs info
inherits(nc_layer, "sf")
#> [1] FALSE
head(nc)
#> Simple feature collection with 6 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74
#> 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091
#> 2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487
#> 3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188
#> 4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508
#> 5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421
#> 6 0.097 1.670 1833 1833 Hertford 37091 37091 46 1452
#> SID74 NWBIR74 BIR79 SID79 NWBIR79 geometry
#> 1 1 10 1364 0 19 MULTIPOLYGON (((-81.47276 3...
#> 2 0 10 542 3 12 MULTIPOLYGON (((-81.23989 3...
#> 3 5 208 3616 6 260 MULTIPOLYGON (((-80.45634 3...
#> 4 1 123 830 2 145 MULTIPOLYGON (((-76.00897 3...
#> 5 9 1066 1606 3 1197 MULTIPOLYGON (((-77.21767 3...
#> 6 7 954 1838 5 1237 MULTIPOLYGON (((-76.74506 3...
head(nc_layer)
#> geometry PANEL group xmin xmax ymin
#> 1 MULTIPOLYGON (((-81.47276 3... 1 -1 -84.32385 -75.45698 33.88199
#> 2 MULTIPOLYGON (((-81.23989 3... 1 -1 -84.32385 -75.45698 33.88199
#> 3 MULTIPOLYGON (((-80.45634 3... 1 -1 -84.32385 -75.45698 33.88199
#> 4 MULTIPOLYGON (((-76.00897 3... 1 -1 -84.32385 -75.45698 33.88199
#> 5 MULTIPOLYGON (((-77.21767 3... 1 -1 -84.32385 -75.45698 33.88199
#> 6 MULTIPOLYGON (((-76.74506 3... 1 -1 -84.32385 -75.45698 33.88199
#> ymax linetype alpha stroke
#> 1 36.58965 1 NA 0.5
#> 2 36.58965 1 NA 0.5
#> 3 36.58965 1 NA 0.5
#> 4 36.58965 1 NA 0.5
#> 5 36.58965 1 NA 0.5
#> 6 36.58965 1 NA 0.5
Created on 2018-08-16 by the reprex package (v0.2.0).
One more post. Actually, the crs info is retained in the geometry
column, so that's not a major hurdle. The bigger question is what to do with regular position columns, such as x
, y
, xmin
, xmax
, etc. Ideally they would also have to be transformed, but only (x, y) pairs can be transformed using a geospatial coordinate system. Maybe the transformation could be limited to data frames that have exactly one x and one y column (coord_map()
seems to do that). That would cover many basic geoms. Also, the coord might need a switch to determine whether or not to transform non-geometry columns, and if so how to interpret the data in those columns. This sounds like a major rework of coord_sf()
.
I think the sf support needs some love anyway🙂. But the current transform already take a stance on what to do with non-sf layers (ignore them) so your last point is not really specific to moving the transform operation.
@thomasp85 I think these issues are sufficiently interwoven that it makes sense to tackle them at the same time, or at least think about them at the same time. Note that the current transform()
doesn't ignore non-sf layers, it treats them identical to sf layers. It assumes everything comes in a common CRS and then scales everything to the previously identified x and y ranges in that CRS:
https://github.com/tidyverse/ggplot2/blob/5f868c598e88c7096896c0c8c37cbd34a3fb8e47/R/sf.R#L300-L315
(The comment in line 307 is misplaced because it essentially applies to the entire function.)
If the geometry data don't arrive in a common CRS, then it becomes a somewhat open question in what CRS the non-geometry data and the panel ranges should arrive. And also, you may have to transform the data twice, once in setup_panel_params()
to identify the ranges and generate graticules and once in transform()
to draw.
Ah - I was referring to the current implementation in master l, which do ignore it. I personally think it makes sense to treat non-sf layers as lat-long values and transform based on the given crs
You may be right hat this should all be tackled at once. I’ll carve some time out next week to get acquainted with your PR
I personally think it makes sense to treat non-sf layers as lat-long values and transform based on the given crs
I think it needs to be configurable, if only for backwards compatibility.
I'm fairly sure that the call to st_transform()
occurs in CoordSf$setup_data()
because it ensures that the scales are trained in the proper coordinate system (I imagine you all know this I just didn't see it mentioned above). It's possible to do this in the Stat
instead, transforming only the extent (I do this with rasters in ggspatial).
I've also wondered why non-spatial layers aren't assumed as WGS84...I implemented a stat_spatial_identity()
that makes this work for point geometries, but obviously won't work for anything else. https://github.com/paleolimbot/ggspatial/blob/master/R/geom-spatial.R#L38
I've also wondered why non-spatial layers aren't assumed as WGS84
I have been thinking about this quite a bit. As far as I can tell, the largest problem is repeated transformations. Ideally, we would hold all coordinates for all layers in WGS84 until draw time. This would mean transforming all spatial layers to WGS84, and then transforming to the desired CRS twice, once to figure out the scale extents and once to actually draw. That seems rather inefficient.
The following is some code I use to train the scales for rasters in ggspatial...basically it creates a grid in the source CRS, projects it, then gets the extent in the destination CRS. If it's decided that the st_transform()
call really belongs in Coord$transform()
, something like this could be added to StatSf
:
proj_grid <- sf::st_make_grid(proj_corners, n = 10, what = "corners")
out_grid <- sf::st_transform(proj_grid, crs = to_crs)
out_bbox <- sf::st_bbox(out_grid)
@thomasp85 is this still a barrier for gganimate?
I have played around with this a bit, and it should be doable to move the transformation step. This step is after scale training, so we can't train scales the usual way. Instead, you can train scales on the extent, as Dewey indicated.
However, training on the extent rather than the transformed data will include extraneous points.
For example, the data below is st_crs(4326)
data projected to st_crs(5070)
.
The points are a grid along the extent in original coordinates.
Under the new transformation rule, the outermost grid points are included in the extent. This lets scales include a rather large part near the bottom where there is no geometry.
I'm not a geospatial expert, so I don't know if this is an acceptable tradeoff.
The current implementation of
CoordSf
has the crs transformation implemented in thesetup_data()
method. This is, in my view the wrong place to have it and a departure from the otherCoord
where coordinate transformations happens at the very end inside theGeom$draw_*()
method usingCoord$transform()
.Rather than just being a little inconsistency, it has implications for
gganimate
and how it might let an animation change crs parameters during an animation.If this is a change you'd accept I'd be happy to make a PR