tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.54k stars 2.03k forks source link

Move call to st_transform to transform #2731

Open thomasp85 opened 6 years ago

thomasp85 commented 6 years ago

The current implementation of CoordSf has the crs transformation implemented in the setup_data() method. This is, in my view the wrong place to have it and a departure from the other Coord where coordinate transformations happens at the very end inside the Geom$draw_*() method using Coord$transform().

Rather than just being a little inconsistency, it has implications for gganimate and how it might let an animation change crs parameters during an animation.

If this is a change you'd accept I'd be happy to make a PR

hadley commented 6 years ago

Sure

thomasp85 commented 6 years ago

@edzer before I begin poking at this: Was there technical reason for putting the transform this early or did it just seem right?

edzer commented 6 years ago

@thomasp85 I was not involved in where this happens, and know too little about ggplot2 internals to say something intelligible about it.

clauswilke commented 6 years ago

This may have implications for setup_panel_params() and range() (not currently implemented in coord_sf(), but see PR #2832).

Also, currently (after application of PR #2832), geom_hline() etc. only work as expected in longlat coordinates. When we transform (either the dataset or via coord_sf()), the line falls in the wrong location and doesn't follow the non-linear coordinate system. This is different from the behavior of coord_map(). Ideally, the transform() function in coord_sf() would recognize that the data geom_hline() provides are not transformed and would transform them accordingly.

library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

ggplot(nc) + geom_sf() + geom_hline(yintercept = 35)

ggplot(nc) + geom_sf() + geom_hline(yintercept = 35) + coord_sf(crs = 5070)

ggplot(nc) + geom_sf() + geom_hline(yintercept = 1500000) + coord_sf(crs = 5070)

Created on 2018-08-16 by the reprex package (v0.2.0).

clauswilke commented 6 years ago

@thomasp85 I think the implementation problem is going to be that by the time the data reaches the transform() function the crs info has been stripped. Not sure if it is possible to keep it around. It would be really good if it was possible to do so.

library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
nc_layer <- layer_data(ggplot(nc) + geom_sf())

# original data is of class "sf", has crs info
inherits(nc, "sf")
#> [1] TRUE
# layer data generated by ggplot is not of class "sf", doesn't have crs info
inherits(nc_layer, "sf")
#> [1] FALSE

head(nc)
#> Simple feature collection with 6 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#>    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
#> 1 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
#> 2 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
#> 3 0.143     1.630  1828    1828       Surry 37171  37171       86  3188
#> 4 0.070     2.968  1831    1831   Currituck 37053  37053       27   508
#> 5 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
#> 6 0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
#>   SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1     1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
#> 2     0      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
#> 3     5     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
#> 4     1     123   830     2     145 MULTIPOLYGON (((-76.00897 3...
#> 5     9    1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
#> 6     7     954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...

head(nc_layer)
#>                         geometry PANEL group      xmin      xmax     ymin
#> 1 MULTIPOLYGON (((-81.47276 3...     1    -1 -84.32385 -75.45698 33.88199
#> 2 MULTIPOLYGON (((-81.23989 3...     1    -1 -84.32385 -75.45698 33.88199
#> 3 MULTIPOLYGON (((-80.45634 3...     1    -1 -84.32385 -75.45698 33.88199
#> 4 MULTIPOLYGON (((-76.00897 3...     1    -1 -84.32385 -75.45698 33.88199
#> 5 MULTIPOLYGON (((-77.21767 3...     1    -1 -84.32385 -75.45698 33.88199
#> 6 MULTIPOLYGON (((-76.74506 3...     1    -1 -84.32385 -75.45698 33.88199
#>       ymax linetype alpha stroke
#> 1 36.58965        1    NA    0.5
#> 2 36.58965        1    NA    0.5
#> 3 36.58965        1    NA    0.5
#> 4 36.58965        1    NA    0.5
#> 5 36.58965        1    NA    0.5
#> 6 36.58965        1    NA    0.5

Created on 2018-08-16 by the reprex package (v0.2.0).

clauswilke commented 6 years ago

One more post. Actually, the crs info is retained in the geometry column, so that's not a major hurdle. The bigger question is what to do with regular position columns, such as x, y, xmin, xmax, etc. Ideally they would also have to be transformed, but only (x, y) pairs can be transformed using a geospatial coordinate system. Maybe the transformation could be limited to data frames that have exactly one x and one y column (coord_map() seems to do that). That would cover many basic geoms. Also, the coord might need a switch to determine whether or not to transform non-geometry columns, and if so how to interpret the data in those columns. This sounds like a major rework of coord_sf().

thomasp85 commented 6 years ago

I think the sf support needs some love anyway🙂. But the current transform already take a stance on what to do with non-sf layers (ignore them) so your last point is not really specific to moving the transform operation.

clauswilke commented 6 years ago

@thomasp85 I think these issues are sufficiently interwoven that it makes sense to tackle them at the same time, or at least think about them at the same time. Note that the current transform() doesn't ignore non-sf layers, it treats them identical to sf layers. It assumes everything comes in a common CRS and then scales everything to the previously identified x and y ranges in that CRS: https://github.com/tidyverse/ggplot2/blob/5f868c598e88c7096896c0c8c37cbd34a3fb8e47/R/sf.R#L300-L315 (The comment in line 307 is misplaced because it essentially applies to the entire function.)

If the geometry data don't arrive in a common CRS, then it becomes a somewhat open question in what CRS the non-geometry data and the panel ranges should arrive. And also, you may have to transform the data twice, once in setup_panel_params() to identify the ranges and generate graticules and once in transform() to draw.

thomasp85 commented 6 years ago

Ah - I was referring to the current implementation in master l, which do ignore it. I personally think it makes sense to treat non-sf layers as lat-long values and transform based on the given crs

You may be right hat this should all be tackled at once. I’ll carve some time out next week to get acquainted with your PR

clauswilke commented 6 years ago

I personally think it makes sense to treat non-sf layers as lat-long values and transform based on the given crs

I think it needs to be configurable, if only for backwards compatibility.

paleolimbot commented 5 years ago

I'm fairly sure that the call to st_transform() occurs in CoordSf$setup_data() because it ensures that the scales are trained in the proper coordinate system (I imagine you all know this I just didn't see it mentioned above). It's possible to do this in the Stat instead, transforming only the extent (I do this with rasters in ggspatial).

I've also wondered why non-spatial layers aren't assumed as WGS84...I implemented a stat_spatial_identity() that makes this work for point geometries, but obviously won't work for anything else. https://github.com/paleolimbot/ggspatial/blob/master/R/geom-spatial.R#L38

clauswilke commented 5 years ago

I've also wondered why non-spatial layers aren't assumed as WGS84

I have been thinking about this quite a bit. As far as I can tell, the largest problem is repeated transformations. Ideally, we would hold all coordinates for all layers in WGS84 until draw time. This would mean transforming all spatial layers to WGS84, and then transforming to the desired CRS twice, once to figure out the scale extents and once to actually draw. That seems rather inefficient.

paleolimbot commented 5 years ago

The following is some code I use to train the scales for rasters in ggspatial...basically it creates a grid in the source CRS, projects it, then gets the extent in the destination CRS. If it's decided that the st_transform() call really belongs in Coord$transform(), something like this could be added to StatSf:

proj_grid <- sf::st_make_grid(proj_corners, n = 10, what = "corners")
out_grid <- sf::st_transform(proj_grid, crs = to_crs)
out_bbox <- sf::st_bbox(out_grid)
teunbrand commented 1 year ago

@thomasp85 is this still a barrier for gganimate?

teunbrand commented 4 days ago

I have played around with this a bit, and it should be doable to move the transformation step. This step is after scale training, so we can't train scales the usual way. Instead, you can train scales on the extent, as Dewey indicated.

However, training on the extent rather than the transformed data will include extraneous points. For example, the data below is st_crs(4326) data projected to st_crs(5070). The points are a grid along the extent in original coordinates.

Under the new transformation rule, the outermost grid points are included in the extent. This lets scales include a rather large part near the bottom where there is no geometry.

Image

I'm not a geospatial expert, so I don't know if this is an acceptable tradeoff.