Closed rsbivand closed 4 years ago
@rsbivand - thank you for your work in this regard. Do you have any advice on how to proceed with those changes? How can I manually check which dataset in spData could be problematic, and then how to decide on the correct +towgs84
values?
I'm thinking that sf and rgdal and any other packages with transform functions may need to include a test on startup to see whether current PROJ diverges from PROJ 4.9.3. So far, CRAN Windows binaries are 4., so the effects are seen only where the packages are installed from source against PROJ 6.. If a package only offers files, like spData, the consequences will only occur once the objects have been read in, so .rda and native GIS format behaviour will probably differ (the .rda freeze the CRS representation, but we don't know how a CRS with a +datum= string will be handled - yet). One way is to mapview::mapview
or tmap in view mode in the two settings and check visually; another to add the equivalent transformation to EPSG:4326 and check for equivalence of a sample point across the two settings. The QGIS approach is to ask the user to choose the transformation path manually, which is not appropriate in workflows and scripts. For geocompr, I guess adding (hidden) code to trap changes is feasible, say on ch 6. Maybe a blog / addendum to ch 6 might be a goal?
Trying to link the concepts ellipsoid/spheroid, datum, grid. As far I as understand it:
+ellps
) defines the global shape of the Earth with the distances from the center to the pole and from the center to the equator. +datum
) is then just a modification (i.e. transformation, rotation and scaling) of this spheroid +towgs84
tag is always needed, since transformations between CRS's always go via wgs84?@edzer teaches using Datums and Map Projections by Iliffe and Lott https://www.whittlespublishing.com/Datums_and_Map_Projections. Maybe we need his course online?
+towgs84
gives 3 or 7 parameters for a parametric shift, datum grids give the shift for every location up to some resolution, but can then be interpolated. Reading the book Roger mentioned will make you realize that datum transformation will always be there, and will never be simple.
I'm begining to suspect that the GDAL3/PROJ6 combination is unkind to the shapefile driver. I'll try later to add a shapefile without a +datum=
but with an equivalent +towgs84=
copied in manually from GDAL 2.4.2 data/datum:shift.csv (446.448 | -125.157 | 542.06 | 0.15 | 0.247 | 0.842 | -20.489 I think). We'll need to check a range of drivers to see which handle input CRS differently too.
I just submitted sf 0.8-0 to CRAN addressing the PROJ 6.2.0 issues seen on debian testing, and those caused by the changed tidyr 1.0-0 API.
Dear @rsbivand, @edzer and the rest of #rspatial developers, I still try to wrap my head around the future changes related to the PROJ library (PROJ6 and after). What is the future of the proj4string definitions? Are there plans to (1) keep it, (2) drop it from #rspatial, (3) add some other way of defining projections (e.g., WKT), or (4) do something different? (https://gdal.org/api/ogr_srs_api.html#_CPPv416OSRExportToProj420OGRSpatialReferenceHPPc).
Good question - what do you suggest?
As of now, rgdal on R-Forge (1.5.1, very unstable, rev 862) when built with GDAL >= 3 and PROJ >= 6 will issue warnings when a DATUM is in the input vector SRS but discarded in the PROJ string. Next step to extend this to raster (see http://fwarmerdam.blogspot.com/2010/03/in-last-few-weeks-i-believe-i-have-made.html).
See thread on gdal-dev.
New list_coordOps()
takes two CRS strings and returns coordinate operation pipelines. The easiest CRS strings are such as "EPSG:4326"
. For sf, users are going to have to insert EPSG codes, but there is somewhere to put them in a "crs"
object.
In sp and dependencies, "CRS"
is S4, so no free slot. I suggest the same kludge as for rings and holes, that is to add a comment()
to "CRS"
objects to contain the CRS string now needed. This needs to update the "projargs"
slot, and needs checking too. Updating it would largely be manual, and users would have to be alerted to the need to do so.
The CRS string could also be added to sf "crs"
objects, checking for NULL
(as with a NULL
comment), to provide somewhere to put WKT, URN, PROJJSON or other string representations. There still isn't clarity about where to put the EPOCH if known.
Feedback on rgdal very welcome. Most ideas should propagate to sf without extra difficulty. Comment now if this matters for you, @mdsumner made useful comments on the gdal-dev thread.
Notes on https://github.com/r-spatial/discuss/issues/28 about packages depending on sf and rgdal for coordinate operations entered last week.
Maybe also extend proj string degradation testing to towgs84
and nadgrids
; datum
was the most obvious problem, but towgs84
is also checked in the current code, and the warning qualified if a towgs84
makes it through OK (helps with -DACCEPT_USE_OF_DEPRECATED_PROJ_API_H pj_transform()
), but won't help with PROJ 6 transformation I think (these need +type=crs
too anyway).
@Nowosad : could you see whether current rgdal tickles any bugs with the spData vector files?
Link to the gdal-dev thread.
R-Forge rgdal rev 864 adds raster read warnings (we'll see if it builds on old PROJ/GDAL). Next checks for discarded PROJ string keys/tags on write for vector and raster.
@rsbivand should I check spData on dev rgdal using PROJ5 or PROJ6?
Offline now: prog >= 6, gdal >= 3. Current vector adds wkt2 comment to sp CRS objects. Next step raster, then reconstruct CRS projargs from wkt2.
Roger Bivand Falsensvei 32 5063 Bergen
lør. 2. nov. 2019, 12.53 skrev Jakub Nowosad notifications@github.com:
@rsbivand https://github.com/rsbivand should I check spData on dev rgdal using PROJ5 or PROJ6?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/r-spatial/sf/issues/1146?email_source=notifications&email_token=ACNZ3BCBCII7LLXAVPUY23TQRVS4VA5CNFSM4IVRBE42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEC42HLQ#issuecomment-549036974, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZ3BHXHIS43VNEEZIFFXDQRVS4VANCNFSM4IVRBE4Q .
This is the WIP-vignette from R-Forge/rgdal 1.5-1 rev 888. Comments welcome! PROJ6_GDAL3.zip
RFC now on http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html, no need to unzip.
I started working on a wkt2
branch, anticipating GDAL3/PROJ6, and assuming @rsbivand 's sp
fork will be merged into master sp
. I assume that GDAL3 implies PROJ6. This branch:
wkt2
to crs
objects, which is only filled if GDAL >= 3.0.0, and is otherwise NA_character_
wkt2
from the dataset read by OGR (dominant use case I assume), or from an EPSG set (as in st_crs(4326)
), or else from the proj4string (as in st_crs("+proj=longlat")
)wkt2
is not NA
, always the wkt2
field to set a coordinate reference system on a dataset e.g. in st_write
or st_transform
and the target CRS in st_transform
comment(x)
for conversion from CRS
(sp) to crs
(sf) in case x
is of class CRS
(sp), if the comment is not NULL
crs
to CRS
, which sets the comment field too.crs
objects with their wkt2
field in case it is not NA
I now see warnings of this kind:
< Warning message:
< In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
< Discarded datum Amersfoort in CRS definition
emitted by (Roger's forked) sp, and am not sure sf
should do that too (and when).
Still todo:
print
methods for sf
and sfc
should have a one-line summary of the crs
that takes from the wkt rather than the proj4.st_transform
path, and the ability to choose a secondary optionThanks, this is real progress! For the todo list, but not to be implemented yet, is grid CDN handling. This widens and deepens the third todo list item. This is as in rgdal::list_coordOps()
and its print method first, I guess, as it is only listing as inside projinfo
that you see the alternatives including better choices not available because of missing grids. However, the grid matter is a bit tricky, because CRAN binary packages have share/proj non-writable I think. There is a trick for extending the search list, but I haven't included that - rgdal, sf and lwgeom would need to coordinate that.
Wrt. the warning messages - I feel we should leave them in place, possibly with an option to mute them, until we are sure that both GDAL3/PROJ6 and GDAL<3/PROJ<=6 work (warning on configuration that GDAL<3/PROJ6 is a bad idea).
Do you know how I can tell R CMD build to use the wkt2
branch on my fork?
I'm looking at removing the requirement for rgdal 1.5-1 in my fork of sp 1.3-3, but haven't had time yet.
To build this branch In your fork directory:
git pull https://github.com/r-spatial/sf.git wkt2
git checkout wkt2
I just did git checkout wkt2
since the repo is upstream of my fork. I'll wait a bit because I'm teaching on Monday too, but if you want to offer a PR for my talk on Tuesday morning, I can install sf_0.8-1
.
They live together:
git checkout master
takes you back to the master branch.
This gives this possibility:
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
pt = st_sfc(st_point(c(0, 60)), crs = 4326)
(old = st_axis_order()) # query default: FALSE means interpret pt as (longitude latitude)
# [1] FALSE
st_transform(pt, 3857)[[1]]
# POINT (0 8399738)
st_axis_order(TRUE) # now interpret pt as (latitude longitude), as EPSG:4326 prescribes
# [1] TRUE
st_transform(pt, 3857)[[1]]
# POINT (6679169 0)
st_axis_order(old)
# [1] FALSE
This gets and sets a global variable in the C++ code.
Improved, now to return the old value, rather than the newly set:
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
pt = st_sfc(st_point(c(0, 60)), crs = 4326)
st_axis_order() # query default: FALSE means interpret pt as (longitude latitude)
# [1] FALSE
st_transform(pt, 3857)[[1]]
# POINT (0 8399738)
(old_value = st_axis_order(TRUE)) # now interpret pt as (latitude longitude), as EPSG:4326 prescribes
# [1] FALSE
st_axis_order() # query current value
# [1] TRUE
st_transform(pt, 3857)[[1]]
# POINT (6679169 0)
st_axis_order(old_value) # set back to old value
# [1] TRUE
Video of talk on this problem on: https://www.youtube.com/playlist?list=PLXUoTpMa_9s10NVk4dBQljNOaOXAOhcE0, third in playlist; presentation at https://rsbivand.github.io/ECS530_h19/ECS530_III.html.
We now have this, which reproduces Roger's example:
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
p = read_sf(system.file("gpkg/b_pump.gpkg", package="sf"))
st_transform(p, 3857)$geom[[1]]
# POINT (-15218.79 6712598)
# on Broadway St.: try mapview::mapview(p)
crs = st_crs(p)
crs$wkt2 = NULL # remove
st_crs(p) = crs
# Warning message:
# st_crs<- : replacing crs does not reproject data; use st_transform for that
st_transform(p, 3857)$geom[[1]]
# POINT (-15040.05 6712507)
Good. I guess we take things in two steps, leaving the CDN and grid deployment for the next step?
Yes. And thanks for the video. I watched it entirely. Fascinating! "Why are they doing this to me before I retire"
"They"'ll think of something juicy for you ... convert everything to s2?
Aha, I pushed commits to my fork of sp to relax the need to have rgdal >= 1.5-1 installed first, but I have a check NOTE that I haven't solved for rgdal < 1.5-1 - for obvious reasons functions introduced in 1.5-1 are not used but:
* checking dependencies in R code ... NOTE
Missing or unexported objects:
‘rgdal::checkCRSArgs_ng’ ‘rgdal::new_proj_and_gdal’
I can't see how to use globalVariables()
or suppressForeignCheck()
in this context.
Earlier the conditions were run together, so an < 1.5-1 instance just failed, so this is much better, and maybe CRAN would accept an sp with a NOTE given the circumstances.
I suggest we try the latter option. It will go away when we update rgdal too, right?
Yes, it does.
Here is the modified printing of headers of sf
and sfc
objects. It
NA
, uses the srs->GetName()
from GDAL, unless this is "unkown"
proj4string
NA
, prints only a single line
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
p = st_read(system.file("gpkg/b_pump.gpkg", package="sf"))
# Reading layer `b_pump' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.6/sf/gpkg/b_pump.gpkg' using driver `GPKG'
# Simple feature collection with 1 feature and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
# projected CRS: Transverse_Mercator
# on Broadway St.: try mapview::mapview(p)
st_transform(p, 3857)
# Simple feature collection with 1 feature and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -15218.79 ymin: 6712598 xmax: -15218.79 ymax: 6712598
# projected CRS: WGS 84 / Pseudo-Mercator
# epsg (SRID): 3857
# cat geom
# 1 1 POINT (-15218.79 6712598)
st_transform(p, 4326)
# Simple feature collection with 1 feature and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -0.1367127 ymin: 51.5133 xmax: -0.1367127 ymax: 51.5133
# geographic CRS: WGS 84
# epsg (SRID): 4326
# cat geom
# 1 1 POINT (-0.1367127 51.5133)
st_transform(p, "+proj=laea")
# Simple feature collection with 1 feature and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -10539.23 ymin: 5518341 xmax: -10539.23 ymax: 5518341
# proj4string: +proj=laea +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
# cat geom
# 1 1 POINT (-10539.23 5518341)
suppressWarnings(st_crs(p) <- NA)
p
# Simple feature collection with 1 feature and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
# CRS: NA
# cat geom
# 1 1 POINT (529393.5 181020.6)
This leaves only one TODO, but I don't think the last TODO is essential before releasing this, it's a new feature really (though a nice one). I also want to allow giving a +pipe ...
transformation, directly to PROJ, such that e.g. one can swap x and y coordinates.
First priority now would be writing an sf vignette and/or r-spatial blog explaining the changes so far, as they come with GDAL3 and PROJ6.
@edzer I am unsure if you seen it, but maybe it could be useful as a source of inspiration/disussion - https://jorisvandenbossche.github.io/blog/2020/02/11/geopandas-pyproj-crs/
Thanks @Nowosad , helpful! I like their output, it is more human-readable than WKT.
I think the formatting is somewhere here: https://github.com/pyproj4/pyproj/blob/master/pyproj/_crs.pyx, or nearby.
Now +towgs84=: https://github.com/r-spatial/sf/issues/1328
If anyone is interested in taking up WKT formatting/parsing as a project, I'd be interested to hear!
This is linked to https://github.com/r-spatial/discuss/issues/28.
This zip contains a simple reprex, the Broad Street pump location in a shapefile used in ASDAR (https://asdar-book.org/), includes a
+datum=
specification. There are also two R scripts each for sf and rgdal, one pair just using+towgs84=
, the other pair using+nadgrids=
. The first pair were run with PROJ 4.9.3/GDAL 2.4.2 and PROJ 6.2.0/GDAL 3.0.1 (output Rout files with*4*
and*6*
), thegrid*.R
files only under PROJ 6.2.0/GDAL 3.0.1 to use the new PROJ search path extension mechanism tested intest.cpp
.reprex.zip
Basic findings: for PROJ 4.9.3/GDAL 2.4.2, the
+datum=
spec gives a reasonable transformation of the input to EPSG:4326. For PROJ 6.2.0/GDAL 3.0.1, the+datum=
spec is dropped on reading in sf and rgdal, and variants of+towgs84=
specs need to be added manually to the CRS to get a reasonable transformation (not off by ~100m).Side finding: writing the object with a
+towgs84=
with theESRI Shapefile
driver in both sf and rgdal drops the+towgs84=
, but preserves it for theGPKG
driver.Only for PROJ 6: using the NTv2 grid in the Europe datumgrid zip archive indicated by
projinfo
gives a good transformation with both sf and rgdal using a CRS modified by adding+nadgrids=
and extending the search path to include the current directory (where the*.gsb
file is located, not included in zipfile, seeprojinfo
output for download link).Conclusion: Moving to PROJ 6 in software is feasible, but typical workflow input may suffer invisible degradation during processing for rgdal and sf workflows and their reverse dependencies. @tim-salabim (re: mapview) @mtennekes (re: tmap) @Nowosad (re. spData).