r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.33k stars 293 forks source link

PROJ 6 drops +datum= specification; transform degraded in workflows #1146

Closed rsbivand closed 4 years ago

rsbivand commented 5 years ago

This is linked to https://github.com/r-spatial/discuss/issues/28.

This zip contains a simple reprex, the Broad Street pump location in a shapefile used in ASDAR (https://asdar-book.org/), includes a +datum= specification. There are also two R scripts each for sf and rgdal, one pair just using +towgs84=, the other pair using +nadgrids=. The first pair were run with PROJ 4.9.3/GDAL 2.4.2 and PROJ 6.2.0/GDAL 3.0.1 (output Rout files with *4* and *6*), the grid*.R files only under PROJ 6.2.0/GDAL 3.0.1 to use the new PROJ search path extension mechanism tested in test.cpp.

reprex.zip

Basic findings: for PROJ 4.9.3/GDAL 2.4.2, the +datum= spec gives a reasonable transformation of the input to EPSG:4326. For PROJ 6.2.0/GDAL 3.0.1, the +datum= spec is dropped on reading in sf and rgdal, and variants of +towgs84= specs need to be added manually to the CRS to get a reasonable transformation (not off by ~100m).

Side finding: writing the object with a +towgs84= with the ESRI Shapefile driver in both sf and rgdal drops the +towgs84=, but preserves it for the GPKG driver.

Only for PROJ 6: using the NTv2 grid in the Europe datumgrid zip archive indicated by projinfo gives a good transformation with both sf and rgdal using a CRS modified by adding +nadgrids= and extending the search path to include the current directory (where the *.gsb file is located, not included in zipfile, see projinfo output for download link).

Conclusion: Moving to PROJ 6 in software is feasible, but typical workflow input may suffer invisible degradation during processing for rgdal and sf workflows and their reverse dependencies. @tim-salabim (re: mapview) @mtennekes (re: tmap) @Nowosad (re. spData).

Nowosad commented 5 years ago

@rsbivand - thank you for your work in this regard. Do you have any advice on how to proceed with those changes? How can I manually check which dataset in spData could be problematic, and then how to decide on the correct +towgs84 values?

rsbivand commented 5 years ago

I'm thinking that sf and rgdal and any other packages with transform functions may need to include a test on startup to see whether current PROJ diverges from PROJ 4.9.3. So far, CRAN Windows binaries are 4., so the effects are seen only where the packages are installed from source against PROJ 6.. If a package only offers files, like spData, the consequences will only occur once the objects have been read in, so .rda and native GIS format behaviour will probably differ (the .rda freeze the CRS representation, but we don't know how a CRS with a +datum= string will be handled - yet). One way is to mapview::mapview or tmap in view mode in the two settings and check visually; another to add the equivalent transformation to EPSG:4326 and check for equivalence of a sample point across the two settings. The QGIS approach is to ask the user to choose the transformation path manually, which is not appropriate in workflows and scripts. For geocompr, I guess adding (hidden) code to trap changes is feasible, say on ch 6. Maybe a blog / addendum to ch 6 might be a goal?

mtennekes commented 5 years ago

Trying to link the concepts ellipsoid/spheroid, datum, grid. As far I as understand it:

rsbivand commented 5 years ago

@edzer teaches using Datums and Map Projections by Iliffe and Lott https://www.whittlespublishing.com/Datums_and_Map_Projections. Maybe we need his course online?

edzer commented 5 years ago

+towgs84 gives 3 or 7 parameters for a parametric shift, datum grids give the shift for every location up to some resolution, but can then be interpolated. Reading the book Roger mentioned will make you realize that datum transformation will always be there, and will never be simple.

rsbivand commented 5 years ago

I'm begining to suspect that the GDAL3/PROJ6 combination is unkind to the shapefile driver. I'll try later to add a shapefile without a +datum= but with an equivalent +towgs84= copied in manually from GDAL 2.4.2 data/datum:shift.csv (446.448 | -125.157 | 542.06 | 0.15 | 0.247 | 0.842 | -20.489 I think). We'll need to check a range of drivers to see which handle input CRS differently too.

edzer commented 5 years ago

I just submitted sf 0.8-0 to CRAN addressing the PROJ 6.2.0 issues seen on debian testing, and those caused by the changed tidyr 1.0-0 API.

Nowosad commented 4 years ago

Dear @rsbivand, @edzer and the rest of #rspatial developers, I still try to wrap my head around the future changes related to the PROJ library (PROJ6 and after). What is the future of the proj4string definitions? Are there plans to (1) keep it, (2) drop it from #rspatial, (3) add some other way of defining projections (e.g., WKT), or (4) do something different? (https://gdal.org/api/ogr_srs_api.html#_CPPv416OSRExportToProj420OGRSpatialReferenceHPPc).

edzer commented 4 years ago

Good question - what do you suggest?

rsbivand commented 4 years ago

As of now, rgdal on R-Forge (1.5.1, very unstable, rev 862) when built with GDAL >= 3 and PROJ >= 6 will issue warnings when a DATUM is in the input vector SRS but discarded in the PROJ string. Next step to extend this to raster (see http://fwarmerdam.blogspot.com/2010/03/in-last-few-weeks-i-believe-i-have-made.html).

See thread on gdal-dev.

New list_coordOps() takes two CRS strings and returns coordinate operation pipelines. The easiest CRS strings are such as "EPSG:4326". For sf, users are going to have to insert EPSG codes, but there is somewhere to put them in a "crs" object.

In sp and dependencies, "CRS" is S4, so no free slot. I suggest the same kludge as for rings and holes, that is to add a comment() to "CRS" objects to contain the CRS string now needed. This needs to update the "projargs" slot, and needs checking too. Updating it would largely be manual, and users would have to be alerted to the need to do so.

The CRS string could also be added to sf "crs" objects, checking for NULL (as with a NULL comment), to provide somewhere to put WKT, URN, PROJJSON or other string representations. There still isn't clarity about where to put the EPOCH if known.

Feedback on rgdal very welcome. Most ideas should propagate to sf without extra difficulty. Comment now if this matters for you, @mdsumner made useful comments on the gdal-dev thread.

Notes on https://github.com/r-spatial/discuss/issues/28 about packages depending on sf and rgdal for coordinate operations entered last week.

rsbivand commented 4 years ago

Maybe also extend proj string degradation testing to towgs84 and nadgrids; datum was the most obvious problem, but towgs84 is also checked in the current code, and the warning qualified if a towgs84 makes it through OK (helps with -DACCEPT_USE_OF_DEPRECATED_PROJ_API_H pj_transform()), but won't help with PROJ 6 transformation I think (these need +type=crs too anyway).

rsbivand commented 4 years ago

@Nowosad : could you see whether current rgdal tickles any bugs with the spData vector files?

edzer commented 4 years ago

Link to the gdal-dev thread.

rsbivand commented 4 years ago

R-Forge rgdal rev 864 adds raster read warnings (we'll see if it builds on old PROJ/GDAL). Next checks for discarded PROJ string keys/tags on write for vector and raster.

Nowosad commented 4 years ago

@rsbivand should I check spData on dev rgdal using PROJ5 or PROJ6?

rsbivand commented 4 years ago

Offline now: prog >= 6, gdal >= 3. Current vector adds wkt2 comment to sp CRS objects. Next step raster, then reconstruct CRS projargs from wkt2.

Roger Bivand Falsensvei 32 5063 Bergen

lør. 2. nov. 2019, 12.53 skrev Jakub Nowosad notifications@github.com:

@rsbivand https://github.com/rsbivand should I check spData on dev rgdal using PROJ5 or PROJ6?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/r-spatial/sf/issues/1146?email_source=notifications&email_token=ACNZ3BCBCII7LLXAVPUY23TQRVS4VA5CNFSM4IVRBE42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEC42HLQ#issuecomment-549036974, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZ3BHXHIS43VNEEZIFFXDQRVS4VANCNFSM4IVRBE4Q .

rsbivand commented 4 years ago

This is the WIP-vignette from R-Forge/rgdal 1.5-1 rev 888. Comments welcome! PROJ6_GDAL3.zip

RFC now on http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html, no need to unzip.

edzer commented 4 years ago

I started working on a wkt2 branch, anticipating GDAL3/PROJ6, and assuming @rsbivand 's sp fork will be merged into master sp. I assume that GDAL3 implies PROJ6. This branch:

  1. adds a field wkt2 to crs objects, which is only filled if GDAL >= 3.0.0, and is otherwise NA_character_
  2. derives wkt2 from the dataset read by OGR (dominant use case I assume), or from an EPSG set (as in st_crs(4326)), or else from the proj4string (as in st_crs("+proj=longlat"))
  3. uses, if wkt2 is not NA, always the wkt2 field to set a coordinate reference system on a dataset e.g. in st_write or st_transform and the target CRS in st_transform
  4. uses comment(x) for conversion from CRS (sp) to crs (sf) in case x is of class CRS (sp), if the comment is not NULL
  5. adds a proper S4 conversion from crs to CRS, which sets the comment field too.
  6. prints crs objects with their wkt2 field in case it is not NA

I now see warnings of this kind:

< Warning message:
< In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
<   Discarded datum Amersfoort in CRS definition

emitted by (Roger's forked) sp, and am not sure sf should do that too (and when).

Still todo:

rsbivand commented 4 years ago

Thanks, this is real progress! For the todo list, but not to be implemented yet, is grid CDN handling. This widens and deepens the third todo list item. This is as in rgdal::list_coordOps() and its print method first, I guess, as it is only listing as inside projinfo that you see the alternatives including better choices not available because of missing grids. However, the grid matter is a bit tricky, because CRAN binary packages have share/proj non-writable I think. There is a trick for extending the search list, but I haven't included that - rgdal, sf and lwgeom would need to coordinate that.

Wrt. the warning messages - I feel we should leave them in place, possibly with an option to mute them, until we are sure that both GDAL3/PROJ6 and GDAL<3/PROJ<=6 work (warning on configuration that GDAL<3/PROJ6 is a bad idea).

Do you know how I can tell R CMD build to use the wkt2 branch on my fork?

I'm looking at removing the requirement for rgdal 1.5-1 in my fork of sp 1.3-3, but haven't had time yet.

edzer commented 4 years ago

To build this branch In your fork directory:

git pull https://github.com/r-spatial/sf.git wkt2
git checkout wkt2
rsbivand commented 4 years ago

I just did git checkout wkt2 since the repo is upstream of my fork. I'll wait a bit because I'm teaching on Monday too, but if you want to offer a PR for my talk on Tuesday morning, I can install sf_0.8-1.

edzer commented 4 years ago

They live together:

git checkout master

takes you back to the master branch.

edzer commented 4 years ago

This gives this possibility:

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
pt = st_sfc(st_point(c(0, 60)), crs = 4326)
(old = st_axis_order()) # query default: FALSE means interpret pt as (longitude latitude)
# [1] FALSE
st_transform(pt, 3857)[[1]]
# POINT (0 8399738)
st_axis_order(TRUE) # now interpret pt as (latitude longitude), as EPSG:4326 prescribes
# [1] TRUE
st_transform(pt, 3857)[[1]]
# POINT (6679169 0)
st_axis_order(old)
# [1] FALSE

This gets and sets a global variable in the C++ code.

edzer commented 4 years ago

Improved, now to return the old value, rather than the newly set:

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
pt = st_sfc(st_point(c(0, 60)), crs = 4326)
st_axis_order() # query default: FALSE means interpret pt as (longitude latitude)
# [1] FALSE
st_transform(pt, 3857)[[1]]
# POINT (0 8399738)
(old_value = st_axis_order(TRUE)) # now interpret pt as (latitude longitude), as EPSG:4326 prescribes
# [1] FALSE
st_axis_order() # query current value
# [1] TRUE
st_transform(pt, 3857)[[1]]
# POINT (6679169 0)
st_axis_order(old_value) # set back to old value
# [1] TRUE
rsbivand commented 4 years ago

Video of talk on this problem on: https://www.youtube.com/playlist?list=PLXUoTpMa_9s10NVk4dBQljNOaOXAOhcE0, third in playlist; presentation at https://rsbivand.github.io/ECS530_h19/ECS530_III.html.

edzer commented 4 years ago

We now have this, which reproduces Roger's example:

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
p = read_sf(system.file("gpkg/b_pump.gpkg", package="sf"))
st_transform(p, 3857)$geom[[1]]
# POINT (-15218.79 6712598)
# on Broadway St.: try mapview::mapview(p)
crs = st_crs(p)
crs$wkt2 = NULL # remove
st_crs(p) = crs
# Warning message:
# st_crs<- : replacing crs does not reproject data; use st_transform for that 
st_transform(p, 3857)$geom[[1]]
# POINT (-15040.05 6712507)
rsbivand commented 4 years ago

Good. I guess we take things in two steps, leaving the CDN and grid deployment for the next step?

edzer commented 4 years ago

Yes. And thanks for the video. I watched it entirely. Fascinating! "Why are they doing this to me before I retire"

rsbivand commented 4 years ago

"They"'ll think of something juicy for you ... convert everything to s2?

rsbivand commented 4 years ago

Aha, I pushed commits to my fork of sp to relax the need to have rgdal >= 1.5-1 installed first, but I have a check NOTE that I haven't solved for rgdal < 1.5-1 - for obvious reasons functions introduced in 1.5-1 are not used but:

* checking dependencies in R code ... NOTE
Missing or unexported objects:
  ‘rgdal::checkCRSArgs_ng’ ‘rgdal::new_proj_and_gdal’

I can't see how to use globalVariables() or suppressForeignCheck() in this context.

Earlier the conditions were run together, so an < 1.5-1 instance just failed, so this is much better, and maybe CRAN would accept an sp with a NOTE given the circumstances.

edzer commented 4 years ago

I suggest we try the latter option. It will go away when we update rgdal too, right?

rsbivand commented 4 years ago

Yes, it does.

edzer commented 4 years ago

Here is the modified printing of headers of sf and sfc objects. It

edzer commented 4 years ago

This leaves only one TODO, but I don't think the last TODO is essential before releasing this, it's a new feature really (though a nice one). I also want to allow giving a +pipe ... transformation, directly to PROJ, such that e.g. one can swap x and y coordinates.

First priority now would be writing an sf vignette and/or r-spatial blog explaining the changes so far, as they come with GDAL3 and PROJ6.

Nowosad commented 4 years ago

@edzer I am unsure if you seen it, but maybe it could be useful as a source of inspiration/disussion - https://jorisvandenbossche.github.io/blog/2020/02/11/geopandas-pyproj-crs/

edzer commented 4 years ago

Thanks @Nowosad , helpful! I like their output, it is more human-readable than WKT.

rsbivand commented 4 years ago

I think the formatting is somewhere here: https://github.com/pyproj4/pyproj/blob/master/pyproj/_crs.pyx, or nearby.

rsbivand commented 4 years ago

Now +towgs84=: https://github.com/r-spatial/sf/issues/1328

edzer commented 4 years ago

If anyone is interested in taking up WKT formatting/parsing as a project, I'd be interested to hear!