r-spatial / discuss

a discussion repository: raise issues, or contribute!
54 stars 12 forks source link

Tutorial(s) and communicating on using PROJ>=6/GDAL3 in R #43

Open florisvdh opened 3 years ago

florisvdh commented 3 years ago

Opening a new discussion, as I've been trying to write a tutorial on current good practice in specifying a CRS in R (with focus on the commonly used sf, sp and raster). It would be great if some people could have a critical look at this tutorial. Comments / pull requests are most welcome, and that includes language improvement. Getting feedback on it was the primary reason for me to open this issue.

But, feel free to widen up discussion, about communication (needs?) to R users on practical PROJ>=6/GDAL3 aspects they must know about. (And, who knows, I may have missed existing communications so I'd be happy to hear about it!) While currently most developers and more advanced R users will be aware of these changes, and have found where to look on the internet, I think that many R users are still unaware (even though they should be alarmed by the proj4strings warnings). At least that's my experience - after all these are recent changes.

The context of the above linked tutorial is the following. In our organization (Research Institute for Nature and Forest - www.inbo.be/en) we are gradually extending a tutorials website (https://inbo.github.io/tutorials/), mostly oriented towards reproducible science (spatial topics are found by clicking on the 'gis' tag.). Its primary aim is to support and stimulate scientists within the organization, but of course it is available to everyone and open to contributions. And next to the website, some very nice colleagues organize coding clubs in order to accelerate hands-on experience with R. Overall, we have about 150 R users (as derived from our internal useR mailing list), with a large variety in their R experience and in their effective R usage. Not all scientists are enough self-reliant to find out about technical aspects they need to know, hence they desire being supported in some coordinated way. I imagine this is within normal expectations and it is the case elsewhere.

In order to fill that gap we believe organization-level tutorials are of help. In some cases we just make a tutorial to provide a link to another tutorial elsewhere, in order to prevent unnecessary duplication and keep maintenance low.

rsbivand commented 3 years ago

Thanks, useful. I'll be speaking at whyR 2020 on Sunday 27 September at 11.00 - some parts of the talk and the (to be posted soon) code extend the celebRation 2020 talk you reference. I'll post a link once I have one.

I haven't read through your tutorial carefully yet, but a question that is becoming more visible is that EPSG:4326 is by definition Latitude-Longitude, while OGC:CRS84 is by definition Longitude-Latitude. From the GDAL barn-raising, PROJ/GDAL are trying to be authority-compliant, but this isn't resolved yet on our side. In the talk, I pivot to using OGC:CRS84 instead of EPSG:4326; you may think about what works for your target readership, but probably EPSG:4326 is a bad idea for GIS/visualization from the point of view of axis order.

florisvdh commented 3 years ago

That's very interesting to learn about @rsbivand , it was beyond my knowledge (I thought we could always reside to EPSG). I will definitely have look at that - looking forward to your talk.

edzer commented 3 years ago

I agree that writing OGC:CRS84 instead of EPSG:4326 with long-lat data is a good idea, as it makes axis order unambiguous and is authority compliant. I don't agree that "GDAL tries to be authority compliant", but it allows you to be (but by default isn't, I believe). There is still "what most software does" (interpret epsg:4326 as long-lat) versus "what EPSG tells you do to" (interpret epsg:4326 as lat long - which has always been like that). Before assuming that EPSG:4326 is lat long by default in R, we may want to align with QGIS, GRASS, PostGIS and other OSGEO projects.

rsbivand commented 3 years ago

Link to my Sunday whyR talk (11.00 CEST) probably https://youtu.be/mEAyQ8bv1zU. Followed by interesting panel. Presentation files on https://github.com/rsbivand/whyR20_files; output from current rsbivand/sp and rev 1066 of rgdal, sf 0.9-6 CRAN.

florisvdh commented 3 years ago

Thanks Roger for the interesting extra slides (42-73), which you couldn't discuss in the talk, about (among others) proj4string stripping, axis order, WKT representation by GDAL versus PROJ (and, some new rgdal functions). Well explained and I liked reading it!

rhijmans commented 3 years ago

It is helpful, except, I think, for the statement that one should "specify the CRS by using the EPSG code, but do so without using a proj4string".

That is not generally usable advice. While you can use these codes for the few established CRSs, you cannot use it to define an appropriate system for the data you are working with (and there is an infinite number of these). While we can still use PROJ notation, at least with the WGS84 datum, that seems to be the only generally usable approach available; unless you are able to create your own WKT. I am not aware of any effort to write software to support creation of WKT from basic components (like a PROJ string), but that could be very useful. Either way, it would be good if your tutorial explained how to do that, if there is a way.

mdsumner commented 3 years ago

While you can use these codes for the few established CRSs, you cannot use it to define an appropriate system for the data you are working with (and there is an infinite number of these)

indeed, in fact I've been considering writing out a short explanation of centring a local projection on a lon,lat location and using metres to buffer an extent around it. It's handy to show the cosine hack one would use to get an approximately correct aspect ratio with angular coordinates, but a throw way proj-string projection and a metres-either-side range is much simpler. (sometimes you care more about arbitrary centre and extent than you do about the projection family, and in addition you may not care at all about epoch-related inaccuracy). I've been using this as a target spec with the GDAL warper and it's clear it's the best way to generate a local projection for general use.

It could be expanded into a "we use this epoch in such-and-such context", so here's the WKT2 version of a throw away proj string used in the more casual ways ....

edzer commented 3 years ago

I may be wrong but think the issue is the difference between using proj4 strings as coordinate reference systems, or as coordinate transformations. If we use them as CRS, they get +datum=WGS84 attached which may happen to be what you want but in general not be a good idea, e.g. if you're working in another datum and only want to project. The pivotal WGS84 is the thing that PROJ > 5 managed to get away from. You can use "proj4" notation to create conversion or transformation pipelines, e.g. to do a projection without a datum transformation; this is what PROJ & GDAL (still) do, internally. Both sp and sf accept proj.4 string-defined CRS's as input, just no longer as CRS representation.

mdsumner commented 3 years ago

Yes, except code is also representation, and round-tripping has no hope to exist. The problems are real ones, and the benefits are clear only in local and primarily terrestrial contexts, exactly the ones where a local authority grid likely already exists and exploring the properties of projections has less value.

The WKT2 rigour in the closed contexts of sf and terra with their built-in GDAL stacks is unproblematic afaic, they just aren't the entire story.

rhijmans commented 3 years ago

If we use them as CRS, they get +datum=WGS84 attached which may happen to be what you want

If you set up your own CRS parameters, changes are that this is what you want.

but in general not be a good idea, e.g. if you're working in another datum and only want to project

But what other option would I have? That way I could get at least get a WKT2 for the WGS84 variant; and then I could try to use that as template by replacing WGS84 with the datum of choice. But I have never needed that; for the reasons that Mike alludes to.

If we use them as CRS, they get +datum=WGS84 attached which may happen to be what you want, but in general not be a good idea

To extend this discussion a bit: when using the WGS84 datum, the +proj notation is the "in script" representation that I use and would recommend. This is because the proj strings are self-explanatory, and that makes it easy to interpret them and to spot errors. In contrast, EPSG codes are opaque, and their use leads to inferior code that is harder to read and more likely to contain errors. I admit that I am uncertain about the downside of this approach; about all the things that could go wrong, nasty things like axis reversal --- and the degree to which our software could prevent such accidents from happening.

So is it fair to say that:

1) You can no longer use +proj with datum != WGS84 2) You can use EPSG codes (or similar) in that situation 3) If there is no code for what you want, you can use +proj= , but only with datum=WGS84 (and you can use that to create a WKT2 from which it should be possible to derive a variant with another datum of choice). 4) if you use datum=WGS84, you can use the "+proj=" notation instead of EPSG codes to be more explicit. but be aware of certain risks.... (which are?) 5) It is not foreseen that the "+proj=... datum=WGS84" route will be dropped by the library maintainers (I sure hope this is true)

?

mdsumner commented 3 years ago

That summary accords with my understanding, which has been slow - you can also use the modern equivalent of +sphere which is +R=, with an explicit radius, and that works fine as long as you don't stray too far from the earth's radius (at some threshold an error triggers from PROJ itself). I can't find a direct doc of this rn but it's on the proj site, I think.

I think the risks are at sub-metre level for ensemble datums, if you don't have the time stamp of your coordinate measures and ability to choose the right epoch. Other risks are the standard ones we always had (100s or so metres for failure to specify the actual datum).

You also can't use +init=<auth>:<code> anymore, but that's a PROJ lib version thing, modern equivalent is <auth>:<code>. (Just worth a mention with the above list, I think).

rsbivand commented 3 years ago

Is it possible that an R function might be needed for custom CRS to either just write a WKT2-2019 or PROJJSON string from user Proj4-like input? If PROJ was available, the representation could be validated, and/or proj_create() could be used with a Proj4 string, and/or proj_create_argv() ?

I recall that PROJJSON was suggested as somewhat easier to handle than WKT2, but for user input, Proj4 strings are clearly easiest.

The downside is of course that EPSG coded CRS are much easier to fit into SQL search for transformation pipelines. I've seen 1-2m given as the EPSG:4326 ensemble inaccuracy, with further comments on https://lists.osgeo.org/pipermail/gdal-dev/2021-May/054112.html , https://github.com/rouault/gdal/blob/rfc_81/gdal/doc/source/development/rfc/rfc81_coordinate_epoch.rst . I don't think things have settled at all. If we were already an OSGeo community, maybe we'd be able to put forward points of view, I'm unsure about that, though.

florisvdh commented 3 years ago

Thanks @rhijmans for the feedback! I'd be happy to improve the tutorial where possible. Related to @mdsumner 's remark, its scope is indeed limited to officially registered (say EPSG) CRSs - probably we should state that more clearly. But I feel it is the scope most users are confronted with. So importantly, the tutorial was not intended to explain on defining CRSs from scratch. I think that would best fit in a separate tutorial - cf. post above.

The tutorial's aim is to give generally applicable, basic advice, easy for (beginning) R users (in the light of the recent proj4string obsoletion). In that regard (keeping things simple and robust), I doubt whether it's a good idea to showcase that one can also use +proj to specify CRSs that use the WGS84 datum (datum:EPSG::6326) - the latter is then implicitly assumed by the (current) software implementation. But we could say something about it (see below).

IMO the easiest / failsafe way in specifying an 'official' CRS (for spatial data) is referring the CRS from an official database (like proj.db) instead of manually providing (some of) the parameters.

The WKT2 rigour in the closed contexts of sf and terra with their built-in GDAL stacks is unproblematic afaic, they just aren't the entire story.

Thanks! In order to make the tutorial more complete, it may indeed be a good idea to add a short note on the WGS84 datum case in parentheses or as a footnote (referring to +proj ), or just refer to the PROJ documentation if someone wishes to use projstrings. In that way things don't get too complicated for the beginner, and only full CRS definitions are considered in the tutorial (either defined by the EPSG key or the OGC WKT standard). Note, the EPSG:xxxx format is covered in the tutorial.

In contrast, EPSG codes are opaque, and their use leads to inferior code that is harder to read and more likely to contain errors

Sure one could make mistakes and pass the wrong key. But I think it's a better managed approach - to comply with EPSG CRSs that is. I don't understand 'harder to read' in your explanation.

I am not aware of any effort to write software to support creation of WKT from basic components (like a PROJ string), but that could be very useful. Either way, it would be good if your tutorial explained how to do that, if there is a way.

There is much merit indeed in explaining more on what can be done (and not) with projstrings. It is not an area I'm familiar with though. IMO this deserves its own tutorial (you're also welcome to add it to the inbo/tutorials repo, even though it's linked to my organization, but it would then sit next to the EPSG-CRS tutorial).

--

As a sidenote, and referring to aforementioned WGS84 inaccuracies, IMHO a CRS with a specific WGS84 datum realization (latest realization is G1762: EPSG:1156), such as the EPSG:9057 CRS, seems preferable (when possible) to use over EPSG:4326 (or OGC:CRS84) CRS which uses the WGS84 ensemble. The latter (datum:EPSG::6326) is an ensemble datum ~implicitly refers to the current WGS84 datum realization~, so this datum definition is not constant in time. IIUC this will have an effect regardless of correctly setting the epoch of the coordinates (which I think can be done for each WGS84 datum realization independently).

(PS: I'm not often online these weeks, sorry for the late reply)

rhijmans commented 3 years ago

My main point was that saying that the general principle is to

specify the CRS by using the EPSG code, but do so without using a proj4string.

seems extreme and exaggerated. Instead you could recommend to use EPSG codes when available; and note that you can use PROJ.4 notation as long as you do not stray from WGS84. There is, by the way, nothing "official" about these codes. I am not suggesting to explain how to roll your own WKT2 CRS with another datum (as we do not seem to know how to easily do this!)

I feel it is the scope most users are confronted with

There might be many that have other needs. I have worked a lot with species distribution data. These do not tend to follow national boundaries and so the appropriate CRS is not in the databases. There is a national grid for Belgium, one for the Netherlands and (presumably) one for Luxembourg. What do you use when you map the Benelux? So I think the situation in which you want to define your own parameters is very common. Of course you could pick one that is "close enough" from a list, but is that good advice?

I don't understand 'harder to read' in your explanation.

To me this is hard to read: ESRI:102004; it means nothing to me. Whereas +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 is very clear and expressive --- (it takes some getting used to at first, of course). I do not know what EPSG:6326 is, but I know what +proj=longlat stands for. Thus I think that a script that uses the latter is better, unless the former is truly needed, e.g. to deal with datum realizations (now explain that to your general user ;)

((CRSs create a lot of misunderstanding. Using codes does not educate. It would be have been much better to expand PROJ notation, it seems to me. I do not know the limits of that, but the point is moot, we are going to have to live with it.))

CRS with a specific WGS84 datum realization seems preferable

How would that matter in data analysis (as opposed to the e.g. the perfect drone delivery service, be it of goods or bads); as long as the original datum is known and the data correctly transformed to the same (appropriate) datum?

mdsumner commented 3 years ago

Yes, this is so important. The idea that you must find a code is incredibly toxic, and you can see this idea propagated online in so many communities where experienced users should know better (it's similar to the "find your UTM zone" advice, which is hopelessly naive and off-base).

rsbivand commented 3 years ago

If it the case that the position data are only going to be used by the person choosing the CRS using a custom Proj4 string, without any datum transformation, then by all means use Proj4 strings. If the data are going to be written to file/db, then the CRS representation matters much more, as is the case for datum transformation.

Another question - for vector data, an s2 approach with only GEOGCRS on a sphere is probably easiest - then everybody is in the same situation. But how do raster and s2 work together? This is, I feel, somewhere in the gdistance area, isn't it? s2 to WebMercator isn't accurate, but at scales of several hundred km, does it matter? Potentially yes? If so (data integration from multiple sources), we are still stuck. Admittedly, GRASS locations could be defined somewhat like Proj4, but in that case, one transformed all non-compliant input to the location CRS. Do we still need to do that, or is s2 spherical without worrying about epochs or datum ensembles close enough? I'd be very interested in seeing @paleolimbot 's views. For species distribution data, is a planar representation essential, or is a spherical representation adequate, and side-steps the CRS representation question?

paleolimbot commented 3 years ago

Honoured that anybody is interested in my views!

s2 shines when the data are already provided as longitude/latitude, which is and I imagine will be common for a long time. Correcting for epoch or other CRS issues will probably always be at the beginning or end of a script...s2 will always sit in the middle. The other place where s2 shines, regardless of the initial form of the data, is when you need to ask a global question (e.g., "within 500 km of a coastline but not on land"). Those questions rarely have an accuracy requirement that is impacted by the spherical assumption.

Maybe a good way to summarize my feelings about this are that there are no "innaccurate" reference systems, only inaccurate transformations, inaccurate distances between points, and inaccurate representations of edges (i.e., how you draw a line between two coordinates). s2 can help by minimizing the number of times one has to invoke a transformation and has tools that can make edge representation explicit by adding points (at the expense of all distances in s2 being innaccurate/spherical only).

rsbivand commented 3 years ago

@paleolimbot thanks! My feeling is that maybe we can lessen the need for custom projections by choosing to use spherical representations to a greater extent from now on, rather than having to go planar to handle distances/areas/intersections. However, perhaps species distribution models, if they presume planar, might need re-visiting? Are perhaps also global spherical grid systems, like the now-archived dggridR package https://github.com/r-barnes/dggridR/ or similar? I feel that, unless analysts need high precision, spherical is not worse than planar some distance from the central lon-lat point.

rhijmans commented 3 years ago

I agree that using lon/lat can be very good for some types of data analysis, but that is a different discussion. There are many instances in which someone would reasonably want to define a custom CRS. As-is, in the GDAL/PROJ world we only have the proj4 string for that, even if its usability has degraded. In the tutorial, Floris says that the proj4 notation should not be used, in part because he believes it will likely be removed. That may be true or not, but if we could only use EPSG (etc) codes we would not have competent general purpose spatial data analysis software. We would need additional software that translates a (perhaps expanded) proj4 string (or similar) to wkt2. (and, who knows, it could then be included in the next version of PROJ? )

rsbivand commented 3 years ago

I think all these cases are already covered in both sf and sp workflows. The relevant problem is that GDAL degrades input file CRS when exporting to Proj4, so the internal R-side CRS representation in sp or sf workflows cannot be Proj4 when reading from file or checking the representation for validity against PROJ. PROJ and GDAL both import Proj4 strings, but deprecate +towgs84=, +nadgrids= and except WGS84, NAD83 and NAD27 +datum= (and possibly others, unsure). So for the example above:

> sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96")
Coordinate Reference System:
  User input: +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 
  wkt:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
> sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84")
Coordinate Reference System:
  User input: +proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84 
  wkt:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on WGS84 ellipsoid",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1],
                ID["EPSG",7030]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
> sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
   "3.10.0dev"        "3.3.0"        "8.0.1"         "true"         "true" 
          PROJ 
       "8.0.1" 

with default insertion of WGS84 as datum in the WKT2-2019 representation if +ellps= is not given. For sp workflows:

> (o <- sp::CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96"))
CRS arguments:
 +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
+datum=WGS84 +units=m +no_defs 
> cat(sp::wkt(o), "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> (o <- sp::CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84"))
CRS arguments:
 +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
+ellps=WGS84 +units=m +no_defs 
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on WGS84 ellipsoid in Proj4 definition
> cat(sp::wkt(o), "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on WGS84 ellipsoid",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1],
                ID["EPSG",7030]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> (o <- sp::CRS("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=WGS84"))
CRS arguments:
 +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
+ellps=WGS84 +units=m +no_defs 
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on WGS84 ellipsoid in Proj4 definition
> cat(sp::wkt(o), "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on WGS84 ellipsoid",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1],
                ID["EPSG",7030]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",33,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> rgdal::rgdal_extSoftVersion()
          GDAL GDAL_with_GEOS           PROJ             sp           EPSG 
       "3.3.0"         "TRUE"        "8.0.1"        "1.4-6"      "v10.018" 

So R-side WKT2-2019 can be created from Proj4 string user input, usually without an authority for the PROJCRS. The WKT2-2019 should also be OK to write to file as part of an object, and should be retrieved as written.

It would be useful to check transformation pipelines between different +ellps= values; as a starter:

> o <-sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=GRS80")
> o0 <-sf::st_crs("+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +ellps=airy")
> sf::sf_proj_pipelines(o0, o)
Candidate coordinate operations found:  1 
Strict containment:     FALSE 
Axis order auth compl:  FALSE 
...
Best instantiable operation has only ballpark accuracy 
Description: Inverse of unknown + Ballpark geographic offset from unknown to
             unknown + unknown
Definition:  +proj=pipeline +step +inv +proj=lcc +lat_0=39 +lon_0=-96
             +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 +ellps=airy
             +step +proj=lcc +lat_0=39 +lon_0=-96 +lat_1=33
             +lat_2=45 +x_0=0 +y_0=0 +ellps=GRS80
mdsumner commented 3 years ago

My take, merely for transformations available to general purpose software as Robert mentioned, is to have a basic interface to what PROJ does. @paleolimbot provides this in libproj but has cryptic and painful problems keeping that on CRAN, they have been difficult and unhelpful with failed communications from their side and failed to indicate an impending archive step because of problems in the package. Communications have been ambiguous, and it depends who on the CRAN team is replying. That's typical in my experience, the more in the club you are the better, with no transparency about what they support, help with, etc. They try to be robotic, objective, but they are not.

If you can use PROJ as-is, you get everything you need, input proj4, wkt, whatever format the library provides, and the user/developer can make choices. If we had that shared resource then it'd be easy to write an adjunct to the blog post discussed.

edzer commented 3 years ago

pkg libproj made the proj sources part of the package, which sounds attractive because you loose the need to install proj besides R when doing source installs, but which also puts all the PROJ sources under the CRAN src checking, leading to problems similar to e.g. those we now face with s2 (g++11 issues in the upstream library, us needing to push google to move for our case, everyone waiting).

I don't see why the way libproj interfaced PROJ can't be done without including all of PROJ in src but simply using the way sf/rgdal/terra etc use PROJ. If that is what you need then why don't you do that? I also don't see how complaining about CRAN contributes to the discussion here.

rsbivand commented 3 years ago

This also relates for PROJ to the size of its distributed data; having a projdata package has been suggested, but is version-fragile as Windows and MacOS CRAN binaries may use different versions. This also impacts OSGeo4W, as users may see one PROJ version in R (even if looking at the OSGeo4W-bundled data) and a different version of the data. This matters now because EPSG 10 changed the definitions of the database table and field structures in some places (so R/C code built against later PROJ through PROJ itself bites the dust when looking at an earlier DB structure).

florisvdh commented 3 years ago

Thanks @rhijmans for shedding light on your usecase (or data analysis in general), and for advocating being able to define one's own CRS (I like the Benelux example :slightly_smiling_face:). Currently I'm unsure what's the best advice to easily define a new CRS (see below) and this may need discussion at PROJ directly. It seems it is not currently a purpose of PROJ to make this easy.

Enlisting here some things noted, after reading through pages on https://proj.org (but nothing more):

All in all, it seems that the particular case of creating a new CRS is not specifically envisaged, but I may be wrong. The general idea seems to be that an actual CRSs should be defined in full (WKT or equivalent). So this ultimately seems a debate between full versus partial CRS specifications.

Anyway, I intend to add a bit of nuance to the tutorial, regarding projstrings. Probably it's safer to say 'goodbye PROJ.4' than 'goodbye proj4strings' :wink:.

paleolimbot commented 3 years ago

Lots of good stuff here! All my bits are off topic...I'll respond here but feel free to open threads elsewhere if there's any interest in further discussion.

florisvdh commented 3 years ago

After reading further in PROJ documentation I conclude that some thoughts in my previous post, about PROJ assuming an implicit datum, and hence confusion in the documentation, were incorrect - sorry for that. I'll try to clarify that part here (corresponding parts are crossed out above).

As I understand now, (modern) proj-strings only define the coordinate operation itself, i.e. without defining the input and output CRS (and hence, their datum). So if a datum shift is needed (because input & output CRS have different datum), then the corresponding shift must be defined in the pipeline, but the actual CRS datums don't. Another way to say this would be that the geoid's position relative to the ellipsoidal CS are (generally) neglected in proj-strings. For operations with geographical coordinates only the ellipsoid model and relative positions between ellipsoidal CSs (e.g. by Helmert transformation) are defined by proj-strings: one doesn't need the geoid position (relative to the ellipsoidal CS) to define these coordinate operations. Of course you still do need the datum of your input and output CRS in order to match coordinates with real locations.

If I remember correctly, PROJ.4 did assume a WGS84 datum default and allowed an alternative datum definition (e.g. directly with +datum or through 'early binding' with the WGS84 hub using +towgs84). In my previous post there was an assumption that PROJ still uses (or assumes) an input datum, but that seems wrong.

Meanwhile, I'm preparing some questions, to address at PROJ, about the role of proj-strings in declaring CRSs (including custom ones).

florisvdh commented 3 years ago

Below follow some results from attempts to declare a CRS from a proj-string, with recent PROJ (using projinfo). I hope this sheds further light on the discussion. Main conclusion: it can be done, but it is discouraged and is considered legacy (unsure how long this remains in place). Importantly, support for direct datum definition is generally absent for proj-strings, except indirectly through using +towgs84. From most of PROJ documentation it is also apparent that proj-strings are not used or advertised as a means to declare a CRS, but for coordinate operations instead.

--

While the declaration of a CRS using a proj-string seems not documented on proj.org for the main PROJ programs (projinfo etc.), it can be done but with little support for datum *()**. Taking Roger's first example CRS (while running PROJ 7.2.1):

*()** 'little support for datum': that is, except when you're happy to use a BOUNDCRS, which is generated when adding a +towgs84 key and hence with a datum indirectly implied by an 'abridged transformation' to target CRS WGS84 (EPSG:4326). That corresponds to the 'early binding' method of PROJ.4 IIUC. Documented here in the context of a transformation pipeline. Note however that +towgs84 is generally not returned anymore by projinfo -o proj, so you should use older sources to get at the needed parameters (+towgs84 is sometimes still returned, e.g. with zeros for EPSG:4258). Example with EPSG:31370 (Belge 1972 / Belgian Lambert 72):

IMO the last steps (support for BOUNDCRS with +towgs84) feel contradictory to the deprecation of +towgs84, mentioned in several places in PROJ documentation. It think it's a relevant question to ask where this is heading to in the future.

For the R side I understand that "The relevant problem is that GDAL degrades input file CRS when exporting to Proj4" (Roger's comment) so this would mean that adding +towgs84 in R won't help anyway (cf. the rgdal warnings about discarded elements) - being unsure about this route.

rsbivand commented 3 years ago

Recent sp/rgdal provide get_source_if_boundcrs=TRUE and in rgdal/R/Class-CRSx.R:

    wkt2 <- try(showSRID(uprojargs, format="WKT2", multiline="YES",
        prefer_proj=prefer_proj), silent=TRUE)
    if (!inherits(wkt2, "try-error")) {
      if (get_source_if_boundcrs) {
        if (length(grep("^BOUNDCRS", wkt2)) > 0L) {
          wkt2a <- try(.Call("get_source_crs", wkt2, PACKAGE="rgdal"),
            silent=TRUE)
          if (!inherits(wkt2a, "try-error")) wkt2 <- wkt2a
        }
      }
      if (get_enforce_xy()) {
        wkt2a <- try(.Call("proj_vis_order", wkt2, PACKAGE="rgdal"),
          silent=TRUE)
          if (!inherits(wkt2a, "try-error")) {
            wkt2 <- wkt2a
          }
      }
      res[[3]] <- wkt2
    }
  }

where the C code is:

SEXP get_source_crs(SEXP source) {

    PJ_CONTEXT *ctx = proj_context_create();
    PJ *source_crs, *target_crs;
    SEXP res;

    source_crs = proj_create(ctx, CHAR(STRING_ELT(source, 0)));

    if (source_crs == NULL) {
        proj_context_destroy(ctx);
        error("source crs not created");
    }

    target_crs = proj_get_source_crs(ctx, source_crs);

    if (target_crs == NULL) {
        proj_context_destroy(ctx);
        error("target crs not created");
    }

    PROTECT(res = NEW_CHARACTER(1));
    SET_STRING_ELT(res, 0,
#if PROJ_VERSION_MAJOR == 6  && PROJ_VERSION_MINOR < 3
        COPY_TO_USER_STRING(proj_as_wkt(ctx, target_crs, PJ_WKT2_2018, NULL))
#else
        COPY_TO_USER_STRING(proj_as_wkt(ctx, target_crs, PJ_WKT2_2019, NULL))
#endif
    );
    UNPROTECT(1);
    proj_destroy(target_crs);
    proj_destroy(source_crs);
    proj_context_destroy(ctx);

    return(res);

}

BOUNDCRS are found most often when a file contains a +towgs84= tag on reading, so unless the argument is set FALSE, the remnants of this tag will be removed. As you say, it is no longer the role of the CRS definition to provide a specific pipeline to GEOGCRS "WGS84".

florisvdh commented 3 years ago

Thanks for pointing at the argument. Since for below example the argument does not have an effect, it seems as if the +towgs84 element was already discarded at an earlier stage (by GDAL itself?). Correct?

Reprex ``` r library(sp) rgdal::rgdal_extSoftVersion() #> GDAL GDAL_with_GEOS PROJ sp #> "3.2.1" "TRUE" "7.2.1" "1.4-5" # old proj4string (with towgs84) for EPSG:31370: ################################################## crs1 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = TRUE) #> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj #> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford) #> ellipsoid in Proj4 definition # same, but get_source_if_boundcrs = FALSE ################################################## crs2 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE) #> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj #> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford) #> ellipsoid in Proj4 definition # current proj-string (without towgs84) for EPSG:31370: ################################################## crs3 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs +type=crs", get_source_if_boundcrs = TRUE) #> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj #> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford) #> ellipsoid in Proj4 definition # same, but get_source_if_boundcrs = FALSE ################################################## crs4 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE) #> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj #> = prefer_proj): Discarded datum Unknown based on International 1909 (Hayford) #> ellipsoid in Proj4 definition all.equal(crs1, crs2) #> [1] TRUE all.equal(crs1, crs3) #> [1] TRUE all.equal(crs1, crs4) #> [1] TRUE cat(wkt(crs1), sep = "\n") #> PROJCRS["unknown", #> BASEGEOGCRS["unknown", #> DATUM["Unknown based on International 1909 (Hayford) ellipsoid", #> ELLIPSOID["International 1909 (Hayford)",6378388,297, #> LENGTHUNIT["metre",1, #> ID["EPSG",9001]]]], #> PRIMEM["Greenwich",0, #> ANGLEUNIT["degree",0.0174532925199433], #> ID["EPSG",8901]]], #> CONVERSION["unknown", #> METHOD["Lambert Conic Conformal (2SP)", #> ID["EPSG",9802]], #> PARAMETER["Latitude of false origin",90, #> ANGLEUNIT["degree",0.0174532925199433], #> ID["EPSG",8821]], #> PARAMETER["Longitude of false origin",4.36748666666667, #> ANGLEUNIT["degree",0.0174532925199433], #> ID["EPSG",8822]], #> PARAMETER["Latitude of 1st standard parallel",51.1666672333333, #> ANGLEUNIT["degree",0.0174532925199433], #> ID["EPSG",8823]], #> PARAMETER["Latitude of 2nd standard parallel",49.8333339, #> ANGLEUNIT["degree",0.0174532925199433], #> ID["EPSG",8824]], #> PARAMETER["Easting at false origin",150000.013, #> LENGTHUNIT["metre",1], #> ID["EPSG",8826]], #> PARAMETER["Northing at false origin",5400088.438, #> LENGTHUNIT["metre",1], #> ID["EPSG",8827]]], #> CS[Cartesian,2], #> AXIS["(E)",east, #> ORDER[1], #> LENGTHUNIT["metre",1, #> ID["EPSG",9001]]], #> AXIS["(N)",north, #> ORDER[2], #> LENGTHUNIT["metre",1, #> ID["EPSG",9001]]]] ``` Created on 2021-06-18 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)
Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.0 (2021-05-18) #> os Linux Mint 20 #> system x86_64, linux-gnu #> ui X11 #> language nl_BE:nl #> collate nl_BE.UTF-8 #> ctype nl_BE.UTF-8 #> tz Europe/Brussels #> date 2021-06-18 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.1.0) #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0) #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0) #> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0) #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0) #> knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0) #> lattice 0.20-44 2021-05-02 [4] CRAN (R 4.1.0) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0) #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0) #> rgdal 1.5-23 2021-02-03 [1] CRAN (R 4.1.0) #> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0) #> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.1.0) #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0) #> sp * 1.4-5 2021-01-10 [1] CRAN (R 4.1.0) #> stringi 1.6.2 2021-05-17 [1] CRAN (R 4.1.0) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0) #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0) #> xfun 0.23 2021-05-15 [1] CRAN (R 4.1.0) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0) #> #> [1] /home/floris/lib/R/library #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library ```
rsbivand commented 3 years ago

Interesting. I cannot check without re-installing sp and rgdal from CRAN, as the development versions (''sp my github fork, rgdal** R-forge) do different things, including caching CRS objects (some packages make repeated calls to CRS() with the same input). When nothing else is running, I'll try. rgdal::compare_CRS() gives more output than all.equal().

rsbivand commented 3 years ago

I don't see any difference after re-installing sp and rgdal from CRAN. The +towgs84= tag is removed by default in rgdal::checkCRSArgs_ng() irrespective of get_source_if_boundcrs= when rgdal::get_prefer_proj() - the default. This uses PROJ to instantiate the object internally.

> crs2 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition
> crs2
CRS arguments:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
> rgdal::set_prefer_proj(FALSE)
> crs2_OSR <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition,
 but +towgs84= values preserved
> crs2_OSR
CRS arguments:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl
+towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747
+units=m +no_defs 
> cat(wkt(crs2), sep = "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
> cat(wkt(crs2_OSR), sep = "\n")
BOUNDCRS[
    SOURCECRS[
        PROJCRS["unknown",
            BASEGEOGCRS["unknown",
                DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
                    ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                        LENGTHUNIT["metre",1,
                            ID["EPSG",9001]]]],
                PRIMEM["Greenwich",0,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8901]]],
            CONVERSION["unknown",
                METHOD["Lambert Conic Conformal (2SP)",
                    ID["EPSG",9802]],
                PARAMETER["Latitude of false origin",90,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8821]],
                PARAMETER["Longitude of false origin",4.36748666666667,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8822]],
                PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8823]],
                PARAMETER["Latitude of 2nd standard parallel",49.8333339,
                    ANGLEUNIT["degree",0.0174532925199433],
                    ID["EPSG",8824]],
                PARAMETER["Easting at false origin",150000.013,
                    LENGTHUNIT["metre",1],
                    ID["EPSG",8826]],
                PARAMETER["Northing at false origin",5400088.438,
                    LENGTHUNIT["metre",1],
                    ID["EPSG",8827]]],
            CS[Cartesian,2],
                AXIS["(E)",east,
                    ORDER[1],
                    LENGTHUNIT["metre",1,
                        ID["EPSG",9001]]],
                AXIS["(N)",north,
                    ORDER[2],
                    LENGTHUNIT["metre",1,
                        ID["EPSG",9001]]]]],
    TARGETCRS[
        GEOGCRS["WGS 84",
            DATUM["World Geodetic System 1984",
                ELLIPSOID["WGS 84",6378137,298.257223563,
                    LENGTHUNIT["metre",1]]],
            PRIMEM["Greenwich",0,
                ANGLEUNIT["degree",0.0174532925199433]],
            CS[ellipsoidal,2],
                AXIS["geodetic latitude (Lat)",north,
                    ORDER[1],
                    ANGLEUNIT["degree",0.0174532925199433]],
                AXIS["geodetic longitude (Lon)",east,
                    ORDER[2],
                    ANGLEUNIT["degree",0.0174532925199433]],
            ID["EPSG",4326]]],
    ABRIDGEDTRANSFORMATION["Transformation from unknown to WGS84",
        METHOD["Position Vector transformation (geog2D domain)",
            ID["EPSG",9606]],
        PARAMETER["X-axis translation",-106.869,
            ID["EPSG",8605]],
        PARAMETER["Y-axis translation",52.2978,
            ID["EPSG",8606]],
        PARAMETER["Z-axis translation",-103.724,
            ID["EPSG",8607]],
        PARAMETER["X-axis rotation",0.3366,
            ID["EPSG",8608]],
        PARAMETER["Y-axis rotation",-0.457,
            ID["EPSG",8609]],
        PARAMETER["Z-axis rotation",1.8422,
            ID["EPSG",8610]],
        PARAMETER["Scale difference",0.9999987253,
            ID["EPSG",8611]]]]
> crs2_OSR_not_bound <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs")
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition,
 but +towgs84= values preserved
> crs2_OSR_not_bound
CRS arguments:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl
+towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747
+units=m +no_defs 
> cat(wkt(crs2_OSR_not_bound), sep = "\n")
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

The chronology by reconstruction was possibly that I switched all instantiation via rgdal::SRS_string() from OSR to PROJ (get/set_prefer_proj()) after adding get_source_if_boundcrs=, but feeling that PROJ is "closer to the metal" than OSR in GDAL. It was obvious that PROJ was more predictable - if something like +towgs84= is "gone", it is "gone". Similarly, the PROJ transformation pipelines seem closer to the metal than GDAL, so rgdal uses them, not GDALs. It looks also as though the development versions of sp (my fork) and rgdal remove the BOUNDCRS anyway, which could be a logic branching error in the get_source_if_boundcrs = FALSE and prefer_proj FALSE branch:

> library(sp)
> crs2 <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
Warning message:
In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) :
  Discarded datum Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid in Proj4 definition
> crs2
Coordinate Reference System:
Deprecated Proj.4 representation:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
WKT2 2019 representation:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> rgdal::set_prefer_proj(FALSE)
> crs2_OSR <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = FALSE)
> crs2_OSR
Coordinate Reference System:
Deprecated Proj.4 representation:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
WKT2 2019 representation:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
> crs2_OSR_no_bnds <- CRS("+proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.869,52.2978,-103.724,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs +type=crs", get_source_if_boundcrs = TRUE)
> crs2_OSR_no_bnds
Coordinate Reference System:
Deprecated Proj.4 representation:
 +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
+lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
+no_defs 
WKT2 2019 representation:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown based on International 1924 (Hayford 1909, 1910) ellipsoid",
            ELLIPSOID["International 1924 (Hayford 1909, 1910)",6378388,297,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",4.36748666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",51.1666672333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",49.8333339,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",150000.013,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",5400088.438,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]] 
rsbivand commented 3 years ago

In development sp, identical results are caused not by a logic error, but by (new) CRS look-up caching. Repeated look-ups retrieve output CRS from a local cache if the same input is received.

florisvdh commented 3 years ago

Thanks a lot Roger for demonstrating the effect of rgdal::set_prefer_proj(), and explaining on how rgdal itself decides on removing +towgs84 + changes in dev versions - learning a lot today :slightly_smiling_face:!!! This indeed explains why first there was no effect of get_source_if_boundcrs (default being rgdal::set_prefer_proj(TRUE)).

In a context of changing use of proj-strings, IMHO I also think it is natural to consider PROJ as reference. Currently it still supports +towgs84 (BOUNDCRS) but at the same time +towgs84 is considered deprecated (I intend to ask the PROJ developers (shortly) where that is going to). So, dropping +towgs84 for rgdal::set_prefer_proj(TRUE) could feel counterintuitive at the moment, since it's still "not gone". But understood.

Some things to note:

florisvdh commented 3 years ago

Sharing here some outcomes from the PROJ mailing list, earlier this week, on the subject of using PROJ strings to specify a CRS (answers from Even Rouault and others). PROJ strings for CRSs will get continued (though reduced) support, for backward compatibility reasons.

After clarifying some aspects, following advice is considered appropriate (see also https://lists.osgeo.org/pipermail/proj/2021-June/010301.html):

  1. Using a PROJ string to specify a CRS is discouraged and currently not much documented by PROJ (PROJ strings are used for coordinate operations instead). At least for registered CRSs, always use AUTHORITY:CODE or WKT2, not PROJ strings.
  2. Also custom CRSs are ideally specified as WKT2, or more conveniently as (currently not yet official) PROJJSON, which follows the WKT2 structure.
  3. When specifying a custom CRS with a PROJ string, the WGS84 ensemble datum (datum:EPSG::6326) will automatically be adopted if no +ellps is specified (and with +ellps, the datum is considered as 'unknown'). If these conditions are fine for your usecase, then there's no problem.
  4. When specifying a custom CRS that doesn't use the WGS84, NAD27 or NAD83 datum (which can be specified with +datum), it is better to use WKT2 or PROJJSON and properly refer the geodetic datum. A typically less accurate alternative is defining as a BOUNDCRS by specifying +towgs84 transformation parameters.

I intend to use that as basis for a few updates in our tutorial on CRSs in R.

Further notes:

florisvdh commented 2 years ago

Just updated the discussed CRS tutorial (link). I essentially added a bit more info on current state of PROJ string support for CRSs (partly in footnotes), and referred to PROJJSON, meanwhile trying to stay in line with PROJ documentation and trying not to complicate things too much. Basic advice for specifying EPSG CRSs of course remained the same.

rsbivand commented 2 years ago

Maybe replace proj4string(cities2) <- crs_wgs84 by slot(cities2, "proj4string") <- crs_wgs84.

Also consider introducing something like:

EPSG_db <- rgdal::make_EPSG()
EPSG_db[grep("Belg", EPSG_db$note), 1:2]

to help in finding EPSG codes. Another optional extra is:

data(GridsDatums, package="rgdal")
GridsDatums[grep("Belg", GridsDatums$country),]

The latter needs updating to link directly to https://www.asprs.org/wp-content/uploads/2018/03/10-16-GD-Belgium.pdf for the 2016 link from https://www.asprs.org/asprs-publications/grids-and-datums. These short articles give useful context on how countries have adapted and modernised their standards.

florisvdh commented 2 years ago

Maybe replace proj4string(cities2) <- crs_wgs84 by slot(cities2, "proj4string") <- crs_wgs84.

OK, thank you Roger.

Also consider introducing something like:

EPSG_db <- rgdal::make_EPSG()
EPSG_db[grep("Belg", EPSG_db$note), 1:2]

to help in finding EPSG codes.

Yes, that's a good fit indeed!

The ASPRS Grids and Datums articles are an interesting source, thanks for reminding. IIRC for Belgium it focuses less (resp. not) on the two currently used CRSs: 1) "Belge 1972 / Belgian Lambert 72" (EPSG:31370, based on the "Reseau National Belge 1972" datum) - still used most often - and 2) "ETRS89 / Belgian Lambert 2008" (EPSG:3812, based on the ETRS89 datum) which has accuracy advantages in using GPS data and for conversion to other ETRS89-based CRSs within Europe (EPSG or custom CRSs). In time, it can be expected that the second will have more priority; it is also required by the EU INSPIRE directive. (Comments by the Flemish Geographic Information Agency are at https://www.geopunt.be/voor-experts/lambert-2008 - only available in Dutch).

I hope to delve deeper into this later on (consulting official Belgian sources, including related topics such as the Flemish Positioning Service FLEPOS and the unfortunate ESRI:102499 mismatch), and perhaps start a separate page about it, for Belgian R users and other interested people :wink:.