inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
90 stars 21 forks source link

Please remove dependencies on **rgdal**, **rgeos**, and/or **maptools** #178

Closed rsbivand closed 1 year ago

rsbivand commented 1 year ago

This package depends on (depends, imports or suggests) raster and one or more of the retiring packages rgdal, rgeos or maptools (https://r-spatial.org/r/2022/04/12/evolution.html). Since raster 3.6.3, all use of external FOSS library functionality has been transferred to terra, making the retiring packages very likely redundant. It would help greatly if you could remove dependencies on the retiring packages as soon as possible.

finnlindgren commented 1 year ago

maptools was only used by inactive code and has now been removed. I've also removed what looked like the last bit of remaining direct rgdal use, a showSRID call, and a call to rgdal::CRSargs in PROJ4 fallback code. I recently added support for sf and terra inputs; some of the internal code still relies on sp object storage, but most of the the surrounding code such as transformation code converts to sf or raw values and uses sf methods for the transformations. The recent sp update lead to messages about rgdal use that took me a while to track down to the internal logic of spTransform as we already weren't using rgdal directly. I found the "evolution" parts of the code but couldn't figure out how to work with it; thanks for the blog link.

The remaining rgdal use is all about setting the warning levels about the PROJ4/6 transition; I assume it should be ok to drop PROJ4 backwards compatibility now, which will then allow removal of a lot of code as well as allow removal of the rgdal dependency. Other remaining uses of e.g. sp and raster are help functions for plotting etc; Those will need to remain in the package as long as feasible to retain backwards compatibility as much as possible with old analysis code etc. (but I'll add some warning system).

Most of the sf/sp related code will be transferred to a separate fmesher package (https://github.com/inlabru-org/fmesher/, kept in sync with inlabru until it's ready for release) that will take over the role that part of the INLA code, to make maintenance easier. The code in INLA needs to be updated to use the new functions; I hope to be able to go through it in the coming few months to switch to the new code in inlabru and fmesher to avoid having do maintain two separate code bases, but to keep at least some backwards compatibility (the wrapper methods fm_crs and fm_transform in inlabru, later in fmesher, handle both sf and sp inputs, so hopefully the transition should be seamless).

rsbivand commented 1 year ago

Thanks! Yes, I think that the PROJ.4 fallback may be dropped now - revdep checks should show any remaining cases. I still find old "Raster*" objects stored as *.rda or *.rds files in some packages, even after trying hard to replace all the "Spatial*" ones during late 2020. Please let me know if you would like me to check a draft package release - I keep the retiring packages in a separate library, so can check with them absent.

finnlindgren commented 1 year ago

The remaining rgeos and rgdal uses in INLA have now been removed, so that will be sorted with the next INLA build (next week). There is a major bug in sf::st_as_sfc.SpatialPolygons in sf CRAN version 1.0-9 that breaks multi-polygon object conversion (it treats polygons as holes, so the "make valid" code removed all but the first polygon in a collection of triangles. I was working on a bug report for this this week when I noticed the problem has already been fixed in the sf github version 1.0-10, so hopefully that goes onto CRAN very soon; the calls to rgeos that needed removing are replaced by conversion to sf and then calls to sf::st_union and related functions, so we're relying on the sp-to-sf conversion to be rock solid.

I also believe that sp::is.na.CRS is broken; it tries to access a comment() on the proj4string slot, but from what I understand, the comment is associated with the entire CRS object itself, and not with a proj4string slot. This bug only affects CRS objects with valid WKT information that doesn't have (or it couldn't figure out) a corresponding projargs, and the CRS objects do not have a proj4string slot [Correction; it affects all CRS with NA in projargs; I just ran into the problem for the special CRS objects where we expected is.na() to be FALSE despite that, like geocentric CRSes with units other than metres]:

> is.na(sp::CRS("+proj=geocent +units=km"))
proj_as_proj_string: GeodeticCRS::exportToPROJString() only supports metre unit
Error in slot(x, "proj4string") : 
  no slot of name "proj4string" for this object of class "CRS"
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  export to PROJ failed: Unknown error (code 4096)

> str(sp::CRS("+proj=geocent +units=m"))
Formal class 'CRS' [package "sp"] with 1 slot
  ..@ projargs: chr "+proj=geocent +datum=WGS84 +units=m +no_defs"
  ..$ comment: chr "GEODCRS[\"unknown\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.25722"| __truncated__

> str(sp::CRS("+proj=geocent +units=km"))
proj_as_proj_string: GeodeticCRS::exportToPROJString() only supports metre unit
Formal class 'CRS' [package "sp"] with 1 slot
  ..@ projargs: chr NA
  ..$ comment: chr "GEODCRS[\"unknown\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.25722"| __truncated__
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  export to PROJ failed: Unknown error (code 4096)

> str(comment(sp::CRS("+proj=geocent +units=km")))
proj_as_proj_string: GeodeticCRS::exportToPROJString() only supports metre unit
 chr "GEODCRS[\"unknown\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.25722"| __truncated__
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  export to PROJ failed: Unknown error (code 4096)

If my understanding is correct, I can make a pull request for a correction to sp for that.

rsbivand commented 1 year ago

Please update sp from github, and see whether get_evolution_status() is exported. If it is, try set_evolution_status(2L) , which will be default from June 2023. The warning is coming from rgdal. These set the evolution status in loaded sp while the initial approach was that the status must be set before sp was loaded.

finnlindgren commented 1 year ago

That gets rid of the warnings, and highlights the is.na.CRS bug more clearly:

> sp::set_evolution_status(2L)
[1] 2
inlabru:devel> is.na(sp::CRS("+proj=geocent +units=km"))
Error in slot(x, "proj4string") : 
  no slot of name "proj4string" for this object of class "CRS"

> sp:::is.na.CRS
function (x) 
{
    is.na(x@projargs) && is.null(comment(slot(x, "proj4string")))
}

# What I believe is the correct approach:
> x<-sp::CRS("+proj=geocent +units=km")
> is.na(x@projargs) && is.null(comment(x))
[1] FALSE
rsbivand commented 1 year ago

Yes, could you provide a simple PR, please?

finnlindgren commented 1 year ago

With sp 1.6-0 installed and rgdal not installed, with the default evolution status 0L (and also with 1L) things are failing in subtle ways; I traced one issue (which may be the only/main issue) to sp::CRS generating an NA CRS when given only an SRS_string input that in proj4 would have been "+proj=longlat +R=1 +no_defs". With evolution status 2L it generates the desired CRS object. The inlabru::fm_CRS method handles this by going via sf, just as sp does with evolution status 2L, so it's really only the old methods still in use in INLA that notice the problem. Since sp::set_evolution_status(2L) can be run by the users when they run into this issue, I'm leaning towards just updating the INLA methods to use the inlabru methods ; this has been the plan anyway. Users that can't/won't upgrade their INLA installations will need to use sp::set_evolution_status(2L) manually until that becomes the new sp default.

The main issue at the moment is that inlabru calls some of the affected INLA methods, so tests for inlabru fail under evolution status 0L when removing the rgdal dependency, so I'm not sure how to handle this in the next CRAN submission of inlabru. I could perhaps add a check in inlabru .onLoad for the sp version and evolution status, and give a warning if it's a combination known to cause issues? Forcing the status to 2L feels wrong. But the check itself needs to be forward compatible with later sp version as well. @rsbivand , any suggestions?

rsbivand commented 1 year ago

Thanks! sp has always, since rgdal CRS support reached CRAN, checked for rgdal and used it for CRS checking if present. I'll see whether anything got changed, but without rgdal, sp should simply use the provided PROJ4 string, IIRC, in status 0L and 1L.

Is it possible that the sp CRS cache (use_cache=TRUE by default) is causing problems? It was added as I saw some packages using CRS() repeatedly with the same input argument, sometimes thousands of times, and time taken in PROJ code to convert to WKT2 was noticable.

finnlindgren commented 1 year ago

The issue the inla methods are running into is when specifying only a SRS_string as input, with contents that wouldn’t necessarily be translatable into a proj4 string (or rather, that sp/rgdal in the past wouldn’t necessarily know how to convert to proj4, like some combinations involving units=km).

rsbivand commented 1 year ago

Please could you provide a reprex?

rsbivand commented 1 year ago

With your PR https://github.com/edzer/sp/pull/128 applied locally:

> (o <- .libPaths())
[1] "/home/rsb/lib/r_libs"                         
[2] "/home/rsb/lib/r_libs_retiring"                
[3] "/home/rsb/topics/R/R422-share/lib64/R/library"
> .libPaths("")
> .libPaths()
[1] "/home/rsb/topics/R/R422-share/lib64/R/library"
> .libPaths(o[1])
> .libPaths()
[1] "/home/rsb/lib/r_libs"                         
[2] "/home/rsb/topics/R/R422-share/lib64/R/library"
> sp::set_evolution_status(0L)
[1] 0
> sp::CRS("+proj=longlat +R=1 +no_defs")
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
> sp::set_evolution_status(1L)
[1] 1
> sp::CRS("+proj=longlat +R=1 +no_defs", use_cache=FALSE)
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
> sp::set_evolution_status(2L)
[1] 2
> sp::CRS("+proj=longlat +R=1 +no_defs", use_cache=FALSE)
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
WKT2 2019 representation:
GEOGCRS["unknown",
    DATUM["unknown",
        ELLIPSOID["unknown",1,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Reference meridian",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]] 

so without a reprex it is hard to tell. With current inlabru/devel:

> sp::set_evolution_status(0L)
[1] 0
> inlabru::fm_CRS("+proj=longlat +R=1 +no_defs")
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
WKT2 2019 representation:
GEOGCRS["unknown",
    DATUM["unknown",
        ELLIPSOID["unknown",1,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Reference meridian",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]] 
Warning message:
In fm_CRS.default("+proj=longlat +R=1 +no_defs") :
  'fm_CRS' should be given a SRS_string for PROJ6 or a known keyword for a
  predefined string given in projargs. Using 'fm_crs()' workaround.
> sp::set_evolution_status(1L)
[1] 1
> inlabru::fm_CRS("+proj=longlat +R=1 +no_defs")
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
WKT2 2019 representation:
GEOGCRS["unknown",
    DATUM["unknown",
        ELLIPSOID["unknown",1,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Reference meridian",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]] 
Warning message:
In fm_CRS.default("+proj=longlat +R=1 +no_defs") :
  'fm_CRS' should be given a SRS_string for PROJ6 or a known keyword for a
  predefined string given in projargs. Using 'fm_crs()' workaround.
> sp::set_evolution_status(2L)
[1] 2
> inlabru::fm_CRS("+proj=longlat +R=1 +no_defs")
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
WKT2 2019 representation:
GEOGCRS["unknown",
    DATUM["unknown",
        ELLIPSOID["unknown",1,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Reference meridian",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]] 
Warning message:
In fm_CRS.default("+proj=longlat +R=1 +no_defs") :
  'fm_CRS' should be given a SRS_string for PROJ6 or a known keyword for a
  predefined string given in projargs. Using 'fm_crs()' workaround.
> sp::set_evolution_status(2L)
[1] 2
> inlabru::fm_crs("+proj=longlat +R=1 +no_defs")
Coordinate Reference System:
  User input: +proj=longlat +R=1 +no_defs 
  wkt:
GEOGCRS["unknown",
    DATUM["unknown",
        ELLIPSOID["unknown",1,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Reference meridian",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
finnlindgren commented 1 year ago

I wasn't entirely clear, sorry. reprex below. It's not technically a bug per se, as the sp::CRS documentation does say that

SRS_string | default NULL, only used when rgdal is built with PROJ >= 6 and GDAL >= 3; a valid WKT string or SRS 
                       definition such as "EPSG:4326" or "ESRI:102761"

which implies it's ignored when rgdal isn't installed, thus breaking code that we wrote to work with WKT definitions as the prime information carrier instead of PROJ4, back when the PROJ4/6 transition took place. But as it does use it when evolution status is 2L, it's not clear to me to handle the transition, except by enforcing evolution status 2L when rgdal isn't installed.

library(sp)

set_evolution_status(0L)
#> [1] 0
c1_CRS <- CRS("+proj=longlat +R=1 +no_defs")
c1_crs <- sf::st_crs(c1_CRS)
c1_crs$wkt
#> [1] "GEOGCRS[\"unknown\",\n    DATUM[\"unknown\",\n        ELLIPSOID[\"unknown\",1,0,\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]],\n    PRIMEM[\"Reference meridian\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433,\n            ID[\"EPSG\",9122]]],\n    CS[ellipsoidal,2],\n        AXIS[\"longitude\",east,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]],\n        AXIS[\"latitude\",north,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]]]"
c2_CRS <- CRS(SRS_string = c1_crs$wkt)
# NA since SRS_string only used when rgdal available :
c2_CRS
#> Coordinate Reference System:
#> Deprecated Proj.4 representation: NA

set_evolution_status(1L)
#> [1] 1
c1_CRS <- CRS("+proj=longlat +R=1 +no_defs")
c1_crs <- sf::st_crs(c1_CRS)
c1_crs$wkt
#> [1] "GEOGCRS[\"unknown\",\n    DATUM[\"unknown\",\n        ELLIPSOID[\"unknown\",1,0,\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]],\n    PRIMEM[\"Reference meridian\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433,\n            ID[\"EPSG\",9122]]],\n    CS[ellipsoidal,2],\n        AXIS[\"longitude\",east,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]],\n        AXIS[\"latitude\",north,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]]]"
c2_CRS <- CRS(SRS_string = c1_crs$wkt)
# NA since SRS_string only used when rgdal available:
c2_CRS
#> Coordinate Reference System:
#> Deprecated Proj.4 representation: NA

set_evolution_status(2L)
#> [1] 2
c1_CRS <- CRS("+proj=longlat +R=1 +no_defs")
c1_crs <- sf::st_crs(c1_CRS)
c1_crs$wkt
#> [1] "GEOGCRS[\"unknown\",\n    DATUM[\"unknown\",\n        ELLIPSOID[\"unknown\",1,0,\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]],\n    PRIMEM[\"Reference meridian\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433,\n            ID[\"EPSG\",9122]]],\n    CS[ellipsoidal,2],\n        AXIS[\"longitude\",east,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]],\n        AXIS[\"latitude\",north,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]]]"
c2_CRS <- CRS(SRS_string = c1_crs$wkt)
# Other method used, both wkt and projargs (derived from the wkt) set correctly:
c2_CRS
#> Coordinate Reference System:
#> Deprecated Proj.4 representation: +proj=longlat +R=1 +no_defs 
#> WKT2 2019 representation:
#> GEOGCRS["unknown",
#>     DATUM["unknown",
#>         ELLIPSOID["unknown",1,0,
#>             LENGTHUNIT["metre",1,
#>                 ID["EPSG",9001]]]],
#>     PRIMEM["Reference meridian",0,
#>         ANGLEUNIT["degree",0.0174532925199433,
#>             ID["EPSG",9122]]],
#>     CS[ellipsoidal,2],
#>         AXIS["longitude",east,
#>             ORDER[1],
#>             ANGLEUNIT["degree",0.0174532925199433,
#>                 ID["EPSG",9122]]],
#>         AXIS["latitude",north,
#>             ORDER[2],
#>             ANGLEUNIT["degree",0.0174532925199433,
#>                 ID["EPSG",9122]]]]
finnlindgren commented 1 year ago

Looking more closely, I now realise that the WKT information isn't set in the initial CRS constructions at all when given projargs inputs, even when evolution status is 2L for these tests, which is another issue.

library(sp)

set_evolution_status(0L)
#> [1] 0
c1_CRS <- CRS("+proj=longlat")
str(c1_CRS)
#> Formal class 'CRS' [package "sp"] with 1 slot
#>   ..@ projargs: chr "+proj=longlat"
set_evolution_status(1L)
#> [1] 1
c1_CRS <- CRS("+proj=longlat")
str(c1_CRS)
#> Formal class 'CRS' [package "sp"] with 1 slot
#>   ..@ projargs: chr "+proj=longlat"
set_evolution_status(2L)
#> [1] 2
c1_CRS <- CRS("+proj=longlat")
str(c1_CRS)
#> Formal class 'CRS' [package "sp"] with 1 slot
#>   ..@ projargs: chr "+proj=longlat"
finnlindgren commented 1 year ago

I see that the inlabru warning

Warning message:
In fm_CRS.default("+proj=longlat +R=1 +no_defs") :
  'fm_CRS' should be given a SRS_string for PROJ6 or a known keyword for a
  predefined string given in projargs. Using 'fm_crs()' workaround.

may be a bit overzealous now; it seems other packages are now more relaxed about accepting proj4 strings as inputs, and there's no hindrance for us doing that as well, as long as we can rely of sf::st_crs() to handle it properly.

rsbivand commented 1 year ago

You cannot rely on any proj4 string in general. You would have first to be sure that it does not contain and +datum= other than WGS84, NAD83 or NAD27 (they are special-cased), no +towgs84= at all (they now imply a conversion to the WGS84 ensemble), no +nadgrids= either. So you'd have to filter any such string. The other packages will find this bites them when the integrate data sources not yet available or emerging now (UAV, lidar, etc).

> .libPaths()
[1] "/home/rsb/lib/r_libs"                         
[2] "/home/rsb/topics/R/R422-share/lib64/R/library"
> set_evolution_status(2L)
[1] 2
> c1_CRS <- CRS("+proj=longlat +no_defs")
Error in st_crs.character(projargs) : invalid crs: +proj=longlat +no_defs
In addition: Warning message:
In CPL_crs_from_input(x) :
  GDAL Error 1: PROJ: proj_create: Error 1026 (Missing argument): pj_init_ctx: Must specify ellipsoid or sphere
> c1_CRS <- sf::st_crs("+proj=longlat +no_defs")
Error in st_crs.character("+proj=longlat +no_defs") : 
  invalid crs: +proj=longlat +no_defs
In addition: Warning message:
In CPL_crs_from_input(x) :
  GDAL Error 1: PROJ: proj_create: Error 1026 (Missing argument): pj_init_ctx: Must specify ellipsoid or sphere

There are no projection or transformation pipelines for any of these, and getting adequate pipelines is what the changes over the last 6 years have been about (in addition to adding epochs for plate tectonics).

I'm unsure what you need. Do you want sp::CRS() without rgdal and in 0 and 1 to take a given SRS_string= (not just proj4, also WKT2) and insert it into the object being created? That might be possible. We still need to be sure that the CRS cache is under control.

finnlindgren commented 1 year ago

It's precisely to avoid relying on proj4 inputs that we've been using the SRS_string argument instead, and we explicitly do not care about whether the proj4 string is populated by something readable or not. So under evolution status 0 and 1, with no rgdal installed, everything breaks down as it ignores the valid wkt specification input.

I can tweak the internal inlabru logic a bit to workaround the issue, with an approach that's similar to evolution status 2L, by going via sf to get consistent proj4 and wkt information even when constructing CRS objects. Emulation of that:

library(sp)
library(inlabru)

set_evolution_status(0L)
#> [1] 0
c1_CRS <- fm_CRS(fm_crs(CRS("+proj=longlat")))
str(c1_CRS)
#> Formal class 'CRS' [package "sp"] with 1 slot
#>   ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs"
#>   ..$ comment: chr "GEOGCRS[\"unknown\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.25722"| __truncated__
set_evolution_status(1L)
#> [1] 1
c1_CRS <- fm_CRS(fm_crs(CRS("+proj=longlat")))
str(c1_CRS)
#> Formal class 'CRS' [package "sp"] with 1 slot
#>   ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs"
#>   ..$ comment: chr "GEOGCRS[\"unknown\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.25722"| __truncated__
set_evolution_status(2L)
#> [1] 2
c1_CRS <- fm_CRS(fm_crs(CRS("+proj=longlat")))
str(c1_CRS)
#> Formal class 'CRS' [package "sp"] with 1 slot
#>   ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs"
#>   ..$ comment: chr "GEOGCRS[\"unknown\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.25722"| __truncated__

One of the core issues we have to deal with in inlabru/INLA are geocentric coordinate system, as well as (for numerical stability reasons) scaled versions of the globe in such systems, in particular the radius 1 sphere. The inlabru::fm_transform has its own logic for dealing with the conversions, as our use case isn't something that the GIS community seems to understand exists; for practical coding reasons we have to treat the globe just as a special case of 3D geometry (but with special logic for spherical triangles), whereas most/all of the assumptions in sf are that everyone always should work in flat coordinate systems (except for the unusable s2/no s2 flag that we as package authors have no control over, and that makes all the "magic" s2 usage for longlat coordinates a hindrance that I actively need to avoid accidentally triggering.

As for sp, my concern is to avoid breaking all our user's exiting code that relies on CRS objects having WKT information (recall that we we strongly pushed to avoid PROJ4 and rely on WKT instead, which we did, under the assumption that this wouldn't disappear later on). If an appropriate solution to that is to give an error (or perhaps just a warning) when the sp evolution status is less than 2L and rgdal isn't installed, I can do that, if I can rely on the get_evolution_status() function to remain in the future. That would save me hours of explaining the issue to users...

finnlindgren commented 1 year ago

To clarify: the CRS constructions with proj4 inputs in the test cases above are mostly there to help generate the wkt info that we are having problems with. I don't want to use proj4 inputs, I want to use wkt inputs.

rsbivand commented 1 year ago

I think I understand. Is the user setting one where sp could discover that a specific mandate is being given to trust a given WKT2 when it cannot be verified? That is, could an option (or environment variable) be set on load by say inlabru that sp could access?

And it's not so much GIS uses as geodetics uses, so a long way from global analyses.

finnlindgren commented 1 year ago

I'm not sure if that's worth spending time on for sp; I was under the impression that in the not too distant future it would default to evolution status 2L, and that the problems above would mostly be a transitional problem?

Are "non-standard" WKT2 crs information likely to be an issue also when working with sf? If so, that's a bigger issue. So far things have seemed ok at least; I've had to expand the inlabru specialised wkt-parser code a bit (this will be in fmesher later, but should probably be a separate package if there was a more general need for it).

rsbivand commented 1 year ago

In terms of time-line, you are probably correct. The next sp release is very likely to be the June release switching default status from 0L to 2L, @edzer, do you agree? Consequently a patch release is very unlikely, so adding code just to cover the 0L and 1L cases where a WTK2 string should be accepted uncondiitionally when rgdal is absent doesn't seem sensible. Does 2L work satisfactorily already? If it could be improved, that would be a more rational use of time.

edzer commented 1 year ago

Are "non-standard" WKT2 crs information likely to be an issue also when working with sf?

What do you mean by that? WKT-2 is a standard, the upstream software (PROJ, GDAL) used by sf is supposed to implement it.

The next sp release is very likely to be the June release switching default status from 0L to 2L, @edzer, do you agree?

Hopefully, yes.

finnlindgren commented 1 year ago

“Non-standard” only as in “manually or programmatically constructed as a valid CRS specification that doesn’t correspond to the usual predefined CRS values”. Specifically, with units not in metres, and/or different ellipsoid radius. In the past, these tended to cause issues with code that expected the wit to be convertible to proj4, but the converter failed. This seems to be less of an issue with the sf CRS interface, but the transition from sp and backwards compatibility support is the issue here.

rsbivand commented 1 year ago

Like the missing (NA) construct that we have for writing vector objects in sf, but that construct is in the upstream unit tests.

edzer commented 1 year ago

except for the unusable s2/no s2 flag that we as package authors have no control over, and that makes all the "magic" s2 usage for longlat coordinates a hindrance that I actively need to avoid accidentally triggering

I don't see why a package cannot use sf_use_s2(), or which "magic" is hindering you. I'm also not sure whether your package does any geometrical operations on a plane or sphere at all, but in case you would want to intersect a geometry that crosses the antemeridian or covers one of the poles, I'd guess that s2 will be of more use that GEOS (=no s2). Proper buffers on the sphere are expected to come soon. s2geometry btw also represents all points internally as 3D vectors of unit length.

finnlindgren commented 1 year ago

To do that we would need to capture the current s2 usage state and reset it after with on.exit or similar, so we don't break the user's own code, in case they rely on it being activated or not activated. So yes, we could do this, but at a major maintainability cost for our code. One of the issues is that I simply can't tell whether the operations we do will be affected by s2 or not unless I check what the CRS is (i.e. longlat or not) and s2 activation status, thus making it easier to miss cases where such a check would be needed, as I'm also not the only ne contributing code relating to geometry, and earth-surface centric code is just a special case of more general geometry that we handle. I believe I've avoided any major issues so far, but the unpredictability makes me feel uneasy about relying on it.

And yes, we do do geometric operations, and can encounter use cases where long-lat need to be treated as raw coordinates, as well as use cases where spherical geometry is the relevant geometry.

But this wasn't really the topic here, which is about CRS specification&storage. The sf storage works much better than current sp, which is good, but we do still need to deal with sp code for backwards compatibility of a large set of legacy code.

finnlindgren commented 1 year ago

@rsbivand , I'm trying out the following sp status checking and loading code, for detecting and warning about evo status < 2 when rgdal isn't installed. The idea is to call this at some key places in the inlabru code (and likely INLA code).

# Return TRUE if no sp issue detected, and `requireNamespace("sp")`
# Return FALSE if a potential issue is detected, and give a warning if
# `silent` is `FALSE`
# Give an error if sp not available
check_sp_status <- function(silent = FALSE) {
  sp_version <- tryCatch(packageVersion("sp"),
                            error = function(e) NA_character_)
  if (is.na(sp_version)) {
    if (!silent) {
      stop("No 'sp' version detected.")
    }
    return(FALSE)
  }
  if (sp_version >= "1.6-0") {
    stopifnot(requireNamespace("sp"))
    # Default to 2L to allow future sp to stop supporting
    # get_evolution_status; assume everything is fine if it fails.
    evolution_status <- tryCatch(sp::get_evolution_status(),
                                 error = function(e) 2L)
    rgdal_version <- tryCatch(packageVersion("rgdal"),
                              error = function(e) NA_character_)
    if ((evolution_status < 2L) && is.na(rgdal_version)) {
      if (!silent) {
        warning(
          paste0(
            "'sp' version >= 1.6-0 detected, rgdal isn't installed, and evolution status is < 2L.\n",
            "This may cause issues with some CRS handling code. To avoid this, use 'sp::set_evolution_status(2L)'"
          )
        )
      }
      return(FALSE)
    }
  }
  return(TRUE)
}
rsbivand commented 1 year ago

This looks as though it covers the range of alternatives. Could this be in sp @edzer? Might it also check for sf availability?

finnlindgren commented 1 year ago

I made a slight inconsistency; it was meant to load sp if available, regardless of version. Part of the function wouldn’t make sense to be in sp itself I think. But if you think parts of it would make sense in sp or sf, that would be fine!

finnlindgren commented 1 year ago

Aha, now I see what you meant. It’s mostly the requireNamespace(“sp”) that wouldn’t make sense inside the sp package itself…

finnlindgren commented 1 year ago

I forgot about the automated inlabru examples checking, where the evolution status won't have been set bu the user. For the testthat tests I set it in a setup function that's called at the top of each test file. For the man/*.Rd examples, I'd need to add a bru_safe_sp() call to each example that involves sp in the roxygen documentation; we already have a special function bru_safe_inla() to handle INLA presence/absence as well as multithreading settings that we use in the examples, so it's doable, albeit tedious.

Since sp is currently in Depends, the examples that need it don't check for it. Otherwise I could have located the places that need it with a basic project-wide search.

Setting the status in inlabru::.onLoad() would technically work but would make debugging things a pain, and it would also be messing the user's own environment, which I don't want to do if it can be avoided.

rsbivand commented 1 year ago

Running CMD check in a scenario using sp evolution status 2 (substitute use of rgdal with sf for projection/transformation/CRS) and absence of retiring packages from the library path passes as in this log: 00check.log

finnlindgren commented 1 year ago

I've done an experimental commit now that removes the rgdal dependency and protects examples with bru_safe_sp(), and the testthat tests with bru_safe_sp(force = TRUE) (that sets the evolution status to 2L if needed). If that works on the github tests, then it should also work on CRAN. But I had an experimental INLA version installed that github doesn't have (yet) so it's possible I need to get INLA updated with those experimental sp/sf/inlabru changes as well to not break user code.

finnlindgren commented 1 year ago

User code will still be unsafe since it's not protected by default; I'll likely add some calls to bru_safe_sp() in some key functions so it can warn about the issue on systems that lack rgdal in the future.

rsbivand commented 1 year ago

I would like to try to do another project report, for users with workflows and teaching materials rather than package maintainers. To start with, suggesting that they should run diffs on output with retiring packages present, then absent, to control for any regressions. But putting in warnings seems to nudge them into suppressing the warnings rather than reading them. We could even do an INLA blog for INLA users, if that might be helpful.

finnlindgren commented 1 year ago

When the CRS handling fails in inlabru/INLA user code, the effect can be much later in the code and/or with cryptic error messages that aren't always obviously related to the root cause. For example, if a technically valid crs is created but becomes NA instead of what it was meant to be, then the effect can be that it ends up evaluating covariates at the wrong location, possibly outside the domain, creating NA or NULL as data input, and tracing those kinds of errors in user can be extremely tedious and time consuming for me, as it tends to involve those who don't know how to debug themselves, so flagging the problem should ideally be done as early as possible.

I'd be ok with using stop() instead of warning() in some key functions to force users to make an active choice. One thing I wasn't expecting was that when simply loading an sp data set, with data(gorillas, package="inlabru") without rgdal installed, some CRS info gets lost (I believe it was gorillas$mesh$crs) or broken, so for most examples, the inlabru functions that might detect the potential sp/rgdal problem are called too late to flag the issue in interactive sessions. Doing stop() with an informative error message seems appealing to force the users to act seems appealing to me. I feel that the proper solution in the long run is for 2L to be the default evolution setting, and then it won't be an issue anymore. I'm inclined to treat the "sp evolution status 0L but no rgdal, breaking some CRS operations" case as a bug in sp that we're trying to work around in inlabru, to avoid having to deal with too much tedious user code debugging.

We'll update the example data sets to sf/terra, but that will likely need to be under different names, since the formats and applicable methods are so different.

There will definitely need to be a document for INLA users, but this will need to involve the fm_* wrapper methods that handle both sp, sf, and fmesher (which handles geometry that neither sp nor sf can handle). The sp-related wrappers may be the best place to detect the sp-issues, from inlabru's point of view. There are now some conversion functions between sf and fmesher, and more will be added (but some are fundamentally only sf->fmesher due to the limitations of sf (and sp; this is not a new issue)).

finnlindgren commented 1 year ago

An unfortunate side effect of working around the problem in the inlabru examples with bru_safe_sp() is that it partially masks the problem; automated package checking will succeed even on broken sp configurations (evolution status 0 and no rgdal). This is really a bug in sp, not inlabru.

finnlindgren commented 1 year ago

Option for the package data: rename the existing data like gorillas->gorillas_sp to allow existing code to run with only small changes, and have the new sf/terra versions use the current names, or call them like gorillas_sf, and really force code updates. Not sure what option is the most sensible.

finnlindgren commented 1 year ago

We are planning on converting the package examples/vignettes user-side code from sp to sf soon. Some already convert to sf.

rsbivand commented 1 year ago

From June 2023, sp will default to evolution status 2L, and Suggests: for retiring packages will be dropped in October 2023. You are however right that some users never update packages unless obliged to by R upgrades, so care will be needed until everybody is using R 4.4 after April 2024 (typically some years after that, but up to one year for Windows and macOS users). Stale RDA files in /data are a difficult problem, so you are right that care will be needed there too. May I link to this issue from the evolution repo?

finnlindgren commented 1 year ago

OK, June 2023 is very soon, so that's good! Then I feel ok with adding a few if (!bru_safe_sp()) stop("...") with suitable message in a few key places likely to trigger an alert to users with info about how to fix it.

Yes, feel free to link to here!

rsbivand commented 1 year ago

Maybe the message could suggest frequent updates of packages as a good habit (but of course recording package versions used in generating results from workflows)?

finnlindgren commented 1 year ago

Solved by #193

rsbivand commented 1 year ago

inlabru 2.8.0 CMD check OK for forthcoming sp 2.0 (status 2L) without retiring packages on library path, thanks!