hmsc-r / HMSC

Spatial models must be updated for retiring packages #161

Open jarioksa opened 1 year ago

jarioksa commented 1 year ago

When loading Hmsc with the current version of R we get the following message:

> library(Hmsc)
Loading required package: coda
The legacy packages maptools, rgdal, and rgeos, underpinning this package
will retire shortly. Please refer to R-spatial evolution reports on
https://r-spatial.org/r/2023/05/15/evolution4.html for details.
This package is now running under evolution status 0 

We do not depend on any of these packages, but we do depend on sp, which may use functions from these retiring packages. The sp package is used with longitude-latitude data or cartographically projected data. I ran some quick tests with "evolution status 2" and found no problems. However, our test coverage of longitude-latitude and projected map data is thin, and we would need new tests just to check compatibility. The best course is to change the functions to use the sf package before somebody runs into a problem or CRAN demands that we make the change.
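For anyone wanting to repeat that kind of quick check, a minimal sketch follows. It assumes an sp version (1.6-0 or later) that reads the `sp_evolution_status` option when it is loaded; the option name comes from the r-spatial evolution reports, not from Hmsc, and the final load step is only illustrative.

```r
# Ask sp to stop using the retiring packages and to use sf instead
# (evolution status 2). The option must be set before sp is loaded,
# and therefore before Hmsc is loaded.
options("sp_evolution_status" = 2)

# Confirm what the session is configured to use.
status <- getOption("sp_evolution_status")
print(status)

# Load Hmsc only if it is installed, so the sketch runs anywhere.
if (requireNamespace("Hmsc", quietly = TRUE)) {
  library(Hmsc)
}
```

This only switches the infrastructure that sp uses; it does not by itself test any longitude-latitude or projected-data code paths.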

I will have a look at this issue.

It seems that we should have an updated version by October 2023.

More information is in the blog post https://geocompx.org/post/2023/rgdal-retirement/

derek-corcoran-barrios commented 10 months ago

Sorry to chime in, but wouldn't it be better to use terra? In that case you could use both vector and raster formats. I would be happy to help if it is feasible.

jarioksa commented 10 months ago

@derek-corcoran-barrios The sp to sf transition is made in branch sp2sf. I think it may be done, but I haven't yet merged that branch to master (I would still like to test it a bit). I don't know these packages too well, but I think (but may be wrong) that terra adds rasters but does not replace sf for spatial point data. We would still need sf to handle spatial points if we want to keep the old behaviour of the package, so we would have to depend on both sf and terra. Currently Hmsc analysis and sampling (MCMC) are written to handle spatial point data, that is, sampling units.

There are two different developer scopes. The current version (in the sp2sf branch) replaces sp with sf so that the behaviour of Hmsc remains unchanged despite the switch to the new spatial infrastructure. I think your scope is different, and you want to extend the spatial framework to allow new kinds of spatial analyses? Could you expand on your ideas?

derek-corcoran-barrios commented 10 months ago

Hi @jarioksa, I am so happy that you replied, I will check it out. Actually terra also handles point data: if you have an sf object, you can check this by loading the terra package and calling the function vect on the sf object. My main line of thought for using terra is that if you are able to predict on a raster, you could have all your variables (temperature, precipitation, etc.) in raster format and predict into it, which is far more efficient both to predict and to store (as cloud-optimized GeoTIFFs) than data frames and/or shapefiles. I would gladly talk about this some more with you if you think it is relevant. I work at Aarhus University, so I believe we are in the same time zone. This is very relevant to me, as I am planning to model all the flora of Denmark using a joint species distribution model framework, and I think your package is the way to go; however, it would optimize my workflow to do so with rasters from terra.
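As a small illustration of the `vect` conversion mentioned above, here is a sketch that assumes both sf and terra are installed. The coordinates are made up for the example and the column names `lon` and `lat` are hypothetical.

```r
# Hypothetical sampling-unit coordinates (longitude, latitude); not real data.
xy <- data.frame(lon = c(10.2, 10.5, 9.9), lat = c(56.1, 56.3, 55.8))

if (requireNamespace("sf", quietly = TRUE) &&
    requireNamespace("terra", quietly = TRUE)) {
  # Build an sf point object with a longitude-latitude CRS (EPSG:4326).
  pts_sf <- sf::st_as_sf(xy, coords = c("lon", "lat"), crs = 4326)
  # Convert the sf object to a terra SpatVector.
  pts_terra <- terra::vect(pts_sf)
  print(terra::geomtype(pts_terra))  # geometry type of the SpatVector
  print(terra::crds(pts_terra))      # coordinate matrix, one row per point
}
```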

derek-corcoran-barrios commented 10 months ago

I actually just took a look at the branch, and it seems to me that it would not take too long to make the change. What do you think?

jarioksa commented 10 months ago

@derek-corcoran-barrios You seem to have more ambitious goals than we currently have in Hmsc. I assumed that you were thinking about prediction when you mentioned spatial rasters. Using spatial raster data in prediction will require more coding than we have now.

Hmsc has only a thin layer of spatial package code that shields the analytic core from spatial methods. The analytic core mainly handles only distance matrices and uses spatial coordinates in some special cases. The task of the spatial package code is to handle possible spatial input and get those distances from georeferenced spatial data. In practice this only means that we use spherical distances with longitude-latitude input. For projected spatial data the distance functions in both sp and sf calculate Euclidean distances. The analytic core in sampleMcmc does not know about spatial data. Spatial code is in the pre-processing functions HmscRandomLevel, setPriors.HmscRandomLevel, and computeDataParameters that run before sampleMcmc; in prepareGradient, constructGradient, and constructKnots, which prepare the model for prediction; and in predictLatentFactor, which is (optionally) used in prediction and is the only analytic function touched by spatial methods. You will probably need more changes in the prediction functions for rasters than we have implemented.
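To make the difference between spherical and Euclidean distances concrete, here is a base R sketch. It is not the Hmsc, sp, or sf implementation: the haversine function, the assumed Earth radius, and the three coordinates are all made up for illustration.

```r
# Hypothetical coordinates for three sampling units, in degrees.
xy <- cbind(lon = c(0, 0, 90), lat = c(0, 90, 0))

# Great-circle (haversine) distance matrix on a sphere, in kilometres.
# The radius is an assumed mean Earth radius.
haversine_matrix <- function(xy, radius = 6371) {
  lon <- xy[, 1] * pi / 180
  lat <- xy[, 2] * pi / 180
  dlon <- outer(lon, lon, "-")
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  a[a > 1] <- 1  # guard against rounding slightly above 1
  2 * radius * asin(sqrt(a))
}

d_sphere <- haversine_matrix(xy)  # appropriate for longitude-latitude input
d_euclid <- as.matrix(dist(xy))   # treats degrees as planar units

print(round(d_sphere))
print(round(d_euclid, 1))
```

On the sphere all three points are a quarter of a great circle apart, whereas the planar calculation on raw degrees makes one pair look farther apart than the others. Either way, the result is a plain distance matrix, which is all the analytic core needs.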

Please note that Hmsc really works with distance matrices instead of spatial coordinates. Spatial packages are needed for spherical distances with longitude-latitude data; projected data are treated as Euclidean in both sp and sf. In model specification via random levels we rely on spatial point data. We do not handle polygons or vectors. You can input spatial distances directly instead of giving spatial coordinates from which distances are calculated, but those distances must be metric or our matrix algebra will fail. Some polygon distances can violate the metric assumption, and then the analysis will fail. It will fail, for instance, if you assume that two touching polygons have zero distance. Prediction beyond the observed points, e.g. for full raster data, is trickier still. We normally get a separate prediction for each MCMC sample, so you will get a separate raster for each sample. In the full spatial model we use the distance matrix among all observed points, and for prediction we then need distances among all raster cells; you probably need to use GPP (Gaussian Predictive Process) models with fixed knots. I see that the terra package advertises efficient memory management for large data, but that probably applies to atomic data where each raster cell can be processed independently of other cells, unlike in our full spatial model.
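The touching-polygons case can be sketched in base R. The helper `is_metric` and the two small distance matrices are hypothetical, and the exponential covariance `exp(-d)` is an assumption used only to show how the matrix algebra can break; it is not taken from the Hmsc code.

```r
# Check whether a distance matrix satisfies the metric axioms.
is_metric <- function(d, tol = 1e-12) {
  n <- nrow(d)
  off_diag <- d[row(d) != col(d)]
  # Symmetry, zero diagonal, strictly positive distances between distinct units.
  ok <- isSymmetric(unname(d)) && all(diag(d) == 0) && all(off_diag > 0)
  # Triangle inequality: d[i, j] <= d[i, k] + d[k, j] for every k.
  for (k in seq_len(n)) {
    ok <- ok && all(d <= outer(d[, k], d[k, ], "+") + tol)
  }
  ok
}

# Hypothetical polygons A, B, C: A touches B and B touches C (distance 0),
# but A and C are 5 units apart.
d_poly <- matrix(c(0, 0, 5,
                   0, 0, 0,
                   5, 0, 0), 3, 3, byrow = TRUE)

# Euclidean distances among three points, for comparison.
d_pts <- as.matrix(dist(cbind(c(0, 3, 0), c(0, 0, 4))))

print(is_metric(d_poly))
print(is_metric(d_pts))

# With an assumed exponential covariance, the non-metric matrix is not
# positive definite, so its Cholesky factorization raises an error.
chol_fails <- inherits(try(chol(exp(-d_poly)), silent = TRUE), "try-error")
print(chol_fails)
```

A check of this kind could be run on any user-supplied distance matrix before it is passed to a model.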

You are welcome to develop spatial models for your needs, but this must be done without disrupting the main development. I am almost confident that the current sp2sf branch is a drop-in replacement for the old sp code and can be merged into master. This will fix our immediate problem and give code that works without the retired packages and behaves identically to the old code. I suggest you wait until I do that; then you can have a private branch and change it as you wish, including changing all my changes. Your changes can be merged as pull requests after review.

Finally a tabulation of major code changes from sp to sf based spatial functions in the current sp2sf branch:

| sp | sf |
| --- | --- |
| `is(sData, "Spatial")` | `inherits(sData, "sf")` |
| `coordinates(sData) <- colnames(sData) # to spatial` | `sData <- st_as_sf(sData, coords=colnames(sData), row.names=rownames(sData))` |
| `coordinates(sData) # from spatial` | `as.data.frame(st_coordinates(sData))` |
| `proj4string(sData) <- CRS("+proj=longlat") # set` | `st_crs(sData) <- "+proj=longlat"` |
| `proj4string(sData) # get` | `st_crs(sData)` |
| `is(sData, "Spatial") && is.projected(sData)` | `inherits(sData, "sf") && !st_is_longlat(sData)` |
| `ncol(coordinates(sData))` | `ncol(st_coordinates(sData))` |
| `nrow(coordinates(sData))` | `nrow(st_coordinates(sData))` |
| `spDists(s)` | `st_distance(s)` |
| `spDists(s, sKnot)` | `st_distance(s, sKnot)` |
| `!is.projected(sData)` | `st_is_longlat(sData)` |
| `as.data.frame(t(bbox(rL$s)))` | `as.data.frame(matrix(st_bbox(rL$s), 2, byrow=TRUE))` |
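
A few of the sf-side calls from this tabulation can be tried on a toy data frame. This is only a sketch that assumes sf is installed; the coordinates and row names are made up, and the `row.names` argument is left out for brevity.

```r
# Hypothetical longitude-latitude coordinates for three sampling units.
sData <- data.frame(lon = c(24.9, 25.5, 27.7), lat = c(60.2, 65.0, 62.9),
                    row.names = c("a", "b", "c"))

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  sData <- st_as_sf(sData, coords = colnames(sData))  # to spatial
  st_crs(sData) <- "+proj=longlat"                    # set the CRS
  print(inherits(sData, "sf"))                        # class test
  print(st_is_longlat(sData))                         # longitude-latitude?
  print(as.data.frame(st_coordinates(sData)))         # from spatial
  d <- st_distance(sData)                             # pairwise distances
  print(dim(d))
  # Bounding box as a two-row data frame: minima first, then maxima.
  print(as.data.frame(matrix(st_bbox(sData), 2, byrow = TRUE)))
}
```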