Open jarioksa opened 1 year ago
Sorry to chime in, but wouldn't it be better to use terra? In that case you could use both vector and raster formats. I would be happy to help if it is feasible.
@derek-corcoran-barrios The sp to sf transition is made in branch `sp2sf`. I think it may be done, but I haven't yet merged that branch to `master` (I would still like to test it a bit). I don't know these packages too well, but I think (but may be wrong) that terra adds rasters, but does not replace sf for spatial point data. We would still need sf to handle spatial points if we want to keep the old behaviour of the package, and we should depend both on sf and terra. Currently Hmsc analysis and sampling (MCMC) is written to handle spatial point data, that is, sampling units.
There are two different developer scopes. The current version (in the `sp2sf` branch) replaces sp with sf so that the behaviour of Hmsc remains unchanged despite switching to new spatial infrastructure. I think your scope is different, and you want to extend the spatial framework to allow new kinds of spatial analyses? Could you expand on your ideas?
Hi @jarioksa, I am so happy that you replied; I will check it out. Actually, terra also handles point data: if you have an sf object, you can check this by loading the terra package and calling the function vect on it. My main line of thought for using terra is that if you are able to predict on a raster, you could have all your variables (temperature, precipitation, etc.) in raster format and predict into it, which is far more efficient both to predict and to store (as cloud-optimized GeoTIFFs) than data frames and/or shapefiles. I would gladly talk about this some more with you if you think it is relevant. I work at Aarhus University, so I believe we are in the same time zone. This is very relevant to me, as I am planning to model all the flora of Denmark using a joint species distribution model framework, and I think your package is the way to go; however, it would optimize my workflow to do so with rasters from terra.
I actually just took a look at the branch, and it seems to me that it would not take too long to make the change. What do you think?
@derek-corcoran-barrios You seem to have more ambitious goals than we currently have in Hmsc. I assumed that you were thinking about prediction when you mentioned spatial rasters. Using spatial raster data in prediction will require more coding than we have now.
Hmsc has only a thin layer of spatial package code that protects the analytic core from spatial methods. The analytic core mainly handles only distance matrices and uses spatial coordinates in some special cases. The task of the spatial package code is to handle possible spatial input and get those distances from georeferenced spatial data. In practice this only means that we use spherical distances with longitude-latitude input. For projected spatial data the distance functions both in sp and sf calculate Euclidean distances. The analytic core in `sampleMcmc` does not know about spatial data. Spatial code is in the pre-processing functions `HmscRandomLevel`, `setPriors.HmscRandomLevel`, and `computeDataParameters` before `sampleMcmc`, in `prepareGradient`, `constructGradient`, and `constructKnots` that prepare the model for prediction, and in `predictLatentFactor` that is (optionally) used in prediction and is the only analytic function that was touched by spatial methods. You probably need more changes in prediction functions for rasters than we have implemented.
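To illustrate the difference between the two kinds of distances mentioned above, here is a minimal base-R sketch. It is not Hmsc, sp, or sf code: `haversine_matrix` is a hypothetical helper, and it assumes a spherical Earth with a mean radius of 6371 km, so its numbers will differ slightly from what the spatial packages return.

```r
# Hypothetical helper (not part of Hmsc, sp or sf): great-circle distances
# between longitude-latitude points, assuming a sphere of radius 6371 km.
haversine_matrix <- function(lonlat, radius_km = 6371) {
  lon <- lonlat[, 1] * pi / 180   # degrees to radians
  lat <- lonlat[, 2] * pi / 180
  dlon <- outer(lon, lon, "-")    # pairwise differences
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # pmin() guards against rounding slightly above 1 before asin()
  2 * radius_km * asin(pmin(sqrt(a), 1))
}

# Three illustrative points given as (longitude, latitude) in degrees
xy <- cbind(lon = c(10.2, 24.9, 25.5), lat = c(56.2, 60.2, 65.0))

d_sphere <- haversine_matrix(xy)   # kilometres along the sphere
d_euclid <- as.matrix(dist(xy))    # plain Euclidean distance in degree units
```

Either matrix has the shape the analytic core expects (symmetric, zero diagonal); only the way the distances are computed differs.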
Please note that Hmsc really works with distance matrices instead of spatial coordinates. Spatial packages are needed for spherical distances with longitude-latitude data; projected data are treated as Euclidean both in sp and sf. In model specification via random levels we rely on spatial point data. We do not handle polygons or vectors. You can input spatial distances directly instead of spatial coordinates, but those distances must be metric or our matrix algebra will fail. Some polygon distances can violate the metric assumption and the analysis will fail. It will fail, for instance, if you assume that two touching polygons have zero distance.
Prediction beyond observed points, e.g. for full raster data, is still trickier. We will normally get a separate prediction corresponding to each MCMC sample, so you will get a separate raster for each sample. In the full spatial model we use the distance matrix among all observed points, and for prediction we then need distances among all raster cells; you probably need to use GPP models (Gaussian Predictive Process) with fixed knots. I see that the terra package advertises efficient memory management for large data, but that probably means atomic data where each raster cell can be processed independently of other cells, unlike in our full spatial model.
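The metric requirement can be checked before a user-supplied distance matrix is passed on. The following is only a sketch in base R: `is_metric` is a hypothetical helper, not an Hmsc function, and it tests symmetry, a zero diagonal, non-negativity, and the triangle inequality.

```r
# Hypothetical helper (not part of Hmsc): basic metric checks for a
# distance matrix or a "dist" object.
is_metric <- function(d, tol = 1e-8) {
  d <- unname(as.matrix(d))
  if (!isSymmetric(d) || any(diag(d) != 0) || any(d < 0)) return(FALSE)
  for (j in seq_len(nrow(d))) {
    # via_j[i, k] = d[i, j] + d[j, k]: the detour from i to k through j
    via_j <- outer(d[, j], d[j, ], "+")
    if (any(d > via_j + tol)) return(FALSE)  # triangle inequality violated
  }
  TRUE
}

# Points on a line at 0, 1 and 3: Euclidean distances are metric
points_ok <- is_metric(dist(c(0, 1, 3)))

# Three polygons A, B, C where A touches B and B touches C (distance 0),
# but A and C are 5 units apart: 5 > 0 + 0 breaks the triangle inequality
touching <- matrix(c(0, 0, 5,
                     0, 0, 0,
                     5, 0, 0), nrow = 3, byrow = TRUE)
touching_ok <- is_metric(touching)

print(c(points = points_ok, touching = touching_ok))
```

The second matrix is the touching-polygons case described above: treating shared borders as zero distance makes the detour through B shorter than the direct distance from A to C.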
You are welcome to develop spatial models for your needs, but this must be done without disrupting the main development. I am almost confident that the current `sp2sf` branch is a drop-in replacement of the old sp code and can be merged to master. This will fix our immediate problem and give code that works without retired packages and behaves identically to the old code. I suggest you wait till I do that, and then you can have a private branch and change it as you wish, including changing all my changes. Your changes can be merged as pull requests after review.
Finally, a tabulation of major code changes from sp to sf based spatial functions in the current `sp2sf` branch:
sp | sf |
---|---|
is(sData, "Spatial") | inherits(sData, "sf") |
coordinates(sData) <- colnames(sData) # to spatial | sData <-st_as_sf(sData,coords=colnames(sData),row.names=rownames(sData)) |
coordinates(sData) # from spatial | as.data.frame(st_coordinates(sData)) |
proj4string(sData) <- CRS("+proj=longlat") # set | st_crs(sData) <- "+proj=longlat" |
proj4string(sData) # get | st_crs(sData) |
is(sData, "Spatial") && is.projected(sData) | inherits(sData, "sf") && !st_is_longlat(sData) |
ncol(coordinates(sData)) | ncol(st_coordinates(sData)) |
nrow(coordinates(sData)) | nrow(st_coordinates(sData)) |
spDists(s) | st_distance(s) |
spDists(s, sKnot) | st_distance(s, sKnot) |
!is.projected(sData) | st_is_longlat(sData) |
as.data.frame(t(bbox(rL$s))) | as.data.frame(matrix(st_bbox(rL$s), 2, byrow=TRUE)) |
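
The last row of the table can be verified without either package installed. The sketch below uses hand-made stand-ins, on the assumption that `sp::bbox()` returns a matrix with one row per coordinate and columns `min` and `max`, while `sf::st_bbox()` returns a named vector ordered `xmin, ymin, xmax, ymax`. The numbers are illustrative only.

```r
# Stand-in for sp::bbox(): rows are coordinates, columns are min and max
sp_bb <- matrix(c(10, 55, 25, 65), nrow = 2,
                dimnames = list(c("x", "y"), c("min", "max")))

# Stand-in for sf::st_bbox(): named vector xmin, ymin, xmax, ymax
sf_bb <- c(xmin = 10, ymin = 55, xmax = 25, ymax = 65)

old <- as.data.frame(t(sp_bb))                        # sp expression
new <- as.data.frame(matrix(sf_bb, 2, byrow = TRUE))  # sf expression

# Same values: row 1 holds the minima, row 2 the maxima
same_values <- all(as.matrix(old) == as.matrix(new))
```

Only the values agree: the sf expression yields default column names (`V1`, `V2`) instead of the coordinate names, which matters only if later code refers to the columns by name.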
When loading `Hmsc` with the current version of R we get the following message:
We do not depend on any of these packages, but we depend on sp, which may use functions in these retired packages. The sp package is used with longitude-latitude data or cartographically projected data. I ran some quick tests with "evolution status 2", but found no problems. However, our test coverage of longitude-latitude or projected map data is limited, and we would need new tests just to check compatibility. The best thing is to change the functions to use the sf package before somebody runs into a problem or CRAN demands that we make the change.
I will have a look at this issue.
It seems that we should have an updated version by October 2023.
More information in blog text https://geocompx.org/post/2023/rgdal-retirement/