Closed maelle closed 2 years ago
My 10 cents: yes!
I agree with @Robinlovelace. That being said some flexibility should be allowed given the state of flux with spatial, especially since raster
doesn't yet fully support sf
objects out of the box and requires coercing to sp
.
My :+1: to @jhollist
recommend_sf <- FALSE
if (!"raster" %in% pkg.deps)
recommend_sf <- TRUE
if (!recommend_sf){
# be careful
}
That condition could be modified through extending @noamross's fasterize
?
thanks for the feedback everyone! I forgot to tag @mdsumner 🤦♀️ @mdsumner, what's your opinion?
I also agree it's best at this point to recommend sf
, with the same caveats as @jhollist. There is a lot of the r world that is still firmly in the sp/rgal/rgeos
ecosystem though. Lately I've been making sf
the "first-class citizen" and include sp
in Suggests
, and support it by way of sf
's pretty great methods for coercion to/from.
To complicate matters just a bit, there is the beastly nature of sf
to consider. To borrow some code from @hrbrmstr:
library (fs)
library (tidyverse)
installed.packages() %>%
as_data_frame() %>%
mutate(pkg_dir = sprintf("%s/%s", LibPath, Package)) %>%
select(Package, pkg_dir) %>%
filter(Package %in% c ("sf", "sp")) %>%
mutate(pkg_dir_size = map_dbl(pkg_dir, ~{
fs::dir_info(.x, all=TRUE, recursive=TRUE) %>%
summarise(tot_dir_size = sum(size)) %>%
pull(tot_dir_size)
}) %>%
format(big.mark = ",")) %>%
select (Package, pkg_dir_size)
# Package pkg_dir_size
# sf 18,902,203
# sp 2,093,160
Recommending a package that is almost 10 times the size of its predecessor; that represents one of the largest installs in anybody's site-library
; and that by my reckoning is close to 20GB, by @hrbrmstr's more like 113GB, would suggest something like:
if (recommend_sf){
# be careful
}
ohai!
(MB not GB :-)
I was shocked when I saw the results of the lib dir sizes.
There's also this to consider:
sp
dep tree
vs
sf
dep tree
sf
brings a ton of heavy hitters along for the ride.
Thanks all! A lot of food for thought! 🤔 💥
Cannot compete with Bob's viz (awesome stuff - shows how interdependent we're all becoming which I recall you saying is a tendency to resist!) but just thought I'd flag that many of those pkgs will already be installed on many useRs computers, especially magrittr + Rcpp
given their popularity. Not sure about e1071
but I'd argue that uptake of units
is no bad thing (notwithstanding its potentially painful dependency on the C library udunits2):
Code: https://github.com/Robinlovelace/geocompr/blob/master/code/cranlogs.R but starting out with this:
pkgs = c("Rcpp", "MASS", "e1071", "units", "magrittr")
dd = cranlogs::cran_downloads(packages = pkgs,
from = "2013-01-01", to = Sys.Date())
I don't think it's possible to make a blanket recommendation for onboarding, sf is really not for package devs, it's just not that kind of tool.
Just wondering if it is worth re-framing the question somewhat: is there some approach that developers of packages working with spatial data (e.g., accessing a data repository or API that distributes spatial data) can do to make their packages more sf
-ready?
Or more generally, would it make sense to have some recommendations about spatial data standards in rOpenSci not necessarily tied to any particular package ecosystem? e.g. for starters, "packages working with spatial data should declare the projection used". Would it make sense to suggest something like geojson
as a preferred format for vector spatial data (unless the data needs one of those unusual simple features not part of the geojson standard?) Or is that unnecessarily prescriptive (i.e. users are better off always firing up the sf
/ libgdal
swiss-army knife and enjoy agnosticism to filetype?)
A few points:
1) I was fascinated when @mdsumner submitted a PR to remove sf dependency from a package of mine that does nothing but transform sf objects. It made me realize that the sf package is more like a reference implementation of this object type, along with a number of tools for dealing with it, and other geospatial package developers are building and modifying sf objects with custom code.
2) @jhollist's framed the issue well in his NLMR review, where he recommended authors have a plan to "future-proof" the package to deal with the changing geospatial ecosystem. I think we should include some guidance to authors and reviewers that they consider how their code will stay compatible with volatile parts of the R ecosystem. This would be a highly general top-level concept, (along with performance, documentation, etc.), but then we can provide some more specific guidance on areas such as geospatial, text, etc.
3) @cboettig I do think that those recommendations are a little too prescriptive, but I do think that recommendations for future-proofing/compatibility consider file format choice and sufficient storage of metadata to deal with changes in the future.
For the same of completeness here, here is a link to an issue thread where @edzer states:
the maintainer of
sp
has made it clear that new software should build uponsf
, notsp
. There is no longer a reason to continue building new software usingsp
classes.
Now that @rsbivand has retired, rgdal
, maptools
and rgeos
will retire as well, by the end of 2023. To do that, sp
will have to depend on sf
: right now sp
uses rgdal
and rgeos
for important tasks (validation of CRS, hole-in-polygon detection, although rgdal
and rgeos
are "just" in Suggests:
. sp
is not the way forward.
Any role for a maintenance-mode only sp going forward is to keep packages using spatial vector representations and unwilling to migrate to sf directly from breaking. For similarly placed packages using S4 classes inheriting from sp gridded classes, the picture is more complex, as stars and terra are arguably both useful alternatives, and should both be supported, but with no recommended preference (use cases should decide for themselves). So for vector data, yes, only recommend sf; for raster data, probably no call should be made beyond pointing out that raster's use of rgdal to access GDAL has EOL to the end of 2023 latest.
Thanks @edzer and @rsbivand for useful clarification! We'll update our recommended scaffolding section of our Development Guide to restate what you've clarified here. Much appreciated, and all the best to you @rsbivand for your days and years ahead! The entire R community will long remain deeply indebted to your contributions!
Thanks @mpadge ! The proposed scaffolding text looks well-balanced.
Ping our Associate Editor for Spatial Software, @Paula-Moraga, so you're aware of these updates :smile: Our automated checking system will notify whenever these (potentially) obsolete packages are used, so you'll see any such cases straight away.
Resolved by @mpadge's PR to the dev guide https://github.com/ropensci/dev_guide/pull/362
Thanks to all those who chimed in!
@jhollist @ateucher @Robinlovelace @mpadge cc @sckott
Dear geospatial+rOpenSci R people I can think of right now, do you think it would make sense for rOpenSci to recommend
sf
oversp
in the onboarding guidelines?Here are our current package recommendations, e.g. xml2 over XML.