ropensci / statistical-software-review-book

Guide for development and peer-review of statistical software
https://stats-devguide.ropensci.org

Comments on proposed Spatial guidelines #82

Open gilbertocamara opened 4 months ago

gilbertocamara commented 4 months ago

Dear rOpenSci, congrats on your hard work on SSR! In what follows, we provide some comments on the SSR guidelines for R packages that deal with spatial data, based on our experience developing sits. We offer a concise overview of sits in SSR issue 81. For the record, the sits team has considerable experience with open-source geospatial and GIS software, mostly in C++. In my case, I have 45 years of work in the area.

The SSR Spatial Guidelines are very good and generally applicable. The emphasis on the sf package (SP2.1) is welcome. However, we missed guidelines on handling raster data. In our work, we found terra to be better and easier to use than stars. In any case, SSR should discourage the use of raster, since terra and stars have superseded it. We also missed guidelines on the visualisation of vector and raster data. For your reference, we found tmap, leaflet and leafem to be excellent packages. Note that both tmap and leafem require raster data to be handled by stars. In sits, we use stars for plotting and visualisation, and terra for access to raster values.
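As a rough sketch of the division of labour described above (terra for reading raster values, stars for visualisation), the snippet below assumes terra and stars are installed and uses the small Landsat test image `L7_ETMs.tif` that ships with stars. It is an illustration under those assumptions, not code taken from sits.

```r
library(terra)
library(stars)

# Example raster bundled with the stars package (a 6-band Landsat 7 subset)
f <- system.file("tif/L7_ETMs.tif", package = "stars")

# terra: open the file and access cell values directly
r <- terra::rast(f)
vals <- terra::values(r)              # matrix: one row per cell, one column per band
band_means <- colMeans(vals, na.rm = TRUE)
print(band_means)

# stars: convert the same raster to a stars object for plotting
# (stars objects are what tmap and leafem expect for raster layers)
s <- stars::st_as_stars(r)
plot(s)                               # one panel per band
```

Keeping value access in terra and converting to stars only at the plotting step means the analysis code does not depend on the visualisation packages.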

We suggest including a guideline on the installation of GDAL and PROJ, following the instructions associated with the sf package (see https://r-spatial.github.io/sf/#installing). We also suggest that you consider mentioning the desirability of combining sf with the tidyverse. As acknowledged by Edzer Pebesma, the design of sf has been influenced by the tidyverse to the extent that some tidyverse functions can be applied directly to the output of sf functions. Thus, sf users will find it easier to combine it with the tidyverse.
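A minimal sketch of both points, assuming sf and dplyr are installed and R >= 4.1 (for the native `|>` pipe). `sf_extSoftVersion()` reports the GEOS, GDAL and PROJ versions sf was linked against, which is a quick way to confirm the system libraries were installed correctly. The dplyr verbs are then applied to the North Carolina example data bundled with sf; the filter threshold and the `births_k` column are arbitrary choices for illustration.

```r
library(sf)
library(dplyr)

# Versions of the external libraries (GEOS, GDAL, PROJ) sf is linked against
ext <- sf::sf_extSoftVersion()
print(ext)

# North Carolina counties, bundled with sf
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# dplyr verbs dispatch to sf methods, so the result is still an sf object
# and the geometry column is kept even though select() does not name it
large <- nc |>
  filter(AREA > 0.2) |>
  mutate(births_k = BIR74 / 1000) |>
  select(NAME, births_k)

print(class(large))
```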

In more general terms, we missed explicit support for the tidyverse in the SSR Guidelines. We understand that there is resistance to the tidyverse in certain quarters. In our view, there are convincing arguments in favour of the tidyverse:

(a) The emergence of big data analytics requires reliable and consistent tools. Many applications require major data wrangling and transformations, which is what the tidyverse provides.
(b) A look at the most-downloaded CRAN packages dispels any doubts about the acceptance of the tidyverse. At last count, about 5000 CRAN packages use dplyr.
(c) While the earlier generation of R developers came from SPSS and similar tools, there is a growing number of R developers with a background in C/C++ and Python. Such contributors are more at ease with the tidyverse than with the confusing world of *apply functions.
(d) Increasingly, R developers are taking dplyr as their base model and extending it to specific domains. Arguably, sf is dplyr-compatible. Consider also tidymodels and tidyquant.
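To make point (c) concrete, here is a small comparison, assuming only that dplyr is installed and R >= 4.1. Both versions compute mean fuel efficiency per cylinder count on the built-in `mtcars` data; the first uses base R `split()` and `sapply()`, the second a dplyr pipeline. The dataset and summary are arbitrary illustrative choices.

```r
library(dplyr)

# Base R: split the mpg column by cylinder count, then apply mean() to each piece
base_means <- sapply(split(mtcars$mpg, mtcars$cyl), mean)
print(base_means)

# dplyr: the same summary expressed as a pipeline of named verbs
dplyr_means <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
print(dplyr_means)
```

Both produce the same three means; the difference lies in how the steps are expressed, which is the kind of readability judgement point (c) refers to.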

In summary, the SSR guidelines for spatial data deserve recognition and praise. Please consider the above suggestions as coming from a team focused on Earth observation data, whose focus does not cover the full scope of the SSR Spatial Guidelines.