rsbivand / rgrass

Interpreted interface between the GRASS geographical information system and R
https://rsbivand.github.io/rgrass/
26 stars 9 forks source link

initGRASS() and cross-platform reproducibility #29

Closed florisvdh closed 3 years ago

florisvdh commented 3 years ago

This issue proposes an idea I've been thinking of since some time - if useful, it might fit in a possible package upgrade to support GRASS 8 (#28).

Currently, initGRASS() has a required gisBase argument (e.g. in a Linux setup with GRASS 7.8, this would often be /usr/lib/grass78). However hardcoding this path in a script makes the script unfit for cross-platform reproducibility (or for recycling a script for future usage), given the various versions and modes of GRASS installation. Currently I try to overcome this with something like gisbase_grass <- ifelse(.Platform$OS.type == "windows", link2GI::paramGRASSw()$gisbase_GRASS[1], link2GI::paramGRASSx()$gisbase_GRASS[1]) - but I don't expect miracles either: it seemed difficult also in link2GI to cope with all possible situations and GRASS versions, since things continue to evolve.

Idea: allowing the gisBase argument to be missing, in which case initGRASS() would check for the existence of an environment variable GISBASE_GRASS (not to be confused with the GISDBASE variable) and take its value. This environment variable would need to be set by the user if (s)he wishes to implement such feature, e.g. in a user level .Renviron file (i.e. not as a part of a shared R project). So this would be an optional route avoiding to hardcode the lib path in an R script, while on the other hand this script could only run if a user has set the environment variable (if not, initGRASS() will error but could explain the possibility in the error message).

veroandreo commented 3 years ago

Well, you don't really need to hardcode it, right? The following works in different systems and then, the user does not need to set an environment variable at all. Would you be able to grab that somehow in your workaround?

# OSGeo4W, Linux, Mac OSX users:
grass --config path

IMHO, relying on an environment variable that must be set by the user (think normal users and new comers) to get gisBase is not very practical.

source: https://grasswiki.osgeo.org/wiki/R_statistics/rgrass7

florisvdh commented 3 years ago

@veroandreo thanks for the hint! That would be doable indeed (using system() or shell()), but there's also a limitation: it needs grass to be present in the system PATH. The latter is not the case for at least Windows users with standalone GRASS, while standalone seems the easiest GRASS installation in Windows to get working with rgrass7, i.e. not needing to launch R/RStudio from OSGeo4W shell.

In my suggestion, allowing an environment variable (that presets gisBase) would be additional to current initGRASS() behaviour, initGRASS() would not insist on using it. Its presence would be checked and its value taken, only when gisBase is missing. It could be defined here e.g.:

$ cat ~/.Renviron 
R_LIBS_USER="~/lib/R/library"
GISBASE_GRASS="/usr/lib/grass78"
rsbivand commented 3 years ago

I agree with @veroandreo that an environment variable, especially set in say .Renviron, feels fragile. I think that this is "too" simple. After all, users benefit from having to establish were software is stored. I'm open to the argument that a back-door of this kind might be useful for say a larger lab, where an admin sets up both GRASS, the environment variables, etc., but regular users should learn to manage their own systems. If they don't learn, any upgrades will break their workflows.

florisvdh commented 3 years ago

I understand that one overall setting (.Renviron) is not in scope for the usecase of e.g. multiple GRASS installations on one machine. I just considered it an option for more simple GRASS installs indeed.

regular users should learn to manage their own systems. If they don't learn, any upgrades will break their workflows.

I follow that too. The option of using the environment var instead of hardcoding gisBase would still be something that a user controls, and hence, must understand.

But it was just an idea, no problem to drop it.

IMHO an approach where users can maintain their own system settings outside of their R code is interesting, especially for collaborative environments - as you say. But of course we might still implement the env var suggestion locally and assign such an env var value to gisBase, in an R script.

Feel free to close!

florisvdh commented 3 years ago

An alternative could be to use options() (which a user could automate with .Rprofile). Cf. https://github.com/paleolimbot/qgisprocess/issues/55#issuecomment-870510895.

paleolimbot commented 3 years ago

Just as a note here...@florisvdh is 100%. The standard for me when I'm wrapping something that might require a user-specific setup is to make it possible to specify the option as an options() (e.g., a system-specific or project-specific .Rprofile) OR an environment variable (e.g., so that contributors can get CMD check to pass on their machines where setting options() is hard or impossible). This is a one-liner in practice and makes it a lot easier to write portable code:

getOption("pkg.some_config", Sys.getenv("R_PKG_SOME_CONFIG", "default-value"))