rticulate / import

An Import Mechanism For R
https://import.rticulate.org
Other
222 stars 14 forks source link

Add import::config() function or similar? #63

Open torfason opened 2 years ago

torfason commented 2 years ago

As the number of dot-prefixed parameters to import::from() has increased considerably, I have been thinking that an import::config() function might be a good addition to the package. The formals (the declared function signature) of import::from() would not be changed, but any missing() parameters would be overridden by options set with something like:

import::config(.directory=here::here("utils"))
# or assuming import gets a new .S3 parameter
import::config(.character_only=TRUE, .S3=TRUE)

The option values themselves could be stored using base::options, but the rationale for having a separate function is that both findability and the ability to perform sanity checking on the parameter values would be improved.

Thoughts?

J-Moravec commented 2 years ago

I like that. Package options are often hidden and there is no easy way to explicitly document them. Creating a simple interface for package options is a nice way how to make them more explicit, together with a good way to provide documentation (i.e., that these options exist, in one place, without burying them elsewhere).

I was afraid that the logic would have to dictate that the parameters would have to default to NULL, but missing() seems to fix that. Although I haven't tested how missing() behaves after all that non-standard evaluation.

As an option to manage options, there is package pkgconfig: https://github.com/r-lib/pkgconfig

torfason commented 2 years ago

Although I haven't tested how missing() behaves after all that non-standard evaluation.

Good point, although most of the parameters that I think are appropriate for a config option are just regular variables:

from(
  .from,
  ...,
  .into = "imports",
  .library = [.libPaths](https://rdrr.io/r/base/libPaths.html)()[1L],
  .directory = ".",
  .all = ([length](https://rdrr.io/r/base/length.html)(.except) > 0),
  .except = [character](https://rdrr.io/r/base/character.html)(),
  .chdir = TRUE,
  .character_only = FALSE
)

Configuring the object names does not make sense, and I think .from should always be specified explicitly. That leaves .into but if my hunch on #53 is correct, we might be able to eliminate NSE from that as well.

Good pointer on pkgconfig as well. I had just been thinking about setting options() and using getOption("key", .default). I recently contributed [edit: I mistyped, I did not contribute that function, I just requested it, it was written by the author of that package, @jhrcook]) a config function to mustashe using that approach and it worked pretty well (see config_mustashe.R) – and your point related to minimizing dependencies in the #65 was a good one.

J-Moravec commented 6 months ago

I was working recently on a similar problem and managed to find/implement a set of function to hook into the options():

get_option <- function(x){
  name <- paste("import", x, sep = ".")
  value <- getOption(name)

  if(is.null(value))
    stop("Option: \"", x, "\" doesn't exist or have not been initialised yet.")

  value
}

set_option <- function(x, value){
  name <- paste("import", x, sep = ".")
  opt <- setNames(list(value), name)
  invisible(options(opt))
}

These two function hook into the options object using the options() or the quicker getOption() API and set/get options that are stored as import.option in the options(), but to user they appear as just option, simplifying the interactions, but making sure we are working on stuff with a unique prefix.

Since import is kind of common name, it can be made even more unique as pkgimport or anything else really, since this abstraction allow user to ignore this prefix entirely.

Additional helper functions can be:

get_options <- function(){
  opts <- options()
  opts <- opts[grep(names(opts), pattern = "import")]
  setNames(opts, sub("^import\\.", "", names(opts)))
}

set_default_options <- function(){
  set_option("option", "value")

  NULL
}

reset_options <- function(){
  opts <- names(get_options())
  for(opt in opts) set_option(opt, NULL)
}

Now that I am looking at this, perhaps set_option(option = value) might be preferable to set_option(option, value) since options will always be text and I don't believe special symbols like . will pose problem. Dunno why I wrote it like this at the time. Probably since it kept with the style of get_option(). The set_option(option = value) for could also accept multiple arguments if that is desired.

This is essentially a more dynamic version of what config_mustashe.R has that does not force to have a getter for every single option. Which isn't issue if we have only a small fixed amount of options.

Which style do we prefer?

torfason commented 6 months ago

I'm traveling, but glad to see some interest on this. I'd want to think on this together sometime after I come back (realistically sometime in March). In the meantime, it seems that guidance on how to store internal state such as this has evolved:

https://r-pkgs.org/data.html#sec-data-state

So my current preference would probably be to stay with one external function such as config() but not store the values using base::options() but in an internal environment (probably using the name the as suggested in the chapter. But it would be great to hear other thoughts and opinions.

J-Moravec commented 6 months ago

In that case, this is the example implementation mentioned in the article: https://github.com/r-lib/usethis/blob/175bf649163976351d972cedca6017517e087533/R/proj.R

I was experimenting with something of that sense (new env to store the state) but chose to hook on the global R options.

Now the hard question is: How do we handle interaction between arguments and globally set options?

Ideal behaviour is:

This means that instead of from = function(..., .S3 = TRUE), it would be either missing or absorbed wholly into ....

In the same way as many graphical options can be specified, but are typically documented only in par(). I am not sure I like that because it is obscuring potentially useful options and we don't have many of them.

Alternatively, if we have getter that the user can use in addition the the setter config(), we could have:

from = function(..., .S3 = getopt(".S3"))

This is the same method used occasionally with par(). See for instance polygon:

> polygon
function (x, y = NULL, density = NULL, angle = 45, border = NULL, 
    col = NA, lty = par("lty"), ..., fillOddEven = FALSE) 
{

It also means that the functions themselves do not need any magical mechanism to decide on the default value and everything is quite visible in function annotation, well except for the default value itself.

torfason commented 6 months ago

In that case, this is the example implementation mentioned in the article: https://github.com/r-lib/usethis/blob/175bf649163976351d972cedca6017517e087533/R/proj.R

I was experimenting with something of that sense (new env to store the state) but chose to hook on the global R options.

Yes, the global options were my first though as well. It was your comment about potential naming conflicts in a flat uncontrolled namespace (options starting with import.) that made me think that environments were a better option. I'd hate to have bug reports because something unrelated was setting the options.

Now the hard question is: How do we handle interaction between arguments and globally set options?

Ideal behaviour is:

  • if the option is not specified during the function call, use the global option.
  • if the option is set during the function call, use that option.

Agreed,

This means that instead of from = function(..., .S3 = TRUE), it would be either missing or absorbed wholly into ....

In the same way as many graphical options can be specified, but are typically documented only in par(). I am not sure I like that because it is obscuring potentially useful options and we don't have many of them.

Alternatively, if we have getter that the user can use in addition the the setter config(), we could have:

from = function(..., .S3 = getopt(".S3"))

This is the same method used occasionally with par(). See for instance polygon:

> polygon
function (x, y = NULL, density = NULL, angle = 45, border = NULL, 
    col = NA, lty = par("lty"), ..., fillOddEven = FALSE) 
{

It also means that the functions themselves do not need any magical mechanism to decide on the default value and everything is quite visible in function annotation, well except for the default value itself.

Yes, setting default to an analogue of par() would avoid the need for magic – but on the other hand, having the defaults explicitly documented in the roxydocs seems quite valuable. And there is so much magic (NSE) going on inside import::from() already, that adding just a tiny bit more seems OK 😊

And on inspection, it seems like it might not need to be too complex or magical. I tried a small proof of concept using just missing() and exists() and it seems to work pretty well:

the <- new.env(parent = emptyenv())

f <- function(x = "default") {
  if (missing(x) && exists("x", the)) {
    x <- the$x
  }
  x
}

# Value passed as parameter
f("parameter")
#> [1] "parameter"

# No value passed, no config, use global default
f()
#> [1] "default"

# No value passed, but value set in config
the$x <- "config"
f()
#> [1] "config"

It's a line or three for each parameter that can be configged , but I think it's probably worth it. What are your thoughts on this approach?

J-Moravec commented 6 months ago

When I see f = function(x = "default") I expect that the x will be default unless explicitly called otherwise e.g., f("foo").

The f = function(x = getpar("x")) feel significantly more explicit. Sure, the default values are not documented in the function definition, but it is documented where the default values are taken from. This would be the only change that would be required for from, into and here functions, no other parameter parsing would be necessary.

This would make it less obvious what the default parameters are, but more obvious how they are evaluated in the first place and thus how to reason about them.

Alternatively NULL also explicitly signal that the parameter is optional, but probably the default option will be guestimated. But in this particular case it feels like its combining the disadvantages of the missing() and getpar() options.

edit:

For the config() function itself, this behaviour might be desirable as it would allow the specification of only a particular option while documenting the defaults.

torfason commented 4 months ago

Apologies for the radio silence recently. I've given this some thought and my view is that we should strive to maintain backwards compatibility in function signatures (as a special case of the importance of maintaining backwards compatibility overall), unless there is a very compelling reason not to.

I appreciate that overloading the default parameters without explicit getpar() in the defaults does run a very slight risk of confusing where the value comes from – but only for someone who has already started to use the import::config() approach. For someone who knows nothing about import::config(), everything should always work as expected. I think such confusion could perhaps also be mitigated by showing a message() (possibly suppressable with an import::config(quiet = TRUE) option?) when a parameter is pulled from the options, although I don't have strong feelings on that part.

I'd be happy to work on integrating an import::config() function that takes this approach.

If @smbache has other ideas about what would be the best approach for configuring import or if a number of other community members would like to argue for another approach, I would be happy to consider other approaches as well.