klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License

Auto-use (optionally) R's 'standard library'? #200

Closed: mmuurr closed this issue 3 years ago

mmuurr commented 3 years ago

Modules don't seem to have R's 'standard library' (for the session, accessible via options()$defaultPackages) attached, but I think there's good reason to allow this (perhaps optionally, via a switch or option). As a motivating example, take this basic module:

#' @export
foo <- xtabs # an object in 'stats'

... xtabs is not found, triggering an error when I try to box::use that module. Like most folks, I have a very difficult time knowing which object is part of which 'standard' package, since in 99% of R sessions those are all attached by default. Even worse, the logic of which object is in which package isn't always ... consistent. Take these three common statistics-related functions: mean, runif, and sample (the following is cribbed from my own post in https://github.com/wahani/modules/issues/13):

... which 'standard' command is in which 'standard' package. For example, is mean() in base? Is it in utils? Is it in stats (because the mean, after all, is a type of summary statistic)? (Spoiler, it's in base.) What about runif()? Sampling a random value uniformly between [0,1] is so basic to programming languages, it might be in base, right? Nope, it's in stats. Well, then it makes sense that sample() is probably in stats, too, right? Nope, it's in base. 🤦
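Instead of memorising this, you can ask R directly. A minimal sketch in base R (no 'box' needed): for a closure, `environment()` returns the namespace it was defined in, and `environmentName()` turns that into a package name. The helper name `home_package` is made up for this illustration, and primitives such as `sum` have no namespace environment, so they are special-cased.

```r
# Report the package whose namespace defines each (closure) function.
home_package <- function(name) {
  fn <- get(name, mode = "function")
  env <- environment(fn)
  # Primitives like `sum` have no enclosing environment (NULL).
  if (is.null(env)) return("base (primitive)")
  environmentName(env)
}

fns <- c("mean", "runif", "sample", "xtabs")
homes <- vapply(fns, home_package, character(1))
print(homes)
```

`utils::find("runif")` gives similar information, but it only looks at attached packages.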

I think an option to box::use all the 'standard library' packages would be nice (e.g. something like box::use(box::stdlib) or some other magic identifier). Polluting the namespace seems very low-risk: effectively all R sessions already have these objects attached, there are no naming conflicts between those packages, and one can still mask them (e.g. with one's own xtabs object) after that initial use.

Unlike the tidyverse meta-package example, R's standard library is quite stable and nearly universally expected to be available by R programmers (in my own experience, at least).

klmr commented 3 years ago

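Before version 1.0, ‘box’ (then called ‘modules’) did in fact do this. Unfortunately, it led to errors, since getOption('defaultPackages') isn’t fixed: it can be changed, for example via the R_DEFAULT_PACKAGES environment variable or in an .Rprofile.¹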

That’s why ‘box’ 1.0 has stricter semantics, which mirror those of R packages (enforced by R CMD check).²

That being said, there’s a trade-off between strictness and the (reasonable) user expectation, which you’ve explained quite correctly. Maybe having a “stdlib” meta-import would be a good solution. I’m not too keen on using the suggested box::stdlib syntax, because I want to reserve :: for a potential future use. Maybe just box::use(.stdlib), since package names cannot start with a dot. On the other hand, this looks quite similar to ./stdlib, which would be a local module name.

The other question is what this would import: a hard-coded list of packages (‘datasets’, ‘utils’, ‘grDevices’, ‘graphics’, ‘stats’, ‘methods’[?]) or the current value of getOption('defaultPackages')?
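The difference between the two options can be checked in any session. A small base-R sketch: the hard-coded vector copies the list above (which, as far as I know, also matches R's documented fallback when R_DEFAULT_PACKAGES is unset), while `getOption("defaultPackages")` reflects whatever the current session was configured with. The variable names are illustrative only.

```r
# Candidate 1: a hard-coded "standard library", copied from the list above.
hard_coded <- c("datasets", "utils", "grDevices", "graphics", "stats", "methods")

# Candidate 2: the session-dependent value, which can be changed via the
# R_DEFAULT_PACKAGES environment variable or an .Rprofile.
session_default <- getOption("defaultPackages")

# Packages that one definition includes but the other does not.
only_hard_coded <- setdiff(hard_coded, session_default)
only_session <- setdiff(session_default, hard_coded)
print(only_hard_coded)
print(only_session)
```

In an unmodified setup both differences should be empty; they become non-empty as soon as a user customises the default packages, which is exactly the instability described above.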

PS: Congrats on issue #200! 🎉


¹ Actually, pre-1.0 ‘modules’ did something worse: it imported everything on the current search() path, i.e. all attached packages.

² Well, actually packages do not have these semantics. Instead, packages are arguably even worse than ‘modules’ pre-1.0: packages import the base namespace, whose parent is .GlobalEnv. This means that if a package uses a function from a standard R package (such as xtabs from ‘stats’) but forgets to declare the corresponding import in its NAMESPACE, the package will instead search for this name in the global environment (and upwards). Anything loaded after ‘stats’ which redefines the xtabs function will get preferential treatment, including user-defined objects in the global environment.

By contrast, R CMD check enforces that all used names are declared in the NAMESPACE file.
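The lookup order described in this footnote can be inspected directly. A base-R sketch (the helper `lookup_chain` is illustrative) that walks the enclosing environments of the 'stats' namespace: its imports environment, then the base namespace, then .GlobalEnv and the rest of the search path.

```r
# Collect the names of all enclosing environments, stopping at the empty env.
lookup_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))
    env <- parent.env(env)
  }
  out
}

ns <- asNamespace("stats")
chain <- lookup_chain(ns)
print(chain)

# The two facts the footnote relies on, checked directly:
imports_env <- parent.env(ns)
print(identical(parent.env(imports_env), .BaseNamespaceEnv))  # imports -> base namespace
print(identical(parent.env(.BaseNamespaceEnv), globalenv()))   # base namespace -> .GlobalEnv
```

Because .GlobalEnv sits on this chain, an undeclared name inside a package can end up resolving to a user-defined object.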

mmuurr commented 3 years ago

Fair points on the naming of such a "meta-import" identifier. What about R's commonly used ~ formula notation (used quite extensively in the tidyverse for short anonymous function definitions), something like ~stdlib? (It does collide a bit with ~ being the user's home directory on Unix-like systems, but I think it's sufficiently distinct in the R world, especially when not quoted.)

As for the behavior, my first reaction was to use getOption("defaultPackages"), but now I'm having second thoughts, given how packages are currently built (to my knowledge, which is admittedly less complete than yours).

Actually, this second scenario nudges me in the direction of the hard-coded solution of truly 'standard' R packages (the list of which is very slow-changing, so maintaining this shouldn't be much of a burden). If, as a module author, I want to deviate from that standard: Caveat Emptor, Danger, Will Robinson, Here be dragons! ... that's my risk to take and I'll need to be explicit about each standard R package my module accesses.
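For comparison, here is what being explicit looks like for the motivating module. A minimal sketch, assuming 'box' 1.0 or later is installed; the module name `demo/foo` and the temporary module directory are made up for illustration, and the `box.path` option is used to make that directory searchable.

```r
# Hypothetical module 'demo/foo', written to a temporary directory so the
# example is self-contained. The explicit `box::use(stats[xtabs])` is what
# makes `xtabs` visible inside the module.
mod_root <- file.path(tempdir(), "box-explicit-demo")
dir.create(file.path(mod_root, "demo"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "box::use(stats[xtabs])  # declare the standard-library dependency",
  "",
  "#' @export",
  "foo <- xtabs"
), file.path(mod_root, "demo", "foo.r"))

if (requireNamespace("box", quietly = TRUE)) {
  # Prepend the temporary directory to the module search path.
  old <- options(box.path = c(mod_root, getOption("box.path")))
  box::use(demo/foo)
  options(old)
  print(identical(foo$foo, stats::xtabs))  # the module exports stats' xtabs
}
```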

In any case, I really appreciate the conversation and the work you're doing, and I'm happy to continue to stress-test and use box!

klmr commented 3 years ago

what about R's commonly-used ~ formula notation

I’m potentially reserving that too, for “standard” evaluation calls (or rather, an escape hatch for NSE) — i.e. to maaaybe in the future allow something like

name = 'klmr/sys'
box::use(~ name)

Though I’m not sure I want to allow this (it makes static analysis impossible, and static analysis support is probably required to make ‘box’ usable with e.g. ‘targets’), or whether I’d rather use tidyeval’s !!/!!!.
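To make the static-analysis concern concrete, here is a hypothetical sketch; `resolve_specs` is not part of 'box'. Bare arguments are captured unevaluated, so a tool reading the source can see the module name; arguments wrapped in a one-sided formula are evaluated in the caller, so the source only shows the symbol `name`.

```r
# Hypothetical use()-like resolver (not box's API).
resolve_specs <- function(...) {
  caller <- parent.frame()
  args <- as.list(substitute(list(...)))[-1L]
  vapply(args, function(arg) {
    is_escape <- is.call(arg) && identical(arg[[1L]], as.name("~")) &&
      length(arg) == 2L
    if (is_escape) {
      # Standard evaluation: the value is only known at run time.
      as.character(eval(arg[[2L]], caller))
    } else {
      # Non-standard evaluation: the spec is readable from the source code.
      paste(deparse(arg), collapse = "")
    }
  }, character(1))
}

name <- "klmr/sys"
specs <- resolve_specs(klmr/sys, ~ name)
print(specs)
```

Both arguments resolve to "klmr/sys" at run time, but only the first one is visible without running the code.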

Hmm. Maybe something like box::use('stdlib') (i.e. string-quoted)? Users might find this too surprising — but maybe they don’t?

mmuurr commented 3 years ago

Perhaps the infix-operator syntax: box::use(%stdlib%)? It's:

  1. recognizable to R users,
  2. sufficiently distinct from any package name,
  3. very likely distinct from any filename (because if anyone's naming their files like that, they deserve to have problems :-), and
  4. extractable from the AST using R's standard expression-parsing toolkit, since it'd be identified as a single token thanks to the surrounding %.

(But obviously easier said than done. With that said, if you need help or want any PRs, lemme know ... I'm pretty busy these days but can contribute when possible.)

klmr commented 3 years ago

… but:

  1. Not syntactically valid R. 😉

It would need backticks surrounding it, and that starts becoming quite syntactically convoluted: box::use(`%stdlib%`).

(I don’t want to discourage the brainstorming! It’s great!)

Hmm, another possible operator we could abuse is prefix-?: box::use(?stdlib). Or just unary - or unary +.
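Which candidate spellings are syntactically valid is easy to check with `parse()`, which only parses and never evaluates, so 'box' doesn't even need to be installed. A small base-R sketch (the helper `is_valid_r` is illustrative):

```r
# Candidate spellings for a "stdlib" meta-import discussed in this thread.
candidates <- c(
  "box::use(%stdlib%)",
  "box::use(`%stdlib%`)",
  "box::use(.stdlib)",
  "box::use(~stdlib)",
  "box::use(?stdlib)",
  "box::use(-stdlib)",
  "box::use(+stdlib)"
)

# TRUE if the code parses as R; nothing is evaluated.
is_valid_r <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

valid <- vapply(candidates, is_valid_r, logical(1))
print(valid)
```

Only the bare `%stdlib%` form fails to parse; the backticked version parses because the backticks turn it into an ordinary symbol.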

mmuurr commented 3 years ago

Argh, valid syntax is now a requirement, too? (I jest.) I thought about unary !, but that reads too much like "don't use this thing!"

Of the ?, -, and + options, I think + is the most natural, as one would be adding packages to be used.
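Point 4 from the earlier list carries over to unary `+`: in the unevaluated call it shows up as a call to `+` with a single argument, which distinguishes it from ordinary binary addition. A hypothetical sketch of that check (`is_meta_import` and `classify_args` are illustrative names, not 'box' functions):

```r
# TRUE if `arg` is the unevaluated expression `+<name>` (unary plus).
is_meta_import <- function(arg, name = "stdlib") {
  is.call(arg) &&
    identical(arg[[1L]], as.name("+")) &&
    length(arg) == 2L &&                 # unary, not binary, plus
    identical(arg[[2L]], as.name(name))
}

# Classify each argument of a use()-like call without evaluating it.
classify_args <- function(...) {
  args <- as.list(substitute(list(...)))[-1L]
  vapply(args, function(a) {
    if (is_meta_import(a)) "meta-import" else "module/package"
  }, character(1))
}

kinds <- classify_args(+stdlib, klmr/sys, a + stdlib)
print(kinds)
```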

klmr commented 3 years ago

I’m tempted to just create a meta-module r/core which would be shipped with ‘box’ with essentially the following contents:

#' @export
box::use(
    methods[...],
    stats[...],
    graphics[...],
    grDevices[...],
    utils[...],
    datasets[...]
)

Thoughts?

klmr commented 3 years ago

This is now implemented, and documented in the FAQ.
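A minimal sketch of using the meta-module from inside a module, assuming a 'box' release that ships `r/core` (assumed here to be 1.1.0 or later; check the FAQ for your version). The module name `demo/tab` and the temporary directory added to `box.path` are made up for illustration.

```r
# Hypothetical module 'demo/tab' that relies on r/core instead of
# importing each standard package separately.
mod_root <- file.path(tempdir(), "box-core-demo")
dir.create(file.path(mod_root, "demo"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "box::use(r/core[...])  # attach R's default packages",
  "",
  "#' @export",
  "foo <- xtabs  # from 'stats', found via r/core",
  "",
  "#' @export",
  "bar <- runif  # also from 'stats'"
), file.path(mod_root, "demo", "tab.r"))

if (requireNamespace("box", quietly = TRUE) &&
    utils::packageVersion("box") >= "1.1.0") {
  # Prepend the temporary directory so box's own search path is kept.
  old <- options(box.path = c(mod_root, getOption("box.path")))
  box::use(demo/tab)
  options(old)
  print(identical(tab$foo, stats::xtabs))
}
```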

mmuurr commented 3 years ago

This is great, thanks!