klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License
833 stars 47 forks source link

`box::use()` Fails for Modules Under Folders Whose Names Begin with Numbers #235

Closed GregYannes closed 2 years ago

GregYannes commented 2 years ago

Naturally, one should not name their module 1_number, since this module becomes its own environment in R, and no such R objects may have a name beginning with number. The same goes for any other names illegal in R.

However, if a local module one_number can be found along a relative filepath ./path/2/one_number.R, then box::use(./path/2/one_number) will fail:

Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) : 
  Expected identifier in module prefix, got 2
(inside “parse_mod_prefix(expr[[2L]])”)

This is likely a minor issue, but it could throw newcomers for a loop. While everything under the global box.path is presumably a module (nested or otherwise) unto itself, the same cannot be said of local filepaths, and their directories (which need not be modules) should not be bound to naming conventions for R objects. Indeed, one might be forced to rename the entire subdirectory system within one's project.

Is this worth a patch? Perhaps box::use("./path/2/one_number") should be possible, if the unquoted declaration would fail...

klmr commented 2 years ago

Every part of the fully qualified module name needs to be a valid R name. To fix the concrete issue, backtick-quote the non-syntactic name:

box::use(./path/`2`/one_number)

The same is true for module names themselves: you can call a module `1_number` — although it’s discouraged.

klmr commented 2 years ago

Regarding your other comment:

While everything under the global box.path is presumably a module (nested or otherwise) unto itself, the same cannot be said of local filepaths, and their directories (which need not be modules) should not be bound to naming conventions for R objects.

‘box’ attempts a delicate balancing act in that it attempts to be accommodating, but it is also intentionally opinionated1. The fact that fully-qualified names need to be syntactic R names is very much intentional. The local filepaths are intended to mirror R syntactic convention, rather than the other way round.

In fact, don’t think of fully-qualified module names as paths at all: think of them as nested R names. They happen to map onto the filesystem, but that’s almost an accident. And arbitrary paths are not intended to be used as modules. If you have a module that’s inside an arbitrary path (e.g. projects/2021-08-21-foo) that’s totally fine — but that prefix shouldn’t be part of the module name. Instead, you’re supposed to configure your box.path such that it searches inside that path. Consider the following file hierarchy:

.
└── User
    └── konrad
        └── projects
            └── 2021-08-21-foo
                └── foo
                    └── bar.r

The module here is foo/bar. That is its fully-qualified module name. You now have two choices:

  1. You’re inside the project 2021-08-21-foo and want to use foo/bar. So you use
     box::use(./foo/bar)

    (Note the prefix ./ which denotes a local module!)

  2. You’re outside the above project, and you want to reuse foo/bar. So you configure your box.path (either via options(box.path = '/User/konrad/projects/2021-08-21-foo') inside .Rprofile, or via an environment variable R_BOX_PATH=/User/konrad/projects/2021-08-21-foo), and then use
     box::use(foo/bar)

Incidentally, in the above, it must be foo/bar. bar alone won’t cut it (except inside a local project: ./bar is fine!). The reason is to avoid name clashes: the foo part is supposed to be a globally unique identifier. I recommend using user names similar to GitHub projects: all my reusable modules have the klmr/ prefix. If I ever manage to add a module distribution mechanism to ‘box’, this convention will be enforced.

Indeed, one might be forced to rename the entire subdirectory system within one's project.

Yes; but, as noted, only inside the module itself. The rest of the path can be arbitrary, since it shouldn’t form part of the module name prefix.


1 This isn’t necessarily obvious but opinionated APIs are almost universally less error-prone than un-opinionated APIs, for a variety of reasons. It’s related to the idea of the pit of success: doing the right thing when using an API should be so easy that it happens “by accident”, whereas making a mistake should be hard.

GregYannes commented 2 years ago

Hi Konrad,

Thanks as always for your thoughtful replies!

To fix the concrete issue, backtick-quote the non-syntactic name:

box::use(./path/`2`/one_number)

The same is true for module names themselves: you can call a module `1_number` — although it’s discouraged.

Good to know. I really should have thought to try it before bothering you, but the resulting discussion has been instructive!

The module here is foo/bar. That is its fully-qualified module name. You now have two choices:

  1. You’re inside the project 2021-08-21-foo and want to use foo/bar. So you use

    box::use(./foo/bar)

    (Note the prefix ./ which denotes a local module!)

  2. You’re outside the above project, and you want to reuse foo/bar. So you configure your box.path (either via options(box.path = '/User/konrad/projects/2021-08-21-foo') inside .Rprofile, or via an environment variable R_BOX_PATH=/User/konrad/projects/2021-08-21-foo), and then use
    box::use(foo/bar)

I understand and certainly appreciate your opinionated convention, which is well-conceived and implemented.

However, my use case is somewhat unique, since each project involves both approaches simultaneously. That is, each project has local modules and also access to global modules. The global modules are stored under a certain directory (to which I configure box.path) on a shared drive, and this directory has nested subdirectories of its own. So box.path is spoken for.

Now among these subdirectories are nested modules named according to semantic versioning: 0.0.0, 0.0.1, ..., 1.0.0, and so forth. Hence the motivation for raising this issue in the first place. Per naming conventions for R, I have renamed them v0.0.0, v0.01, ..., v1.0.0, and so forth. The result is perfectly functional.

Now the local modules are stored in a designated subdirectory (./Resources/Modules) of a project template (.), and while I expect them to remain static, the names of these subdirectories might eventually change. While I would love to designate the Modules subdirectory as the box.path for local purposes, it is already spoken for by the global modules. Furthermore, it would be a relative path, and thus contingent on the working directory. Since it is all too easy to forget to setwd() anew, it would be all too easy to keep referencing modules on the old path, in violation of the pit of success:

doing the right thing...should be so easy that it happens “by accident”, whereas making a mistake should be hard.

In summary, I really like your modular conventions for box. In my particular use case, however, both the global and local modules inevitably involve subdirectories (not necessarily all modules) whose names might change (incompatibly with R naming conventions), and box.path cannot simultaneously cover both global and local.

Perhaps there could be box.paths, a vector of multiple filepaths analogous to .libPaths()...

klmr commented 2 years ago

However, my use case is somewhat unique, since each project involves both approaches simultaneously. That is, each project has local modules and also access to global modules.

Actually that’s expected to be the normal use-case: as projects get larger, it is natural to have both external dependencies and a hierarchical internal structure — hence, global and local modules.

While I would love to designate the Modules subdirectory as the box.path for local purposes, it is already spoken for by the global modules.

box.path (and R_BOX_PATH) can hold multiple search paths! This isn’t really intended for project-local paths, but you can totally assign a vector of length > 1 to box.path (and this is documented 😉).

Furthermore, it would be a relative path, and thus contingent on the working directory.

Yes, in that case box.path is probably not appropriate (although you could set a suitable relative path inside your main project using box::file).

GregYannes commented 2 years ago

Actually that’s expected to be the normal use-case: as projects get larger, it is natural to have both external dependencies and a hierarchical internal structure — hence, global and local modules.

To be precise, I meant the whole situation was unique: including the nested subdirectories intervening between the top of the search path and the .R modules themselves, where the names of those directories might be syntactically invalid as the names of R objects. I do understand and appreciate the local/global dichotomy. 😊

you can totally assign a vector of length > 1 to box.path (and this is documented 😉).

Sorry! You shouldn't have to explain to me the things you've already documented with so much effort. I could have sworn the Search Path section wasn't there the last time I checked, but that's probably because I've been staring at a screen for many hours.

Furthermore, it would be a relative path, and thus contingent on the working directory.

Yes, in that case box.path is probably not appropriate (although you could set a suitable relative path inside your main project using box::file).

I'd like to make a helper use_local_modules() available in a module on the global path, so only an absolute box.path is needed, and then the helper can be used to access the local modules. The idea is to pass the ... to box::use() after temporarily reorienting the search path. It seems to be working so far, but I can't seem to find my way around the prefix/mod convention when I need to target an unnested module.R...

#####
#' @title Use Modules
#' @description Import modules (or packages) into the calling environment, as
#'   found along given search paths.
#' @export
#####
#' 
#' @param ... \href{https://klmr.me/box/reference/use.html#arguments}{Module
#'   import declarations} passed to
#'   \href{https://klmr.me/box/reference/use.html}{\code{box::use()}} in its
#'   specified
#'   \href{https://klmr.me/box/reference/use.html#module-names}{format}.
#' @param paths \code{character} vector. The
#'   \href{https://klmr.me/box/reference/use.html#search-path}{search path}s in
#'   which to search for the modules.
#' 
#' @return There is no return value; see
#'   \href{https://klmr.me/box/reference/use.html#value}{\code{box::use()}} for
#'   further details.
#####
use_modules <- function(..., paths = "./Resources/Modules") {
  # Sanitize arguments.
  paths <- normalizePath(path = as.character(paths))

  # Reset the 'box.path' afterwards.
  original_paths <- getOption("box.path")
  on.exit(expr = options(box.path = original_paths))

  # Import the modules from every path.
  for(path in paths[dir.exists(paths = paths)]) {
    # Temporarily orient 'box.path' to the targeted path.
    options(box.path = path)

    # Call 'box::use()' on the modular expression.
    eval.parent(expr = substitute(box::use(...)), n = 1)
  }
}