klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License
833 stars 47 forks source link

Feature Request: `box::get_script_path()` #239

Closed GregYannes closed 2 years ago

GregYannes commented 2 years ago

Issue

Per the documentation for box::set_script_path(), we can capture the previous script path when setting it anew:

Value

box::set_script_path returns the previously set script path, or NULL if none was explicitly set.

However, when I try to capture each previous path, I get the following output

# Set new path.
box::set_script_path("C:/Users/gyannes/Documents/Test.R")

# Set another new path and capture old.
test <- box::set_script_path("C:/Users/Public/Public Documents")
test
# [1] "C:/Users/gyannes/Documents"

# Set yet another new path and capture old.
test <- box::set_script_path(".")
test
# [1] "C:/Users/Public"

where the terminal filename (Test.R) or directory name (Public Documents) is "lopped off" each captured path.

Notably, this behavior ends at the root drive, which is not lopped off:

box::set_script_path("C:/Users")

test <- box::set_script_path("C:/")
test
# [1] "C:/"

test <- box::set_script_path(".")
test
# [1] "C:/"

Thus, valuable information can be lost whenever we change a path.

Acknowledgement

After perusing the source code, I see that this behavior is due to the intentional use of dirname(), and it is baked into the script_path_env$value as soon as the script path is set. This feature is doubtless tied to important functionality, so I obviously wouldn't request a massive rewrite.

However, the manual strongly indicates the filepath itself, rather than the path to the parent directory. A slight revision to the manual would greatly improve clarity.

Request

This issue arose while I was hacking together an augmentation for box. I always like to have a get*() for every set*(), so I put together a get_script_path() function in a utility module:

#####
#' @title Get Script Path
#' @description Helper to get the script path last set by
#'   \href{https://rdrr.io/cran/box/man/set_script_path.html}{box::set_script_path()}.
#' @export
#####
#' 
#' @param script_env The global
#'   \href{https://rdrr.io/r/base/environment.html}{environment} of the calling
#'   script.
#' 
#' @return A \code{character} string with the filepath to the calling script, or
#'   \code{NULL} when no script path has previously been set.
#####
get_script_path <- function(script_env = parent.frame(n = 1)) {
  # Capture the original script path by resetting it.
  # * Since 'box' is the only package on which this module relies, and since it
  #   has no 'box::get_script_path()', we must use this workaround.
  original_path <- evalq(
    # For the appropriate side effect, it is prudent to call
    # 'box::set_script_path(NULL)' as a literal command in the calling
    # environment.
    expr = box::set_script_path(NULL),
    envir = script_env
  )

  # For existing path, append a "dummy" terminus that will get "lopped off" from
  # the path as soon as it is set:
  #   https://github.com/klmr/box/blob/3d1644b8913657d236130271f340aef0c66fb4b7/R/paths.r#L18
  if(length(original_path) > 0) {
    new_path <- file.path(original_path, "dummy")
  }
  # Otherwise the path is missing ('NULL' or 'character(0)').
  else {
    new_path <- NULL
  }

  # Reset the script path to its original.
  on.exit(expr = {eval(
    # For the appropriate side effect, it is necessary for
    # 'box::set_script_path()' to evaluate in the calling environment; yet as if
    # it contained the literal string stored in 'new_path', a variable
    # inaccessible to the calling environment.
    #   'box::set_script_path("original/path/as/literal/string")'
    expr = substitute(box::set_script_path(new_path)),
    envir = script_env
  )})

  return(original_path)
}

However, this approach might prove unstable. Would it be possible to provide a box::get_script_path() function, which simply returns the character string currently being used as the script path?

Thanks, as always! — Greg

klmr commented 2 years ago

To do

Out of curiosity, what’s your use-case for box::set_script_path? My goal is that ‘box’ auto-detects virtually all paths so that box::set_script_path should almost never be necessary (it’s intended as a last resort).

klmr commented 2 years ago

Actually the documentation is correct, and this is a bug.

GregYannes commented 2 years ago

Hello again, Konrad! Thanks for your thoughtful responses!

Out of curiosity, what’s your use-case for box::set_script_path? My goal is that ‘box’ auto-detects virtually all paths so that box::set_script_path should almost never be necessary (it’s intended as a last resort).

Well, I'll tell you my tale of woe. The use case is a special API for my colleagues, which wraps much of box for our specific circumstances and company resources.

Specifically, I have developed a custom function called use_module(), which allows its user to target a specific directory (path = "path/to/parent") and use a module therein (my_module.R) by name alone (my_module) , without being forced to prepend a prefix:

use_module(my_module, path = "path/to/parent")

The function works by calling options(box.path = "path/to") and then box::use(parent/my_module), where parent is the necessary prefix to identify my_module as a module rather than a package. Naturally, box.path is reset on.exit(), back to its original value.

I have striven to remain highly aware of parsing conventions, for the argument(s) to box::use(), and it's all working well so far!

However, for modules located at the very root (C:/my_module), there is no filepath (no "path/to") with C:/ as its subdirectory; so there is no value to which I can set box.path before calling box::use(`C:`/my_module). I am attempting a workaround in use_module(): I call box::set_script_path("C:/") and then call box::use(./my_module), where . is an acceptable prefix to my_module (if my memory serves me well).

Now I have taken precautions against use_module() failing. As with my sample code for get_script_path(), the use_module() function will reset the script path (and box.path) to their original values on.exit(), even if use_module() encounters an error. However, to have that original value in the first place, use_module() needs get_script_path() functionality.

At the moment, the only way to access the script path is to set it anew, and then capture the (invisible) return value (the original_path) from box::set_script_path(); it must be reset to this original value before return(original_path). As seen in get_script_path(), I have been forced to first append a "dummy" subdirectory (original_path <- "path/to/parent/dummy") so that box::set_script_path(original_path) resets the script path to its original value in full ("path/to/parent") without accidental truncation ("path/to").

And that, my friend, is my sorry tale of woe.

klmr commented 2 years ago

And that, my friend, is my sorry tale of woe.

😆 … ahem. Sorry. It just seems to me that you could avoid a lot of problems, and make your own life easier, by just not storing modules at the very root of the filesystem hierarchy (in your case, C:\).

At any rate I’ll fix the currently broken return value of set_script_path (and I might even add a getter).

This will break your current code I’m afraid (which expects the last part of the path to be missing), I hope that won’t cause too much of an inconvenience.

GregYannes commented 2 years ago

Hello Konrad!

As always, I am impressed by your responsiveness and pride in your work! Regarding your last update:

It just seems to me that you could avoid a lot of problems, and make your own life easier, by just not storing modules at the very root of the filesystem hierarchy (in your case, C:\).

You are indeed correct. However, since my colleagues are not necessarily experienced with box, I want my API to be as intuitive as possible; so my use_module() must be capable of loading (as a module) a .R script stored in any arbitrary location...the root included.

At any rate I’ll fix the currently broken return value of set_script_path (and I might even add a getter).

Much appreciated! This will be a big help to me, and it should also be a major convenience for others. :)

This will break your current code I’m afraid (which expects the last part of the path to be missing), I hope that won’t cause too much of an inconvenience.

No worries! In constructing my API, I have striven to be as modular as possible (in the spirit of box). So all I need to do is update my helper functions get_local_path() and set_local_path(), to make them wrappers for script_path() and set_script_path(). Since all my other functions rely on these helpers, the updates will cascade painlessly!

Thanks again! — Greg