klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License

Modules with duplicate name as existing package on .libPaths() #201

Closed mmuurr closed 3 years ago

mmuurr commented 3 years ago

I couldn't find this in the documentation; perhaps it should be explicitly described how actual R packages are searched for by box::use? Example: if I have a module (in the local directory) called "utils.r":

box::use(utils)  ## loads the _package_ utils
box::use(./utils)  ## loads the _module_ utils

What if we explicitly set the box.path option?

options(box.path = getwd())
box::use(utils)  ## still refers to the _package_, not the _module_, but I needed to explicitly test this to find out.

... "utils" is likely the most common such example because so many of us have a "utils.R" file :-) In such cases the user needs to include the ./ path reference, I think(?), though it gets murky when that user is relying on the box.path option ... I think if their box.path isn't easily accessible from their working directory, loading that "utils" module gets mighty tricky (without resorting to absolute file paths).

In any case, perhaps this should be added to the docs ("The search path and R packages")? (Unless of course I've simply missed this in the existing documentation, in which case, apologies for the post!)

klmr commented 3 years ago

The box::use(name) syntax will never import a module: module names always need to be qualified. This is true for all modules, not just those that conflict with installed packages.

Part of the reason is to avoid accidentally and silently changing the semantics of code depending on whether or not a given package is installed. Another one is to discourage name conflicts (other languages which allow un-qualified module names have significant security problems with typosquatting).

As for documentation, this is already documented on the box::use usage page (under “Details”) but I’m happy to add this to a more general FAQ, which I should start working on.
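
To make the rule above concrete, here is a minimal sketch of the two forms side by side. The temporary directory, the file names and the `whoami` function are made up for illustration, and it assumes that `box::set_script_path()` can be used to tell box which script is "running" so that `./` has a directory to resolve against:

```r
# Hypothetical project directory containing a local module named utils.R
proj <- file.path(tempdir(), "proj")
dir.create(proj, showWarnings = FALSE)
file.create(file.path(proj, "main.R"))  # stand-in for the calling script
writeLines(c(
  "#' @export",
  "whoami <- function() 'local module utils'"
), file.path(proj, "utils.R"))

# Assumption: pretend the code below is executing from proj/main.R,
# so that relative module paths are resolved against proj/
box::set_script_path(file.path(proj, "main.R"))

# Unqualified name: always a package, even though utils.R sits next to main.R
box::use(pkg = utils)

# Qualified (here: relative) name: always a module
box::use(mod = ./utils)

pkg$head(letters, 3)  # function exported by the utils package
mod$whoami()          # function exported by the local module
```

The aliases `pkg` and `mod` are only there so that both imports can coexist in one session; without them both would be bound to the name `utils`.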

mmuurr commented 3 years ago

Hm, I must be grokking the docs incorrectly then. Take this scenario: a collection of modules living in a single directory somewhere on the filesystem apart from the current script, like so:

/some/absolute/path/to/boxlib
                       |-- a.R
                       |-- b.R
                       |-- utils.R

And now if we set options like so, options(box.path = "/some/absolute/path/to/boxlib"), it appears we cannot load "a.R" with either box::use(a) (as that looks for package "a") or box::use(./a), which searches my current script's directory for the module.

In that case, I'm not sure I understand how the box.path option is best used?

I suppose it means one would set the path to "/some/absolute/path/to" then box::use(boxlib/a) (which works) but somehow this feels odd to me since the path isn't really the path ... it's one level above the 'library' root directory.

klmr commented 3 years ago

Yes, I will need to explain this better in the documentation.

The idea is to have qualified names. Think of it like the organisation on e.g. GitHub: org/modname; or user/modname. So rather than have box::use(boxlib/a) (which is presumably not portable, since every user’s path can be different), you’d use e.g. box::use(mmuurr/a) or box::use(klmr/sys), and you’d set options(box.path = '/some/absolute/path/to/boxlib'), just as you initially had it.

For the moment, relatively few languages organise modules with qualified names, but it’s not unheard-of: Go for instance does the same, and (to a lesser extent) so do Java and C#.
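
As a rough sketch of the layout this implies: the prefix becomes a subdirectory of the library root, and box.path points at the root itself. The `myorg` prefix, the temporary location and the `hello` function below are placeholders for illustration, not anything box prescribes:

```r
# Hypothetical library root with one level of qualification:
#   <tmp>/boxlib/myorg/a.R
lib_root <- file.path(tempdir(), "boxlib")
dir.create(file.path(lib_root, "myorg"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "#' @export",
  "hello <- function() 'hello from myorg/a'"
), file.path(lib_root, "myorg", "a.R"))

# box.path is the library root; the qualifier is part of the module name
options(box.path = lib_root)

box::use(myorg/a)
a$hello()
```

With this arrangement the calling code only mentions `myorg/a`, so it stays the same on every machine; only the box.path setting differs per user.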

mmuurr commented 3 years ago

Got it.

The use case I was thinking about here was a flat collection of modules (i.e. the 'boxlib' directory) shared by a few projects. The library would never contain any 'external' modules, so additional qualifying of the names wouldn't be needed (since there'd be no naming collisions by design). I suppose in this case a pretty workable solution for each project is to simply symlink boxlib and use the relative-filepath notation box::use(./boxlib/modname).

Or just insert some additional magic qualifier in the directory path, like LOCAL (i.e. LOCAL/modname).

Thanks!