Search for modules after packages has been exhausted

Please describe your feature request

Lets assume you have set the systems R_BOX_PATH environment variable to something like /opt/box. It would be convenient to take advantage of the __init__.r usage pattern with modules within /opt/box.

For example, suppose the following directory structure in /opt/box

foo/__init__.r

#' @export
box::use(./mod1[...], ./mod2[...])

foo/mod1.r

#' @export
bar <- 1L

foo/mod2.r

#' @export
baz <- 2L

Then, the expected result may look something like this:

box::use(foo)

foo$bar # 1L
foo$baz # 2L

Instead, we currently get an error dictating that there is no package named foo. Thus back to my point, to there being some convenience if box would then start searching the paths.

I am aware that if R_BOX_PATH had /opt set, then the following usage may work

box::use(box/foo)

But in my case, there may be other things within /opt that I do not want the box package to walk over.

This is a thoughtful feature request but it is at odds with the current, intentional design. There are two classes of problems with this design, which I’ll explain in the following. But first, here’s how globally installed (as opposed to project-local) modules are intended to be used:

Rather than have your module foo stored directly in the module search path, nest it inside a folder that represents an organisation, product or user name — exactly like a GitHub project. That is, you could use the path /opt/box/jtlandis/foo and you would import it via box::use(jtlandis/foo). This is commonly known as “namespaced packages” or “scoped packages”. All global modules are supposed to be scoped.

R packages are not scoped, and this has several problems. Foremost, it leads to name clashes between different sources. In the case of R, CRAN packages always take precedence. This isn’t just a theoretical problem: for years, ‘box’ used to be only hosted on GitHub, and it had to be renamed three times (‘modules’ → ‘mod’ → ‘pod’ → ‘box’) because successive packages on CRAN “stole” its name. There are other, arguably more serious problems (e.g. it facilitates security vulnerabilities via typosquatting of module names).

On balance, I feel that the pros of requiring scoped module names outweigh the cons. But I am open to counter-arguments. (For a broader perspective, the issue of scoped vs. flat module names is a extremely widely debated issue across module systems in many different languages — e.g. here — and while the debate is far from settled most people seem to agree that having scope would generally be better than not having them, and that retrofitting them into a flat system is problematic.)

The other issue with your proposal is that it unfortunately introduces an error-prone API. Once again the culprit is name clashes: consider the case where, in the future, you install an R package ‘bar’ on your computer. Installing ‘bar’ automatically installs its dependencies, and one of the dependencies is called … ‘foo’.

And suddenly your code breaks, without any changes to it. Simply because (indirectly) installing the package ‘foo’ has changed what code gets loaded by box::use(foo). Even worse, if you are really unlucky the code will seem to continue to work, because both the module and the package might define a function of the same name that is called in your code, but which does something slightly different. The code still runs, but it produces wrong results.

This kind of silent errors is something that ‘box’ aims to avoid at any cost. In other words, if there will ever be the possibility of using flat module names, they won’t be usable simply via box::use(modname); there needs to be some syntactic distinction from package names.

Would you be open to expanding the box syntax to indicate a global module vs a package name? the formula operator ~ is currently unused and may be helpful in this regard that we are searching the global search paths (and not a package).

Maybe something like:

box::use(~foo, foo/mod1)

mod1$bar
# 1
foo$bar
# 2

I understand the intention behind wanting some projects namespaced, I just enjoy being able to use the __init__.r script and it feels odd to me that we cannot use these on scoped projects.

I'll leave the final decision in your hands, but if you like the idea, I can revise the pull request to try and implement this feature.

I just enjoy being able to use the init.r script and it feels odd to me that we cannot use these on scoped projects.

I’m not sure what you mean by that: you can use __init__.r scripts, they’re just regular modules (where the module name is the parent folder’s name). The only requirement is that they are nested inside the a parent namespace; but the same is true for all modules, regardless whether they are implemented in a file called __init__.r or something else. Using the example from above, you could have a module implemented in the file /opt/box/jtlandis/foo/__init__.r or /opt/box/jtlandis/foo.r.

That being said, I am not dogmatically opposed to “flat” module names. Maybe I need to revisit my objection to them. As for syntax, I’m unsure what would be best. ~mod works but feels arbitrary. Ideally I would like to have a syntax that is descriptive, in that even somebody not familiar with it can make an educated guess at the meaning (or at the very least to facilitate remembering the correct syntax).

I have no definitive answer but here are a few other ideas that feel like they might make more sense (in no particular order). Apologies, unfiltered stream of consciousness ahead:

~mod
- 👍 concise
- 👎 arbitrary, suggests something to do with NSE or lambda notation (similar to ‘rlang’ lambdas)
- 👎 should be reserved for version specification (see Poetry: dependency specification#Tilde requirements)
.../mod
- 👍 kind of a logical extension of the relative ../mod path, which goes up one level … so .../mod goes up “all the way to the root”; better yet would be /mod, of course, but that is not valid R
- 👎 looks like a relative (= project-local) import rather than a global module
''/mod
- 👍 follows the /mod idea and makes it syntactically valid by putting “something empty” in front of /
- 👎 strings are valid nowhere else in module names, and this will stay like this; I am not actually considering this
'global'/mod
- 👍 arguably quite obvious what is meant
- 👎 strings again
- 👎 despite being obvious it doesn’t really make sense in the path syntax metaphor; but C# actually uses the global:: scope resolution operator for namespaces, which are closely analogous to modules
(foo)
- ℹ️ using parentheses to disambiguate the type of an expression has precedents in other languages
- 👎 pretty cryptic
- 👎 error-prone: box::use(mod) and box::use((mod)) are visually too similar
{foo}
- 👎 I want to reserve {} as a Rust-like shorthand for nested imports, i.e. box::use(a/{b; c}) would be the same as box::use(a/b, a/c)
mod/.
- ℹ️ this is currently rejected by ‘box’ so the syntax is unambiguous and could be used without breaking backwards compatibility
- 👍 this is probably the closest we can get to pure mod while staying with the current idea of using path syntax
- ℹ️ in filesystems this notation is (almost?) universally interpreted as identical to mod; giving it a different interpretation here might be confusing. — But: it’s actually not a different interpretation! In fact, the “module location” specified by mod/. is identical to mod; the only thing that changes is to disambiguate the specification from a package name
- 👎 might be error-prone due to visual similarity to ./mod
?mod
- 👎 just as arbitrary as ~
- 👎 looks like it would be used to load the module documentation
+mod, -mod, !mod
- ℹ️ ok, we have more unary prefix operators
- 👎 arbitrary and non-descriptive
- 👎 suggests different semantics (and at least ! will probably be used in the future to exclude names from being attached, see #287)
global(mod)
- 👍 fairly obvious meaning
- 👎 subjectively, this still feels weird; I can’t really marshal on objective argument against it at the moment
global:mod (or similar)
- 👎 the prefix:mod syntax is earmarked for specifying remote sources, e.g. GitHub, similar to the ‘Remotes’ syntax in R DESCRIPTION files
- 🤝 but actually the ‘remotes’ syntax uses ::; so it should/could be github::mod, leaving : free
- 🤝 alternatively, the syntax could be unified; that is, global could become a remote type just like github, s3, etc.; see #249
global@mod (or similar)
- 👎 @ is earmarked for version specifications in a potential future version of ‘box’ so that users could select a package/module version via e.g. mod@'1.0.0' (#28)
- 🤝 actually version specification might have to use relational operators (see also ~ above), leaving @ free

… of course this is non-exhaustive; one of the dangers of using NSE is that you can go wild with the syntax.

One more consideration: I’ve used global in the above to denote globally installed modules, in contrast to project-local modules with a relative path. Alternatively, it might more sense to use verbatim-mod to denote that this loads a module rather than a package.

… it’s fair to say that I’ve painted myself a bit into a corner, syntactically, by making box::use(pkg) load a package.

My logic behind using ~ is that for file paths it implies relative to the users home directory, so to me it doesn't seem like too far of a leap for it to imply "relative to global paths", but it may feel arbitrary to others.

I’m not sure what you mean by that: you can use init.r scripts, they’re just regular modules (where the module name is the parent folder’s name). The only requirement is that they are nested inside the a parent namespace; but the same is true for all modules, regardless whether they are implemented in a file called init.r or something else. Using the example from above, you could have a module implemented in the file /opt/box/jtlandis/foo/init.r or /opt/box/jtlandis/foo.r.

I understand how the __init__.r scripts work. I mean to express that the namespace itself can hold a __init__.r script but cannot be called by just the parent folder's name, hence the motivation behind me opening this issue or my last suggestion with ~. It seems that you want users to structure their global modules a certain way. I'm here to tell you that I would like the option to call a global module's __init__.r script without the need of the following syntax...

box::use(jtlandis/`__init__`)

I hope this clears things up.

If this feature request does not make it into the package I am also okay with that. I can always restructure my code.

I hope this clears things up.

It does, thanks.

I still haven't decided how to best tackle this feature request but I will add some way of using unscoped modules.

Apologies for the radio silence, it was a combination of life happening and me having issues with my computer which lead to my dev environment crashing constantly.

Anyway, please have a look at the work in progress PR #316 (branch feature/unscoped-mod-names) which implements a preliminary version of this feature.

For now (and I need to stress that this is temporary!) the syntax for loading an unscoped module global_module is as follows:

box::use(mod(global_module))

Here, mod() works as a disambiguator between module and package name. So it does not indicate that the module is unscoped/global, it indicates that what follows is, in fact, a module. The same syntax will also work with regular, scoped module names (in which case it would be redundant, however).

Before settling on a final syntax and merging this feature I would appreciate test-driving this feature.

I understand life - I too have been very busy. I will give this feature a go in my day to day. Cheers!

klmr / box

Search for modules after packages has been exhausted #307

Please describe your feature request