Open jtlandis opened 1 year ago
This is a thoughtful feature request but it is at odds with the current, intentional design. There are two classes of problems with this design, which I’ll explain in the following. But first, here’s how globally installed (as opposed to project-local) modules are intended to be used:
Rather than have your module
foo
stored directly in the module search path, nest it inside a folder that represents an organisation, product or user name — exactly like a GitHub project. That is, you could use the path/opt/box/jtlandis/foo
and you would import it viabox::use(jtlandis/foo)
. This is commonly known as “namespaced packages” or “scoped packages”. All global modules are supposed to be scoped.
R packages are not scoped, and this has several problems. Foremost, it leads to name clashes between different sources. In the case of R, CRAN packages always take precedence. This isn’t just a theoretical problem: for years, ‘box’ used to be only hosted on GitHub, and it had to be renamed three times (‘modules’ → ‘mod’ → ‘pod’ → ‘box’) because successive packages on CRAN “stole” its name. There are other, arguably more serious problems (e.g. it facilitates security vulnerabilities via typosquatting of module names).
On balance, I feel that the pros of requiring scoped module names outweigh the cons. But I am open to counter-arguments. (For a broader perspective, the issue of scoped vs. flat module names is a extremely widely debated issue across module systems in many different languages — e.g. here — and while the debate is far from settled most people seem to agree that having scope would generally be better than not having them, and that retrofitting them into a flat system is problematic.)
The other issue with your proposal is that it unfortunately introduces an error-prone API. Once again the culprit is name clashes: consider the case where, in the future, you install an R package ‘bar’ on your computer. Installing ‘bar’ automatically installs its dependencies, and one of the dependencies is called … ‘foo’.
And suddenly your code breaks, without any changes to it. Simply because (indirectly) installing the package ‘foo’ has changed what code gets loaded by box::use(foo)
. Even worse, if you are really unlucky the code will seem to continue to work, because both the module and the package might define a function of the same name that is called in your code, but which does something slightly different. The code still runs, but it produces wrong results.
This kind of silent errors is something that ‘box’ aims to avoid at any cost. In other words, if there will ever be the possibility of using flat module names, they won’t be usable simply via box::use(modname)
; there needs to be some syntactic distinction from package names.
Would you be open to expanding the box syntax to indicate a global module vs a package name? the formula operator ~
is currently unused and may be helpful in this regard that we are searching the global search paths (and not a package).
Maybe something like:
box::use(~foo, foo/mod1)
mod1$bar
# 1
foo$bar
# 2
I understand the intention behind wanting some projects namespaced, I just enjoy being able to use the __init__.r
script and it feels odd to me that we cannot use these on scoped projects.
I'll leave the final decision in your hands, but if you like the idea, I can revise the pull request to try and implement this feature.
I just enjoy being able to use the init.r script and it feels odd to me that we cannot use these on scoped projects.
I’m not sure what you mean by that: you can use __init__.r
scripts, they’re just regular modules (where the module name is the parent folder’s name). The only requirement is that they are nested inside the a parent namespace; but the same is true for all modules, regardless whether they are implemented in a file called __init__.r
or something else. Using the example from above, you could have a module implemented in the file /opt/box/jtlandis/foo/__init__.r
or /opt/box/jtlandis/foo.r
.
That being said, I am not dogmatically opposed to “flat” module names. Maybe I need to revisit my objection to them. As for syntax, I’m unsure what would be best. ~mod
works but feels arbitrary. Ideally I would like to have a syntax that is descriptive, in that even somebody not familiar with it can make an educated guess at the meaning (or at the very least to facilitate remembering the correct syntax).
I have no definitive answer but here are a few other ideas that feel like they might make more sense (in no particular order). Apologies, unfiltered stream of consciousness ahead:
~mod
.../mod
../mod
path, which goes up one level … so .../mod
goes up “all the way to the root”; better yet would be /mod
, of course, but that is not valid R''/mod
/mod
idea and makes it syntactically valid by putting “something empty” in front of /
'global'/mod
global::
scope resolution operator for namespaces, which are closely analogous to modules(foo)
box::use(mod)
and box::use((mod))
are visually too similar{foo}
{}
as a Rust-like shorthand for nested imports, i.e. box::use(a/{b; c})
would be the same as box::use(a/b, a/c)
mod/.
mod
while staying with the current idea of using path syntaxmod
; giving it a different interpretation here might be confusing. — But: it’s actually not a different interpretation! In fact, the “module location” specified by mod/.
is identical to mod
; the only thing that changes is to disambiguate the specification from a package name./mod
?mod
~
+mod
, -mod
, !mod
!
will probably be used in the future to exclude names from being attached, see #287)global(mod)
global:mod
(or similar)
prefix:mod
syntax is earmarked for specifying remote sources, e.g. GitHub, similar to the ‘Remotes’ syntax in R DESCRIPTION
files::
; so it should/could be github::mod
, leaving :
freeglobal
could become a remote type just like github
, s3
, etc.; see #249global@mod
(or similar)
@
is earmarked for version specifications in a potential future version of ‘box’ so that users could select a package/module version via e.g. mod@'1.0.0'
(#28)~
above), leaving @
free… of course this is non-exhaustive; one of the dangers of using NSE is that you can go wild with the syntax.
One more consideration: I’ve used global
in the above to denote globally installed modules, in contrast to project-local modules with a relative path. Alternatively, it might more sense to use verbatim-mod
to denote that this loads a module rather than a package.
… it’s fair to say that I’ve painted myself a bit into a corner, syntactically, by making box::use(pkg)
load a package.
My logic behind using ~
is that for file paths it implies relative to the users home directory, so to me it doesn't seem like too far of a leap for it to imply "relative to global paths", but it may feel arbitrary to others.
I’m not sure what you mean by that: you can use init.r scripts, they’re just regular modules (where the module name is the parent folder’s name). The only requirement is that they are nested inside the a parent namespace; but the same is true for all modules, regardless whether they are implemented in a file called init.r or something else. Using the example from above, you could have a module implemented in the file /opt/box/jtlandis/foo/init.r or /opt/box/jtlandis/foo.r.
I understand how the __init__.r
scripts work. I mean to express that the namespace itself can hold a __init__.r
script but cannot be called by just the parent folder's name, hence the motivation behind me opening this issue or my last suggestion with ~
. It seems that you want users to structure their global modules a certain way. I'm here to tell you that I would like the option to call a global module's __init__.r
script without the need of the following syntax...
box::use(jtlandis/`__init__`)
I hope this clears things up.
If this feature request does not make it into the package I am also okay with that. I can always restructure my code.
I hope this clears things up.
It does, thanks.
I still haven't decided how to best tackle this feature request but I will add some way of using unscoped modules.
Apologies for the radio silence, it was a combination of life happening and me having issues with my computer which lead to my dev environment crashing constantly.
Anyway, please have a look at the work in progress PR #316 (branch feature/unscoped-mod-names) which implements a preliminary version of this feature.
For now (and I need to stress that this is temporary!) the syntax for loading an unscoped module global_module
is as follows:
box::use(mod(global_module))
Here, mod()
works as a disambiguator between module and package name. So it does not indicate that the module is unscoped/global, it indicates that what follows is, in fact, a module. The same syntax will also work with regular, scoped module names (in which case it would be redundant, however).
Before settling on a final syntax and merging this feature I would appreciate test-driving this feature.
I understand life - I too have been very busy. I will give this feature a go in my day to day. Cheers!
Please describe your feature request
Lets assume you have set the systems
R_BOX_PATH
environment variable to something like/opt/box
. It would be convenient to take advantage of the__init__.r
usage pattern with modules within/opt/box
.For example, suppose the following directory structure in
/opt/box
foo/__init__.r
foo/mod1.r
foo/mod2.r
Then, the expected result may look something like this:
Instead, we currently get an error dictating that there is no package named
foo
. Thus back to my point, to there being some convenience ifbox
would then start searching the paths.I am aware that if
R_BOX_PATH
had/opt
set, then the following usage may workBut in my case, there may be other things within
/opt
that I do not want thebox
package to walk over.