klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License
829 stars 47 forks source link

Consider making ‘box’ namespace environments actual namespaces #341

Closed klmr closed 8 months ago

klmr commented 9 months ago

‘box’ module “namespace” environments are currently not actually namespaces as far as R is concerned (i.e. they don’t have a .__NAMESPACE__. environment with a nested spec name of type string).

We should explore what consequences changing this would have. See:

https://github.com/klmr/box/blob/2cef22e63abeedefa82e45b12f3f584301988733/R/env.r#L21-L22

On the positive side, it would lead to a nicer printing of the closure environment of a function defined inside a module. On the flip side, there might be rather a lot of unintended, subtle side-effects that current tests don’t capture, and which could break user code.

radbasa commented 8 months ago

This may break a PR we have for box support in the covr package. https://github.com/r-lib/covr/pull/526

klmr commented 8 months ago

Ah, thanks for letting me know. I think it should still work, but at any rate now I know about it and will pay attention to not breaking it.

Also, great work on the ‘covr’ PR!

radbasa commented 8 months ago

Thanks! If you do change to namespace environments, I'll also take a look in the covr side.

The covr PR, however, still has one CI fail with oldrel-4. box fails to install. I think it's related to the R-3.6 backports fixed in 9c47553.

ArcadeAntics commented 8 months ago

As someone who's tried this before in a personal R package, this is a bad idea.

Namespaces are expected to have a consistent structure that package:box does not follow. A namespace has a parent environment with a name imports:pkgname whose parent environment is the base namespace, which box namespaces do not follow. If you were to make the box namespaces into "real" namespaces, this means any closures written in a box namespace cannot be compiled, and any loops cannot be compiled either (see ?compiler::enableJIT).

You would also run into the issue that a namespace is expected to originate from a directory with sub-directories help, Meta, R, possibly others such as html and libs, as well as files DESCRIPTION, INDEX, MD5, and NAMESPACE. box modules do not offer such a file structure, so they should not pretend to be package namespaces.

Then comes the issue of namespace registration. Namespaces need to be registered, and you can register them, technically; here is some code at the C level:

SEXP do_regBoxNS(SEXP call, SEXP op, SEXP args, SEXP rho)
{
    args = CDR(args);
    SEXP name = CAR(args); args = CDR(args);
    switch (TYPEOF(name)) {
    case SYMSXP:
        break;
    case STRSXP:
        if (LENGTH(name)) {
            name = installTrChar(STRING_ELT(name, 0));
            break;
        }
    default:
        errorcall(call, "bad package:box namespace name");
    }
    SEXP boxNS = CAR(args); args = CDR(args);
    if (findVarInFrame(R_NamespaceRegistry, name) != R_UnboundValue)
        errorcall(call, "package:box namespace already registered");
    defineVar(name, boxNS, R_NamespaceRegistry);
    return R_NilValue;
}

static const R_ExternalMethodDef externalRoutines[] = {
{".C_regBoxNS", (DL_FUNC) &do_regBoxNS, 2},
{NULL, NULL, 0}
}

attribute_visible
void R_init_box(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, NULL, NULL, externalRoutines);
    R_useDynamicSymbols(dll, FALSE);
    R_forceSymbols(dll, TRUE);
}

and at the R level:

spec <- box:::parse_spec(quote(./example), "")
source_path <- normalizePath("./example.R")
info <- box:::mod_info(spec, source_path)
mod_ns <- box:::make_namespace(info)
mod_ns$.__NAMESPACE__. <- new.env(hash = TRUE, parent = baseenv())
mod_ns_name <- info$name
## or possibly this:
# mod_ns_name <- info$source_path
mod_ns$.__NAMESPACE__.$spec <- c(name = mod_ns_name, version = "0.1.0")
.External2(.C_regBoxNS, mod_ns_name, mod_ns)

but this is a horrible idea. do_regNS is attribute_hidden and in Rdll.hide meaning it is not meant to be used and cannot be used by R users. The only reason R_NamespaceRegistry is not also attribute_hidden or in Rdll.hide is to make accessing actual namespaces and variables from C easier. It should be regarded as read-only.

Even if you can justify registering your box namespaces in R_NamespaceRegistry, there comes the issue of the namespace name getNamespaceName(ns). Namespaces are expected to have a valid package name grepl(paste0("^", .standard_regexps()$valid_package_name, "$"), getNamespaceName(ns)), but filenames have no such requirement, meaning you might be registering namespaces with invalid names. Additionally, if you choose to use just the basename of the file as the name to be registered, you will run into issues of duplicates; box namespaces with the same namespace name that print identically, serialize identically, and cannot both be simultaneously registered. Each namespace is required to have a unique name. So you could use the normalized path, now duplicates are not an issue, but now the names most definitely are invalid.

On the topic of serialization, namespaces are serialized in a manner unlike normal environments. See these links:

This means that no matter which name you decided to give to the box namespaces, they will always be serialized and unserialized as real namespaces. When being restored, the box namespaces will become real namespaces that link to the wrong environment, or will become R_GlobalEnv because they cannot be restored, both of which are erroneous.

There is another issue in that box namespaces can import variables from a package and use them. However, given that these are not actual namespaces, they are not registered as using those namespaces, meaning a real namespace can be unloaded even though it is in use by a box namespace. This might not break any R code, but all pointers to C or Fortran functions in that package will be set to NULL so cannot be used, and any packages which expect the package to be loaded will see it's unloaded and try to reload it, giving you two R package namespaces at the same time.

As someone who has wasted tons of time trying to do the same thing myself, please don't waste your own time trying this.

klmr commented 8 months ago

Thanks for this detailed explanation! Namespace registration and serialisation are definitely good points which all but forbid doing this. — The naming ambiguity could be handled, but that would defeat the purpose of having a “nice” printable namespace name. In other words, there doesn’t seem to be a point to making ‘box’ module namespaces into package namespaces.