reactorlabs / rir

GNU General Public License v2.0
62 stars 18 forks source link

Functions precompiled to GnuR bytecode do not get compiled to RIR bytecode when called from the toplevel #1046

Open vogr opened 3 years ago

vogr commented 3 years ago

It seems that Ř does not compile functions defined in packages when these functions are called from the toplevel, or at the toplevel in an eval() call:

library(yaml)

# as.yaml not compiled
as.yaml(10)
as.yaml(10)
as.yaml(10)

# as.yaml not compiled
q <- quote(as.yaml(10))
eval(q)
eval(q)
eval(q)

# as.yaml will get compiled
f <- function() {
   as.yaml(10)
}

f()
f()
f()

(after installing yaml from R or from Ř with install.packages("yaml"))

This issus comes from the fact that toplevel function calls are handled in the function SEXP eval(SEXP,SEXP) in eval.c (in custom-r), and that if the functions has a BCODESXP body (GnuR bytecode), then the evaluation will only use GnuR machinery (bcEval also from eval.c) and will never try to compile the function to RIR bytecode. This is a only problem for functions defined in library: when a library gets installed, by default its functions get precompiled to GnuR bytecode.

> install.packages("yaml")
> yaml::as.yaml
<bytecode: 0x55e65a9c1060>

The precompilation can be disabled using the environment variable R_COMPILE_PKGS (or with compiler::compilePKGS(FALSE)):

$ export R_COMPILE_PKGS=0
$ ./R
> remove.packages("yaml")
> install.packages("yaml")
> yaml::as.yaml
# no bytecode

Standard library packages also get precompiled:

> Matrix::bdiag
<bytecode: 0x5576b68d5760>

This can be prevented by building GnuR with export R_COMPILE_PKGS=0.

But this only partially fixes the issue! Because the logic used in R_CheckJIT to determine if a function is defined at the toplevel is to check CLOENV(fun) == R_GlobalEnv, which is false for function defined in libraries (it probably assumes that library functions are all precompiled and that this test is only run on user-defined functions):

> environment(yaml::as.yaml)
<environment: namespace:yaml>

Toplevel functions get compiled if they are called several times; non-toplevel functions only get compiled if they get a big score from JIT_score. With GnuR-precompilation disabled, library functions are all considered as non-toplevel and get compiled (to RIR) less than I think they should.

vogr commented 3 years ago

What I tried:

vogr commented 3 years ago

Now that I have read the code a bit more, I think it would be sufficient to modify R_CheckJIT only: currently it gives a score of 1 to every BCODESXP-compiled functions. We could make it compute the score on the AST

if (externalCodeToExpr && (TYPEOF(body) == BCODESXP)) {
    body = VECTOR_ELT(CDR(body), 0);
}

and check for toplevel differently: CLOENV is either R_GlobalEnv or a package namespace

int const is_toplevel = (CLOENV(fun) == R_GlobalEnv) || R_IsNamespaceEnv(CLOENV(fun));

(or purely remove the toplevel distinction, though I don't understand why it was ever needed)

If the compilation happens, the BCODESXP gets replaced by and EXTERNALSXP. Else it will get evaluated by the GnuR bytecode interpreter which might be a bit faster than falling back to interpreting the AST as done in the 2 approaches described in the previous comment.

o- commented 3 years ago

thanks. that plan of action sounds good to me. eager to see the effects.