s-u / rJava

R to Java interface
https://RForge.net/rJava

rJava "locks" on Java paths on load #334

Open e-kotov opened 4 days ago

e-kotov commented 4 days ago

Hello @s-u ,

As you might remember, I made a companion package, {rJavaEnv} ( https://github.com/e-kotov/rJavaEnv ), that lets R users effortlessly install a Java version matching their platform, CPU, etc., and sets the Java environment variables for their current project/working directory and/or session.

One edge case I am struggling with is when the user has already "touched" {rJava} in ANY way. That is, if the user has only run library(rJava), this triggers .onLoad(), after which, even if the user has not touched .jinit(), it is (seemingly) impossible to force {rJava} to respect a JAVA_HOME that the user sets afterwards, either manually or with my package.

Even worse, though this is more of a side effect of autocomplete in R and some IDEs: as soon as the user types rJava:: in the R console and presses Tab, or has R code in a script with direct references such as rJava::.jinit or any other function, this also triggers loading of the {rJava} namespace and apparently .onLoad(), and therefore also locks the paths where {rJava} will look for Java...

This is a minor annoyance, but still an annoyance for the user: they have to restart the R session to use any new JAVA_HOME path they want to apply after any interaction with {rJava}, even an unintentional one. I see room for improvement here.

Would you be able to skip the parts of the .onLoad() sequence that cause this "lock" on folders until the user actually does something more meaningful than just loading {rJava} (intentionally or not)? I see a common use case where users list their libraries at the beginning of an R script. If they first load all the packages they want, including {rJava}, they essentially cannot change JAVA_HOME anymore, as it will be ignored as soon as {rJava} is loaded.
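To illustrate the ordering problem, here is a minimal sketch, using only base R, of a guard that sets JAVA_HOME only while the {rJava} namespace is not yet loaded. The helper name set_java_home_safely() and the example path are hypothetical and are not part of {rJava} or {rJavaEnv}.

```r
# Hypothetical helper (not part of rJava or rJavaEnv): set JAVA_HOME only
# while it can still influence which JVM rJava picks up on load.
set_java_home_safely <- function(java_home) {
  if (isNamespaceLoaded("rJava")) {
    # The namespace is loaded, so .onLoad() has already run and a new
    # JAVA_HOME would be ignored until R is restarted.
    warning("rJava is already loaded; restart R before changing JAVA_HOME.",
            call. = FALSE)
    return(invisible(FALSE))
  }
  Sys.setenv(JAVA_HOME = java_home)
  invisible(TRUE)
}

# The path below is a placeholder assumption; substitute a real JDK location.
ok <- set_java_home_safely("/path/to/jdk")
if (ok) message("JAVA_HOME is now ", Sys.getenv("JAVA_HOME"))
```

Checking isNamespaceLoaded() rather than the search path matters here, because the namespace can be loaded without the package ever being attached with library().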

Alternatively, would it be possible to recheck the up-to-date JAVA_HOME environment variable (and any other relevant env vars) on the first run of .jinit() or any other function in {rJava} that actually triggers the initialisation of JVM?

This is related to the following issues:

- https://github.com/s-u/rJava/issues/249 - essentially, the problem would be solved if one could re-load {rJava} after detaching, but we know that does not work.
- https://github.com/s-u/rJava/issues/25 - somewhat related. What I am asking about here is not re-loading Java after initialisation, but rather being able to change the Java path via JAVA_HOME before initialisation, which seems like a valid thing to be able to do.

s-u commented 3 days ago

By "touched" you mean loaded!

The way packages work is that they load their native code when the package is loaded (by design), so at that point the JVM dependency is resolved. It is really by design that the environment is expected to be in place when you load the package. The initialization just provides additional arguments to the JVM, but does not change which JRE is loaded, since the package, including the corresponding JVM library, is already loaded by that point.

I was thinking about a re-write which would not link to the JVM at all, but instead dynamically load just the JNI symbols later at run time. In that case it would be possible to defer the loading of the run-time. However, that would have to be tested first, because of the indirect dependencies, which are a nightmare when loading the JVM by hand.

e-kotov commented 2 days ago

> By "touched" you mean loaded!

Yes, I do mean loaded. But I would also like to stress this as "touched", because, as I have demonstrated, the {rJava} namespace (or any package namespace in the same scenarios, for that matter) is sometimes loaded without the user even knowing about it. I imagine most users don't know that a package can do something (e.g. execute virtually anything in .onLoad()) without them loading it explicitly with library(). To be honest, I did not know this for 10 years until last week, because it is not intuitive (at least to me). {rJava} is not to blame here, as this is R's behaviour in general. But the side effect is that once this non-evident loading has happened, there is no way back short of restarting the whole R session.
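The implicit loading described above can be reproduced without {rJava}. The following is a minimal sketch that uses the base-R splines package purely as a stand-in (an assumption chosen for illustration); it is meant to be run in a fresh R session where that package has not been loaded yet.

```r
# "splines" ships with R and is used here only as a stand-in for rJava.
before <- isNamespaceLoaded("splines")

# Merely referencing a function with `::` loads the namespace
# (and runs the package's .onLoad(), if it defines one).
f <- splines::bs

after <- isNamespaceLoaded("splines")
attached <- "package:splines" %in% search()

cat("namespace loaded before `::` :", before, "\n")
cat("namespace loaded after `::`  :", after, "\n")
cat("package attached to search() :", attached, "\n")
```

The namespace ends up loaded even though library() was never called, which is the same mechanism by which a stray rJava:: reference triggers {rJava}'s load-time setup.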

It would be great if you could defer the loading of JNI symbols to a later stage. I think this will improve the experience of users of {rJava}-dependent R packages.

s-u commented 23 hours ago

I have added some rudimentary support in the feature/dynamic branch. This is only tested on macOS and Linux and is unlikely to work on Windows at this point. But the nice thing is you can do (on macOS):

> library(rJava)
> Sys.setenv(JAVA_VERSION=1.8)
> J("java.lang.System")$getProperty("java.version")
Loading JVM from /Library/Java/JavaVirtualMachines/temurin-8.jdk/Contents/Home
[1] "1.8.0_422"

or

> library(rJava)
> Sys.setenv(JAVA_VERSION=17)
> J("java.lang.System")$getProperty("java.version")
Loading JVM from /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home
[1] "17.0.12"

(thus proving that the actual JVM loading is done just before use).

e-kotov commented 9 hours ago

@s-u

Apparently, {rJava} is not that easy to install from source: install_github() failed for me (well, you know the reason, of course). But that's not the point.

I managed to install the feature/dynamic branch of {rJava} from source in a Linux container (Rocker Geospatial 4.4.0) and tested it.

In my opinion, it now works perfectly. Just what I wanted!

Now I can even use rJava::.jniInitialized or rJava::.jvmState() to test whether the JVM is initialised and give users of {rJavaEnv} a more informative warning that tells them exactly why they cannot set a new JAVA_HOME (because the JVM is initialised), instead of just telling them to restart R because the {rJava} namespace is already loaded.

I'm not sure I can test this on macOS or Windows, as this requires compilation, and on my Apple Silicon Mac I could not figure out how to build {rJava} from source. But I would guess that if the feature/dynamic branch of {rJava} works on macOS, and if you find time to make it work on Windows (if this is possible at all), then what I am doing in {rJavaEnv} will also play well with {rJava} on these platforms.

s-u commented 4 hours ago

You can't use install_github as it doesn't know how to create a package from the repository (which is not the package itself, as it needs to compile the Java code). See the README: simply run sh mkdist in the repository to create the package. If you're on macOS, you probably want to install JDK 1.8 (e.g., temurin-8) and use JAVA_VERSION=1.8 sh mkdist, since rJava aims to target 1.6 so as to minimize the JDK version requirements, or use JDK 11 if you can't get 1.8.

Windows doesn't work right now (see https://github.com/s-u/rJava/actions) because I did not update the sources to build without linking JVM and I need to add the Java location detection from the build phase into the run-time - which will take some work.