stainlessai / micronaut-jupyter

A Micronaut configuration that integrates your app with an existing Jupyter installation.
https://stainlessai.github.io/micronaut-jupyter
Apache License 2.0
9 stars 5 forks source link

Working directory is not properly set #42

Open dstieglitz opened 3 years ago

dstieglitz commented 3 years ago

When running the micronaut kernel, the Groovy scripts that are run with cell code are run from the user.home directory, not the directory of the current notebook. Any relative paths, e.g.,

new File("..")

will not be what the user expects.

This does not happen with the default Groovy kernel from beakerx, and it's not obvious why it doesn't work in the Micronaut kernel.

JoshuaCWebDeveloper commented 3 years ago

To reproduce:

./gradlew examples:basic-service:run

Then run this notebook: reproduce-issue-42.txt

JoshuaCWebDeveloper commented 3 years ago

This seems to be the expected functionality for Java: https://docs.oracle.com/javase/7/docs/api/java/io/File.html

By default the classes in the java.io package always resolve relative pathnames against the current user directory. This directory is named by the system property user.dir, and is typically the directory in which the Java virtual machine was invoked.

It isn't clear how BeakerX is managing to change the working directory, or why that isn't being applied to our kernel. A quick search of the beakerx code on github did not turn up any calls to System.setProperty("user.dir"), but further searching should be done.

Further info here: https://stackoverflow.com/questions/840190/changing-the-current-working-directory-in-java, setting that system property should work for new files, but would result in inconsistent relative path resolution across the JVM as a whole.

dstieglitz commented 3 years ago

I posted this question on https://github.com/twosigma/beakerx_kernel_groovy/issues/15

JoshuaCWebDeveloper commented 3 years ago

BeakerX is not setting the current working directory, it looks like that is set by Jupyter by running the kernel process in the directory of the notebook. Since we are using threads in a separate single process not managed by Jupyter, that is why we don't get the same behavior.

Given that this seems to be the expected behavior for Jupyter kernels (despite this person's objections: https://github.com/jupyter/notebook/issues/2194), this should indeed be considered a bug.

JoshuaCWebDeveloper commented 3 years ago

System properties are global to the JVM, so we can't simply wrap each thread with System.setProperty('user.dir'), since that would mess up concurrent threads.

Idea: Use a SecurityManager to intercept calls to System.getProperty('user.dir'). Either return the actual value of user.dir, or, if the call originates from a kernel, then return the correct working directory for that kernel.

Assumption: java.io classes use getProperty.

Question: How to associate kernels with directory? Idea: Pass the directory to the API endpoint from the kernel bash script.

Question: How to determine which thread (from which kernel) made a system call? Idea: Use ThreadLocal variable that holds the thread's directory in the security manager (demo: https://alphaloop.blogspot.com/2014/08/a-per-thread-java-security-manager.html). Keep in mind that there are kernel-level threads and cell-level threads (find out how ThreadLocal works with child threads and thread groups).

Performance considerations: The security manager will add overhead every single time getProperty is called. Research effect on performance.

Whatever solution that is implemented (either the above or something else) should be made configurable via a new configuration property. My gut says that the solution should be enabled by default (by default the working directory would be that of the notebook); however, if it introduces possible performance hits, we may want to consider disabling it by default.

@dstieglitz Thoughts on this idea?