jupyter / jupyter_client

Jupyter protocol client APIs
https://jupyter-client.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Reference kernelspec resources from kernel? #445

Open devinrsmith opened 5 years ago

devinrsmith commented 5 years ago

I've got a custom kernel that has a lot of custom configuration. I can pass the arguments through one by one in the kernelspec argv, like this:

{
  "argv": [
    "python",
    "-m",
    "my_custom_kernel",
    "{connection_file}",
    "arg1",
    "arg2",
    ...,
    "argN"
  ],
  "display_name": "My Custom Kernel",
  "language": "python"
}

but it's a bit sloppy, and error-prone with complex argument data types. It's much easier for me to lay down a file in the same directory as kernel.json - let's say, my_custom_kernel.config - and create a kernel.json that might look something like this:

{
  "argv": [
    "python",
    "-m",
    "my_custom_kernel",
    "{connection_file}",
    "my_custom_kernel.config"
  ],
  "display_name": "My Custom Kernel",
  "language": "python"
}

Unfortunately, I don't see any easy way for the executed kernel to know where it should look for my_custom_kernel.config - there don't seem to be any environment variables set that would let the kernel know where it came from.

I also can't set the full hardcoded path at kernel install time because I'm delegating that responsibility to KernelSpecManager().install_kernel_spec(...), and kernel.json needs to be set up before that.

Is there an easy way to figure out, from within the running kernel, which kernelspec was used to launch it? If not, is it something that can be added?

kevin-bates commented 5 years ago

I'm not aware of a solution to this at this time short of what you've done. There's current momentum, as part of the jupyter_server effort, to introduce parameterized kernels. There have been some discussions about this, in this repo (#434) and in jupyter_kernel_mgmt, which is a proposal for a different way of dealing with kernelspecs.

However, the value you want is the resource_dir from the kernelspec and, IMHO, it would be useful for the kernel to access its resource dir (although it probably shouldn't assume that directory is accessible, since it won't be for things like YARN/Spark clusters and other flavors of remote kernels). You might try exploring this area of jupyter_client. I believe you could easily add an env["JPY_KERNEL_RESOURCE_DIR"] = self.kernel_spec.resource_dir statement prior to _launch_kernel().

There is precedent for setting system-owned environment variables prior to launch in https://github.com/jupyter/jupyter_client/blob/master/jupyter_client/launcher.py.
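A minimal sketch of that idea, using only the standard library. The variable name JPY_KERNEL_RESOURCE_DIR is the one suggested above and is not assumed to be set by jupyter_client today; the helper names build_kernel_env and find_kernel_config are hypothetical and exist only for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Name proposed in this thread; assumed, not an existing jupyter_client variable.
RESOURCE_DIR_VAR = "JPY_KERNEL_RESOURCE_DIR"


def build_kernel_env(base_env: Mapping[str, str], resource_dir: str) -> dict:
    """Launcher side: copy the environment and record the kernelspec directory."""
    env = dict(base_env)
    env[RESOURCE_DIR_VAR] = resource_dir
    return env


def find_kernel_config(
    filename: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[Path]:
    """Kernel side: resolve a file shipped next to kernel.json, if reachable."""
    environ = os.environ if environ is None else environ
    resource_dir = environ.get(RESOURCE_DIR_VAR)
    if not resource_dir:
        return None  # the launcher did not set the variable
    candidate = Path(resource_dir) / filename
    # Remote kernels (e.g. YARN/Spark) may not be able to see this directory.
    return candidate if candidate.is_file() else None
```

Returning None instead of raising leaves it to the kernel to decide how to fall back when the resource directory is missing or not reachable.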

Btw, one of the nice things regarding the Kernel Provider proposal in jupyter_kernel_mgmt is that you could write your own "provider" which implements the relationship between the kernelspec and launch however you want. That said, I really like the idea of conveying the resource directory to the kernel as a "factory" value.

Some of the grizzled veterans of Jupyter may have some additional tips and tricks as well.

devinrsmith commented 5 years ago

Thanks for the quick response. A proper solution for runtime parameterized kernels would solve a lot of my issues, and its absence is the reason I'm manually parameterizing at install time. That said, I think there's still a case to be made that a proper workflow around install-time parameterized kernels is an important and subtly different problem. And if that can be as simple as providing an environment variable that points to the resource directory, that would be great. If it needs to be more integrated to support YARN/Spark/remote kernels, then other solutions might need to be architected. I can imagine the framework doing an rsync-style replication of the relevant resource directory to the remote host, or simply replicating the single config file specified at install time, my_custom_kernel.config, whose full path could be substituted using {install_config_file} in the kernelspec file.
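To make the {install_config_file} idea concrete, here is a rough sketch of how such a substitution could work. It is not how jupyter_client is implemented; the functions expand_argv and install_time_values are hypothetical, and {install_config_file} is only the placeholder name floated in this comment.

```python
import os
import re
from typing import Dict, List

# Matches simple {name} placeholders such as {connection_file}.
_FIELD = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_argv(argv: List[str], values: Dict[str, str]) -> List[str]:
    """Replace {name} fields that have a known value; leave the rest untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return [_FIELD.sub(substitute, arg) for arg in argv]


def install_time_values(resource_dir: str, config_name: str) -> Dict[str, str]:
    """Values that are only known once the kernelspec directory is decided."""
    return {"install_config_file": os.path.join(resource_dir, config_name)}
```

Unknown placeholders are left in place, so a later pass can still fill in {connection_file} at launch time.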

devinrsmith commented 5 years ago

I'm currently working around this by pulling out the directory from the kernel spec manager, and using it directly in my kernel.json file:

    from jupyter_client.kernelspec import KernelSpecManager

    kernel_spec_manager = KernelSpecManager()
    # _get_destination_dir is a private method, so it may change between releases
    destination = kernel_spec_manager._get_destination_dir(kernel_name=kernel_name, user=user, prefix=prefix)
    # ... create kernel.json with reference to destination
    kernel_spec_manager.install_kernel_spec(path, kernel_name=kernel_name, user=user, prefix=prefix)
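The "create kernel.json" step could look roughly like the sketch below. It uses only the standard library and takes the destination directory as a plain string, so it does not depend on how that directory was obtained. The function name write_kernel_json and its parameters are hypothetical; the argv layout mirrors the example kernel.json earlier in this thread.

```python
import json
import os


def write_kernel_json(staging_dir: str, destination: str, config_name: str) -> str:
    """Write kernel.json into staging_dir, pointing argv at the installed config."""
    spec = {
        "argv": [
            "python",
            "-m",
            "my_custom_kernel",
            "{connection_file}",
            # Absolute path the config file will have after installation.
            os.path.join(destination, config_name),
        ],
        "display_name": "My Custom Kernel",
        "language": "python",
    }
    kernel_json = os.path.join(staging_dir, "kernel.json")
    with open(kernel_json, "w", encoding="utf-8") as f:
        json.dump(spec, f, indent=2)
    return kernel_json
```

The staging directory, with my_custom_kernel.config placed next to the generated kernel.json, would then be the path handed to install_kernel_spec.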