jupyterhub / mybinder.org-user-guide

Turn a Git repo into a collection of interactive notebooks. This is Binder's user documentation repository.
https://mybinder.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Modifying kernelspec used at launch #152

Closed agitter closed 5 years ago

agitter commented 5 years ago

I'm working on setting up a Jupyter notebook that uses the R kernel (IRkernel) inside a conda environment named contrib-viz. When I launch the binder, the kernel is not found, as seen in the screenshot (kernel_error).

The problem seems to be that the local notebook is saved with a kernelspec that includes the name of the conda environment:

"kernelspec": { 
   "display_name": "R [conda env:contrib-viz]", 
   "language": "R", 
   "name": "conda-env-contrib-viz-r" 
  }

Jupyter launched through binder is trying to use a different kernelspec:

"kernelspec": {
   "display_name": "R [conda env:conda]",
   "language": "R",
   "name": "conda-env-conda-r"
  }

Selecting the kernel R [conda env:conda] that is offered in the drop down menu in the screenshot is a workaround, but I expect that will deter novice users who would most benefit from the binder. I can also edit the .ipynb file to make the kernelspec match what binder expects, but then I get the kernel not found error when I run the notebook in my local environment. Renaming my conda environment would probably work, but I have multiple environments for this project and prefer to keep them distinct if possible. Is there a better general solution?

The binder link is https://mybinder.org/v2/gh/agitter/meta-review/binder?filepath=analyses/deep-review-contrib/02.contrib-viz.ipynb, which points to a temporary branch for a pull request; the branch will be deleted soon. Currently, this is the version with an edited .ipynb file that works in binder but not locally.

Thanks for the binder service. I see a lot of opportunities to use this.

betatim commented 5 years ago

I think the "weird" kernel names stem from using nb_conda_kernels, which uses these names to generate one kernel per environment. I am not very familiar with what it does, but I'd try and see what happens when you don't install it on the binder image.

Another option is to add a step to your postBuild that cleans up the kernel names in your notebook files or modifies the kernel spec.

I don't know if there is a way to tell Jupyter to ignore/overwrite the name of the kernel specified in the notebook in a "Trust me, just use X as a kernel" way.
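
Before adding a postBuild step like the one suggested above, it may help to see which kernel name each notebook actually requests. A minimal sketch, assuming the notebooks are plain JSON .ipynb files somewhere under the repository root; requested_kernels is a hypothetical helper, not part of Jupyter:

```python
import json
from pathlib import Path


def requested_kernels(root="."):
    """Map each .ipynb path under root to the kernel name stored in its metadata."""
    result = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        notebook = json.loads(path.read_text(encoding="utf-8"))
        kernelspec = notebook.get("metadata", {}).get("kernelspec", {})
        # None means the notebook does not record a kernel name at all.
        result[str(path)] = kernelspec.get("name")
    return result
```

The names it returns can then be compared with the output of jupyter kernelspec list in the running Binder session.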

agitter commented 5 years ago

Thank you @betatim. I'm going to test using postBuild for this and will follow up with more questions or to close the issue.

betatim commented 5 years ago

Another thing to try might be conda create --name new_name --clone old_name which creates a clone of an environment. Might be the least fiddly to get working, but I've never used it myself.

agitter commented 5 years ago

Thanks, that's a good idea as well. Is old_name in this case root? I saw this in the build log:

Step 40/47 : RUN conda env update -n root -f "binder/environment.yml" && conda clean -tipsy && conda list -n root

betatim commented 5 years ago

Correct.

agitter commented 5 years ago

This is working now so I'll leave some final notes for anyone who has a similar problem. I decided that modifying the .ipynb notebook file in the postBuild script was the best way to accomplish what I wanted. I can still use nb_conda_kernels in my local environment. The modified notebook has an edited kernelspec to match a kernel that is available in Binder.

I was unable to get conda create --name new_name --clone old_name working. I received the error

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

even after briefly attempting to enable conda in postBuild. Because conda clean -tipsy has already run before the conda create in the postBuild script, creating the environment is slow anyway, since conda has to download all of the packages again.

Part of my initial confusion was related to this existing issue: https://github.com/jupyter/repo2docker/issues/411. I had assumed that using Binder with an environment.yml file would work similarly to

conda env create --file environment.yml
conda activate <new_environment>

After looking through the repo2docker conda buildpack code I realized that the environment in environment.yml is not created or activated. It is used to update the base conda environment. That could be documented in https://mybinder.readthedocs.io/en/latest/config_files.html#environment-yml and I'm happy to suggest something in another issue.

betatim commented 5 years ago

A PR to add your suggestion to the documentation would be great. I agree it is a bit weird that we aren't running conda env create... and instead updating an existing environment.

If you have a link to or the snippet you use to modify your notebooks we should link to it/post it here so that people from the future can find it.

Thanks for sticking with this and debugging+figuring it out.

dhimmel commented 5 years ago

> A PR to add your suggestion to the documentation would be great. I agree it is a bit weird that we aren't running conda env create... and instead updating an existing environment.

Agreed, I think this will continue to cause frustration to unsuspecting users as it is not very intuitive. It also interferes with the reproducibility aspect of binder, because it means that users who install the environment locally will be interacting with a different environment than what is used by binder.

agitter commented 5 years ago

As long as it is documented, I understand the motivation for Binder's approach. The conda environment we (that is, @dhimmel) created was a self-sufficient environment that contained all the dependencies needed to run the notebooks locally. Binder instead assumes that most users don't want to or know how to set that up. The environment file is being used as a list of conda packages to install, not a definition of a new conda environment. That overloaded usage of an environment.yml file will be confusing if it isn't described.

I do agree that this setup makes it challenging to run notebooks in the exact same environment locally and remotely through Binder. Because a conda environment file alone isn't enough to guarantee reproducibility (as we learned through our tornado versioning problems), I find it difficult to come up with an alternative that would be better and remain accessible to the average user.

I'll follow up with links and a documentation pull request.

betatim commented 5 years ago

The environment.yml is used to update the environment, so even for packages that are installed by default the version specified in the repository's environment.yml should "win". If it does not, please report a bug in https://github.com/jupyter/repo2docker.

The best we have to offer right now for running things locally is http://repo2docker.readthedocs.io/ which is the tool that Binder uses to build your environment. You can install it with pip install jupyter-repo2docker and run repo2docker https://github.com/myorg/myrepo to get the same experience as on mybinder.org (with some patience to build the image). For actual local development repo2docker --editable some/local/dir is more useful, as it will mount your local directory read-write into the container so you can keep using your favourite editor and tools. Polishing the run-locally-for-day-to-day-dev experience is something we are working on, and we welcome contributions, in particular ones that make things faster.

agitter commented 5 years ago

The postBuild lines I added in https://github.com/greenelab/meta-review/pull/161/commits/a953111366199bd6a87716264635300cc0b0043b to modify the notebook were:

sed -i 's/R \[conda env:contrib-viz\]/R \[conda env:conda\]/g' analyses/deep-review-contrib/02.contrib-viz.ipynb
sed -i 's/conda-env-contrib-viz-r/conda-env-conda-r/g' analyses/deep-review-contrib/02.contrib-viz.ipynb
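
The sed substitutions above replace those strings wherever they occur in the file. A JSON-aware variant that only touches metadata.kernelspec might look like the sketch below. It is not the script used in the pull request: set_kernelspec is a hypothetical helper, and the default target kernel is an assumption copied from the kernelspec Binder offered earlier in this thread.

```python
import json
import sys
from pathlib import Path

# Assumed target kernel: the one Binder offered for this repository.
BINDER_KERNELSPEC = {
    "display_name": "R [conda env:conda]",
    "language": "R",
    "name": "conda-env-conda-r",
}


def set_kernelspec(notebook_path, kernelspec=BINDER_KERNELSPEC):
    """Overwrite metadata.kernelspec in one .ipynb file and return the old value."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    metadata = notebook.setdefault("metadata", {})
    previous = metadata.get("kernelspec")
    metadata["kernelspec"] = dict(kernelspec)
    # indent=1 is how Jupyter usually writes notebooks, which keeps diffs small.
    path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return previous


if __name__ == "__main__":
    # Each command-line argument is treated as a notebook path to rewrite.
    for arg in sys.argv[1:]:
        set_kernelspec(arg)
```

Saved under a hypothetical name such as fix_kernelspec.py, it could be called from postBuild as python fix_kernelspec.py analyses/deep-review-contrib/02.contrib-viz.ipynb. Because the edit only happens in the Binder image, the committed notebook keeps the local kernelspec.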

As a side note, it was tricky to debug this because the build log is transient. Installing repo2docker locally seems like the best way to debug now. Would it be possible to keep a copy of the build log at a known location inside the container so that it can be inspected through the running notebook?

betatim commented 5 years ago

https://github.com/jupyterhub/binderhub/issues/155 is the issue to watch and contribute to if you want to make "build logs in the built image" a reality.

I prefer using repo2docker locally to repeated builds on a BinderHub because I get much faster turnaround times and more inspectability during debugging.

agitter commented 5 years ago

Where is the source file for https://mybinder.readthedocs.io/en/latest/config_files.html#environment-yml? The "edit this page" link goes to https://github.com/jupyterhub/binder/edit/master/doc/config_files.rst which is broken, and doc/config_files.rst is in .gitignore.

It looks like I actually need a pull request to edit https://github.com/jupyter/repo2docker/blob/master/docs/source/config_files.rst, right?

betatim commented 5 years ago

You managed to find it.

https://repo2docker.readthedocs.io/en/latest/config_files.html#environment-yml-install-a-python-environment is the source of https://mybinder.readthedocs.io/en/latest/config_files.html#environment-yml (don't ask). The source for the former is https://github.com/jupyter/repo2docker/blob/master/docs/source/config_files.rst; please submit edits to it as a PR.

(Sorting through and restructuring our documentation is on the todo list for our week-long meeting next week.)