vsoch / forward

Port Forwarding Utility
https://vsoch.github.io/lessons/sherlock-singularity/
MIT License
52 stars 27 forks

Trouble with Command Line Arguments to Launch Jupyter Notebooks into the Proper Directory #20

Closed royzawadzki closed 6 years ago

royzawadzki commented 6 years ago

From here, the way to launch Jupyter notebooks is to run something like bash start.sh <software> <path> on your local computer. I've been playing around with the arguments, but all my permutations lead me to a page with only one folder, forward-util, containing three files: py3-jupyter.sbatch, py3-jupyter.sbatch.err, and py3-jupyter.sbatch.out.

My situation is that I have a directory on my local computer with two subdirectories: the cloned directory (forward) and another directory with my .ipynb files called BPA. I cd into forward to start up sherlock with the following commands and outcomes:

What is the proper syntax to get the files I want onto the Jupyter notebooks page? Is the path supposed to be a path on the actual server?

vsoch commented 6 years ago

Heyo! I am running home from doctor but will be able to help you when I am back... so glad you are jumping in to using the tool and stay tuned! 🐶

royzawadzki commented 6 years ago

@vsoch No worries, someone's gotta ask all the dumb questions, right?

vsoch commented 6 years ago

These are not dumb questions at all! I'm back at my computer and testing this out now. First, a few comments:

First let's just make sure we are working from the same thing. Make sure your forward repository is up to date with the latest on GitHub, and that you have run setup.sh so that there are CONTAINERSHARE and RESOURCE variables in your params.sh

USERNAME="vsochat"
PORT="43453"
PARTITION="russpold"
RESOURCE="sherlock"
MEM="20G"
TIME="8:00:00"
CONTAINERSHARE="/scratch/users/vsochat/share"
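A quick way to confirm that params.sh has the variables mentioned above is to search the file for each name. This is only a sketch: missing_params is a hypothetical helper (not part of forward), and it assumes each variable is written as NAME="value" at the start of a line, as in the example above.

```shell
# Hypothetical helper: print the names (space separated) that are NOT
# assigned in the given params file. Prints nothing if all are present.
missing_params() {
    local file="$1"
    shift
    local missing=""
    local name
    for name in "$@"; do
        # Look for a line that starts with NAME=
        if ! grep -q "^${name}=" "$file"; then
            missing="${missing}${name} "
        fi
    done
    # Strip the trailing space before printing
    echo "${missing% }"
}
```

For example, missing_params params.sh CONTAINERSHARE RESOURCE prints nothing when both variables are set, which suggests setup.sh has been run.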

You should also have run hosts/ssh_sherlock.sh so that your ssh configuration in ~/.ssh/config looks like:

Host sherlock
    User vsochat
    Hostname sh-ln06.stanford.edu
    GSSAPIDelegateCredentials yes
    GSSAPIAuthentication yes
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p

This means that you can type ssh sherlock followed by a command, and it will run on the cluster! E.g.:

ssh sherlock squeue -u vsochat

The first time you do that in a terminal, you will have to authenticate. The times after that you won't :)
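Later commands skip authentication because the ControlMaster and ControlPersist settings above keep one shared connection open in the background. OpenSSH can report whether that shared connection exists with ssh -O check. A minimal sketch, assuming an OpenSSH client; master_is_open and the SSH_BIN override are hypothetical names used only for illustration:

```shell
# Hypothetical helper: succeed (exit 0) if a ControlMaster connection is
# already open for the given host alias, fail otherwise.
# SSH_BIN can be set to swap in a different ssh binary (defaults to ssh).
master_is_open() {
    local host="$1"
    # "ssh -O check <host>" asks the master process whether it is running
    "${SSH_BIN:-ssh}" -O check "$host" >/dev/null 2>&1
}
```

Running master_is_open sherlock && echo "already authenticated" before start.sh is one way to find out whether you will be prompted to log in.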

Okay let's step back for a second and talk about your use case.

Use Case 1: Shared Reproducible Notebook

If you are creating an environment and notebooks that you want to move around, publish, or otherwise share, then you would want to use a jupyter template and build your own container (with the notebooks already inside). You do this by copying repo2docker-julia, adding your notebooks, building, and then running the command pointed at your build, e.g.:

bash start.sh sherlock/singularity-notebook docker://<username>/<repository>

And actually, you might still want to do this when your notebook is done and shiny and ready to submit to a paper, but I intuit from your post that you want more of a working environment, brought up on the fly, without much work in advance. So let's talk about this use case (and we can get back to the first when you are ready to publish!)

Use Case 2: Working (Not Reproducible) Notebook

This second use case is what I think you want. It's non-reproducible only because we aren't going to use a container (we will use modules on sherlock, which may not always be there, might change, etc.), and because you want to specify the notebook files sort of "on the fly." The bug I see in what you are describing (and this is also a bug in my documentation for not making it clear) is that the folder BPA needs to already be on the cluster somewhere: the path that you provide is relative to the cluster, not the local machine. BUT we can add a quick command to make this easy. Let's write up an example of how we would get this from the host.

Let's say I have a folder at /tmp/analysis with an analysis of interest! This already isn't reproducible because theoretically I've created this notebook with some jupyter notebook on my host that might have a mismatch in kernel with one on the cluster. Let's assume that it's the same. Here is the folder:

tree /tmp/analysis
└── numpy_notebook.ipynb

This numpy notebook uses a Python 3 kernel, and its contents are pretty simple and useless, but will run something:

import numpy
stuffy = numpy.zeros((4,4))
print(stuffy)

Okay, so now I want to use this on sherlock, using forward. The first thing I want to do is copy the entire directory somewhere on the cluster, and I can use scp for that:

# Here is how we can make a directory to move our stuff to!
ssh sherlock mkdir -p /scratch/users/vsochat/my-analysis

# Now let's copy everything from the local folder there
scp /tmp/analysis/* vsochat@login.sherlock.stanford.edu:/scratch/users/vsochat/my-analysis
numpy_notebook.ipynb                                                                                                        100%  846     0.8KB/s   00:00    

If you want you can do another ssh sherlock command to check that it worked!

$ ssh sherlock ls /scratch/users/vsochat/my-analysis
numpy_notebook.ipynb
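The make-directory, copy, and check steps above can be collected into one small helper. This is a sketch under assumptions, not part of forward: stage_folder and run_or_print are hypothetical names, and it assumes the sherlock alias from ~/.ssh/config also works as the scp target (the example above used the full login hostname instead). Setting DRY_RUN=1 prints the commands without running them.

```shell
# Print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: copy the files in a local folder to a folder on the cluster.
# Usage: stage_folder <host-alias> <local-folder> <remote-folder>
stage_folder() {
    local host="$1" src="$2" dest="$3"
    # Create the destination on the cluster
    run_or_print ssh "$host" mkdir -p "$dest"
    # Copy the top-level files from the local folder (same glob as above)
    run_or_print scp "$src"/* "$host:$dest"
    # List the destination to confirm the copy worked
    run_or_print ssh "$host" ls "$dest"
}
```

For example, DRY_RUN=1 stage_folder sherlock /tmp/analysis /scratch/users/vsochat/my-analysis shows what would be run. Note that the glob only picks up top-level files, so a folder with subfolders would need scp -r or a similar approach.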

Okay cool! Now we want to start the notebook there!

bash start.sh sherlock/py3-jupyter /scratch/users/vsochat/my-analysis

Here is the output. If you don't see something like this, you probably have an older version and should pull from master (I need to do proper tags / versions; I've been developing pretty quickly and haven't yet!)

== Finding Script ==
Looking for sbatches/sherlock/sherlock/py3-jupyter.sbatch
Looking for sbatches/sherlock/py3-jupyter.sbatch
Script      sbatches/sherlock/py3-jupyter.sbatch

== Checking for previous notebook ==
No existing sherlock/py3-jupyter jobs found, continuing...

== Getting destination directory ==

== Uploading sbatch script ==
py3-jupyter.sbatch                                                                                                          100%  146     0.1KB/s   00:00    

== Submitting sbatch ==
sbatch --job-name=sherlock/py3-jupyter --partition=russpold --output=/home/users/vsochat/forward-util/py3-jupyter.sbatch.out --error=/home/users/vsochat/forward-util/py3-jupyter.sbatch.err --mem=20G --time=8:00:00 /home/users/vsochat/forward-util/py3-jupyter.sbatch 43453 "/scratch/users/vsochat/my-analysis"
Submitted batch job 23423516

== View logs in separate terminal ==
ssh sherlock cat /home/users/vsochat/forward-util/py3-jupyter.sbatch.out
ssh sherlock cat /home/users/vsochat/forward-util/py3-jupyter.sbatch.err

== Waiting for job to start, using exponential backoff ==
Attempt 0: not ready yet... retrying in 1..
Attempt 1: not ready yet... retrying in 2..
Attempt 2: resources allocated to sh-01-31!..
sh-01-31
sh-01-31
notebook running on sh-01-31

== Setting up port forwarding ==
ssh -L 43453:localhost:43453 sherlock ssh -L 43453:localhost:43453 -N sh-01-31 &
== Connecting to notebook ==

== View logs in separate terminal ==
ssh sherlock cat /home/users/vsochat/forward-util/py3-jupyter.sbatch.out
ssh sherlock cat /home/users/vsochat/forward-util/py3-jupyter.sbatch.err

== Instructions ==
1. Password, output, and error printed to this terminal? Look at logs (see instruction above)
2. Browser: http://sh-02-21.int:43453/ -> http://localhost:43453/...
3. To end session: bash end.sh sherlock/py3-jupyter
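The "Setting up port forwarding" line in the output above chains two tunnels: the outer ssh -L forwards the local port to the same port on the login node, and the inner ssh -L (run on the login node) forwards that port on to the compute node, with -N meaning no remote command is run. A minimal sketch that only builds the command string; forward_command is a hypothetical name, and the trailing & that backgrounds the command in the log is left out:

```shell
# Hypothetical helper: print the two-hop port forwarding command.
# Usage: forward_command <port> <login-host-alias> <compute-node>
forward_command() {
    local port="$1" host="$2" node="$3"
    # local:port -> login node:port -> compute node:port
    echo "ssh -L ${port}:localhost:${port} ${host} ssh -L ${port}:localhost:${port} -N ${node}"
}
```

Calling forward_command 43453 sherlock sh-01-31 reproduces the command shown in the log, which can be handy if the tunnel drops and you want to restore it by hand while the job is still running.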

Now since this isn't a container, the password is the one that I've set up in advance for jupyter notebook, by loading the same module on sherlock and setting the password. Let me know if you haven't done this and need the instructions again; I believe they're in the README.

$ ssh sherlock cat /home/users/vsochat/forward-util/py3-jupyter.sbatch.err
[I 15:20:15.124 NotebookApp] Writing notebook server cookie secret to /tmp/jupyter/notebook_cookie_secret
[I 15:20:29.269 NotebookApp] Serving notebooks from local directory: /scratch/users/vsochat/my-analysis
[I 15:20:29.270 NotebookApp] 0 active kernels 
[I 15:20:29.270 NotebookApp] The Jupyter Notebook is running at: http://localhost:43453/
[I 15:20:29.270 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
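The "Serving notebooks from local directory" line in this log is the quickest way to confirm that the notebook started in the folder you asked for, rather than in forward-util. A small sketch that pulls the directory out of the log; served_directory is a hypothetical name, and it assumes the log wording shown above:

```shell
# Hypothetical helper: read a notebook log on stdin and print the directory
# that the notebook server reports it is serving from (nothing if not found).
served_directory() {
    sed -n 's/.*Serving notebooks from local directory: //p'
}
```

It reads from standard input, so it can be fed directly from the log command, for example: ssh sherlock cat /home/users/vsochat/forward-util/py3-jupyter.sbatch.err | served_directory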

Then when I open the browser (and enter my password) I get the web interface, and there is my little notebook <3

Those should be the complete instructions to get the functionality that you need. When your notebook is done, you would want to make a container (so you don't run into errors with versioning, etc.).

Let me know if you have more questions!

royzawadzki commented 6 years ago

@vsoch thank you for the detailed response! I wasn't sure if it was on your local machine or not, but it seems like an scp/sftp pipeline is the way to go for using notebooks stored locally. On an unrelated note, do you ever plan to roll out sherlock functionality with Jupyter Labs?

vsoch commented 6 years ago

You mean beyond just the Jupyter container notebook? (e.g., from the sherlock/containershare-notebook script, using repo2docker-jupyter).

Ohh I see! This guy! --> https://github.com/jupyterlab Oh yes, this would be amazing! Let me look into this, I'll open an issue for further notes.

Also in case you didn't see, our little discussion here is now a "tiny tutorial!" --> https://gist.github.com/vsoch/f2034e2ff768de7eb14d42fef92cc43e meaning he is adorable and forever preserved to be. Thank you!