jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Running RStudio via z2jh-based JupyterHub #990

Closed ablekh closed 4 years ago

ablekh commented 6 years ago

I'm trying to create a separate JupyterHub cluster for an upcoming workshop that requires RStudio sessions rather than the R kernel in JupyterHub. Since a proper multi-user RStudio setup can currently only be implemented via the commercial version of RStudio Server, and because I want to take advantage of JupyterHub's convenient authentication mechanisms (and container-based session isolation), I have been working hard to set up a z2jh-based JupyterHub cluster, similar to how Binder enables such a setup.

It was my understanding that I could just specify the desired RStudio-based Docker container image in the relevant config.yaml without any other changes. I have done just that, selecting the Rocker distribution, specifically the rocker/verse image (to support LaTeX etc.). However, when I tried to spawn a single-user container after fixing secondary issues, I was greeted with the following error message on the progress page:

2018-10-23 10:44:04+00:00 [Warning] Error: failed to start container "notebook": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"jupyterhub-singleuser\": executable file not found in $PATH"

After seeing this, I started thinking that, perhaps, the standard Rocker Docker images (from DockerHub) are not JupyterHub-compatible. If I'm correct on this, then I think that I could build my own image from the source (https://github.com/rocker-org/rocker-versioned/tree/master/verse, I assume; though it is not clear whether I need to go into a specific version directory), using jupyter-repo2docker as suggested by @cboettig here. Any help and/or advice will be much appreciated.
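The error above says that the container's PATH has no `jupyterhub-singleuser` executable, which is the command the hub tries to start. A minimal sketch of a check that could be run inside a candidate image to see which required commands are missing, assuming the image ships Python 3. The helper name `missing_executables` and the list of required commands are illustrative assumptions, not part of JupyterHub.

```python
import shutil


def missing_executables(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Commands the spawner needs; adjust the list for your own setup.
    required = ["jupyterhub-singleuser"]
    missing = missing_executables(required)
    if missing:
        print("missing from PATH:", ", ".join(missing))
    else:
        print("all required commands found")
```

If the command is reported missing, the image needs the `jupyterhub` Python package (or a base image that already contains it) before z2jh can spawn it.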

minrk commented 6 years ago

You might be interested in nbrsessionproxy, a package that adds RStudio sessions to Jupyter (and thereby JupyterHub).

ablekh commented 6 years ago

@minrk Thank you for your prompt feedback and for reminding me about nbrsessionproxy. I meant to use it, but somehow completely forgot about it (I guess I became too excited about the possibility of using a pre-built Rocker image without installing additional dependencies :-). I assume that, upon selecting the RStudio Session in the menu, the user will see a standard RStudio UI, similar to how Binder presents it.

If I understand correctly, in order to use nbrsessionproxy I would essentially have to build a custom Docker image on top of one of the standard (or our custom) Jupyter images. Correct?

I'm still quite curious about my original approach and the above-mentioned error. Do you have thoughts about what was going on and advice on how to fix it? Would my idea of using jupyter-repo2docker work?

consideRatio commented 6 years ago

@ablekh I know it is a bit complicated, but it is possible to get RStudio working alongside JupyterLab and various kernels.

I'm interested in finding the minimal steps needed to do this. I would not recommend using jupyter-repo2docker to get a z2jh image, but would instead attempt to base a new image on one of the images from the jupyter/docker-stacks repo. Speaking of which, perhaps one of them already has RStudio ready for use?


UPDATE: the datascience-notebook, built on top of r-notebook, did not have RStudio installed. QUESTION: what is needed on top of a datascience-notebook, in order to be able to access the /rstudio endpoint alongside the /lab endpoint?

ablekh commented 6 years ago

@consideRatio Thank you for your continued support. :-)

The goal for this effort is to have a full-featured (as Binder does) multi-user RStudio environment, using all benefits of a K8s-based containerized JupyterHub setup. Unfortunately, jupyter/docker-stacks project does not offer RStudio-enabled images, hence my attempt to use the Rocker ones. I have some experience with creating custom JupyterHub images, based on docker-stacks, but I have always used jupyter-repo2docker to build those images. I'm curious about why you're recommending against it. What are the problems with this tool? If not using it, how would you go about building a custom image, based on docker-stacks?

consideRatio commented 6 years ago

@ablekh oh thank you for investigating and sharing all kinds of knowledge and experience!

It is my understanding that repo2docker is meant to build for a specific repo, and that a specific repo is meant to run under one rather than multiple different kernels, which a typical z2jh image may want to have.

Does repo2docker utilize the docker-stacks images as a base image? Is that how you are utilizing them (indirectly) while building using repo2docker as a tool?

Beware that I'm only using my shallow knowledge to recommend against using repo2docker specifically for creating images for use with a z2jh deployment if you want to make a non-repo specific image for general use.


A friend of mine has built a huge docker image with RStudio enabled on top of a docker-stacks base image, so it is possible. I have not yet learned how to do it, but I recall that nbrsessionproxy was essential as @minrk suggested. Will get back to you if I set this up, it is something I'd like to do but doesn't have a high priority in comparison to other tasks atm.

ablekh commented 6 years ago

@consideRatio My pleasure. Thank you for sharing your knowledge and experience as well. I will very much appreciate any further help/advice from you and others in this nice community. In the meantime, I will try to use nbrsessionproxy (after getting some sleep - was up all night mostly working) and will share findings.

Re: my experience using jupyter-repo2docker - I have used this tool to successfully build a custom image, adding Octave kernel on top of the datascience-notebook image source from the jupyter-docker-stacks project, in order to simultaneously support Python, R, Julia and Octave.

minrk commented 6 years ago

QUESTION: what is needed on top of a datascience-notebook, in order to be able to access the /rstudio endpoint alongside the /lab endpoint?

This dockerfile will install rstudio and install & enable nbrsessionproxy:

# base image: jupyter/r-notebook (or jupyter/datascience-notebook)
FROM jupyter/r-notebook
# install nbrsessionproxy extension
RUN conda install -yq -c conda-forge nbrsessionproxy && \
    conda clean -tipsy

# install rstudio-server
USER root
RUN apt-get update && \
    curl --silent -L --fail https://download2.rstudio.org/rstudio-server-1.1.419-amd64.deb > /tmp/rstudio.deb && \
    echo '24cd11f0405d8372b4168fc9956e0386 /tmp/rstudio.deb' | md5sum -c - && \
    apt-get install -y /tmp/rstudio.deb && \
    rm /tmp/rstudio.deb && \
    apt-get clean
ENV PATH=$PATH:/usr/lib/rstudio-server/bin
USER $NB_USER

If you already have an image with rstudio server, then just installing nbrsessionproxy should be enough, following the installation instructions, either with pip or conda.
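The Dockerfile above guards the RStudio download with an `md5sum -c` check. As a rough illustration of what that step does, here is the same comparison sketched in Python; the function name `md5_matches` is a hypothetical helper, and the file path and expected digest are whatever your own download uses.

```python
import hashlib


def md5_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large .deb files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

Note that MD5 mainly protects against a corrupted or truncated download; when the publisher offers a SHA-256 digest, that is the stronger check.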

consideRatio commented 6 years ago

Wow thanks @minrk !!

ablekh commented 6 years ago

@minrk Thank you very much for your advice. Will definitely try this approach. I still hope to hear your / others' opinion on my original (Rocker) approach, which IMO should work (perhaps, with some changes).

BTW, how do you recommend to build the image (based on the Dockerfile you shared above): using jupyter-repo2docker or some other method (I guess, by simply using docker command)? As you can see from @consideRatio's and my recent comments here, we have quite different experiences in this regard ...

ablekh commented 6 years ago

UPDATE: Hey, folks! Just wanted to let everyone know that, based on @minrk's advice (thanks again!; the nbrsessionproxy approach), I was able to successfully build relevant image (using jupyter-repo2docker) as well as configure and run RStudio on our separate AKS-based JupyterHub cluster (actually, it was yesterday afternoon - sorry about delaying the update). There were some arguably AKS-specific issues (which drove me slightly crazy :-), but they were either fixed by me or went away over time.

For the sake of completeness, I want to say that, while this nbrsessionproxy-based approach is nice, it is well-suited for multi-kernel JupyterHub implementations (which, if I can guess, most likely represent more than 95% of all JH deployments). For the rest of deployments that have special requirements, such as, in this case, my preference for RStudio-only deployment (a la relevant Binder's example deployment), I suspect that my original approach (or, likely, its modification) would be much more appropriate. For this particular case, perhaps, there is a compromise-based solution (e.g., via JH configuration) that would allow an immediate redirect in-place (in the same tab) after authentication to user's RStudio session endpoint, without opening the standard JH UI and having to select the RStudio Session menu item to open RStudio UI in a new tab.

As I've said, I'm still curious about my original (Rocker image-based) approach [see my initial comment in this thread], so if someone would like to share their thoughts on this, I would be delighted to hear them.

ryanlovett commented 6 years ago

@ablekh I believe you can configure JupyterHub with c.Spawner.args = ['--NotebookApp.default_url=/rstudio'] to get the behavior you want.

ablekh commented 6 years ago

@ryanlovett Thank you so much for this advice. I was pretty sure that it is possible, but wasn't sure what configuration option is responsible for that behavior (opening custom endpoint instead of default one). Using this opportunity, I'd like to thank you for creating nbrsessionproxy as well as your help in general.

P.S. BTW, do you know by any chance why the username in RStudio sessions remains the default jovyan instead of the expected one, based on GitHub authentication? When building the RStudio-enabled custom image, I noticed a warning at the end about NBUSER or such not being used (the wording was different, but not the essence). I suspect that this is what causes this minor issue. Also, what do you think about my original Rocker-based approach?

ablekh commented 6 years ago

Which of the following methods for implementing @ryanlovett's advice (see above) is correct (or better):

# method 1
singleuser:
  defaultUrl: "/rstudio"

# method 2
hub:
  extraConfig: |-
    c.KubeSpawner.singleuser_image_pull_secrets = "<SECRET_NAME>"
    c.Spawner.args = ['--NotebookApp.default_url=/rstudio']

consideRatio commented 6 years ago

Not looked into the details, but I would go with the defaultUrl value.

@ablekh PS: the singleuser_ prefix is deprecated, writing c.KubeSpawner.image_pull_secrets = "<SECRET_NAME>" is the current recommended practice.
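For comparison, here is how the same default landing page could be written directly in Python inside `hub.extraConfig`. This is a sketch under the assumption that the chart's `singleuser.defaultUrl` value ends up as the spawner's `default_url` setting; the `SimpleNamespace` stand-in is only there so the snippet also runs outside JupyterHub, where the `c` config object does not exist.

```python
from types import SimpleNamespace

# Inside hub.extraConfig (or jupyterhub_config.py) JupyterHub provides `c`.
# The stand-in below is an assumption so this sketch can run on its own.
if "c" not in globals():
    c = SimpleNamespace(Spawner=SimpleNamespace())

# Land users on the RStudio endpoint instead of the default notebook UI.
c.Spawner.default_url = "/rstudio"
```

Setting the chart value keeps the configuration declarative, which is presumably why `defaultUrl` is the suggested option here.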

ablekh commented 6 years ago

@consideRatio Thanks much for your advice on both aspects. If you still need any help with creating custom RStudio-focused Docker images, please let me know and I will do my best to help ...

ablekh commented 6 years ago

Just updated my RStudio-focused deployment's config.yaml and upgraded the cluster. The defaultUrl option worked, directly opening RStudio UI upon authentication, however the following error message gets produced in RStudio's console window. Any thoughts?

24 Oct 2018 08:19:26 [rsession-jovyan] ERROR session hadabend; LOGGED FROM: rstudio::core::Error {anonymous}::rInit(const rstudio::r::session::RInitInfo&) /home/ubuntu/rstudio/src/cpp/session/SessionMain.cpp:563

ablekh commented 6 years ago

Note to self: in non-default UI environments like this, JupyterHub's Control Panel can still be accessed at the /hub/home endpoint (for administrative tasks, if enabled, see the Admin menu item, i.e. the /hub/admin endpoint).

manics commented 6 years ago

BTW, do you know by any chance why username in RStudio sessions remain default jovyan instead of expected one, based on GitHub authentication?

The default docker-stack images can switch the username to $NB_USER (and also $NB_UID $NB_GID if you want), but you need to run the image as root, e.g. see start.sh: https://github.com/jupyter/docker-stacks/blob/f2889d7ae7d6a4a404169b985f2f2ca421f388a1/base-notebook/start.sh#L47 You could try something similar in your image?

ablekh commented 6 years ago

@manics Thank you very much for this advice. However, I'm not sure which level or step of the deployment workflow you mean when talking about running an image as root. Could you clarify this for me?

manics commented 6 years ago

Just dug out my config and it's a bit more complicated than I thought. Setting singleuser.uid: 0 to start the singleuser server as root is the easy bit, but you need to pass extra information (GitHub username) from the spawner to jupyter. I've got a test system working with LDAP (with @consideRatio's help):

hub:
  extraConfig: |
    ...
    Lots of extra stuff
    ....
    class LDAPAuthenticatorInfoUID(LDAPAuthenticatorInfo):
        @gen.coroutine
        def pre_spawn_start(self, user, spawner):
            auth_state = yield user.get_auth_state()
            self.log.error('pre_spawn_start auth_state:%s' % auth_state)
            if not auth_state:
                return

            # setup environment
            spawner.environment['NB_UID'] = str(
                auth_state['uidNumber'][0])
            spawner.environment['NB_USER'] = auth_state['uid'][0]

This required a modified version of LDAPAuthenticator to fetch the extra info: https://github.com/jupyterhub/ldapauthenticator/pull/103 but it looks like the GitHub authenticator already includes the required fields: https://github.com/jupyterhub/oauthenticator/blob/0.8.0/oauthenticator/github.py#L162

ablekh commented 6 years ago

@manics I see. Hmm, interesting ... I appreciate your help. However, I'm a bit confused by your example: should it be reworked into something like a subclass of class GitHubOAuthenticator(OAuthenticator), or is there a way to simply (without lots of code) pass the already captured GitHub name value via extraConfig?

manics commented 6 years ago

Yes, you effectively create the authenticator subclass in extraConfig instead of building a custom image. Since GitHubOAuthenticator already passes the github username it should be fairly easy, something like this might work (I haven't tried it):

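singleuser:
  uid: 0

hub:
  extraConfig: |
    # imports needed by the subclass below
    from tornado import gen
    from oauthenticator.github import GitHubOAuthenticator

    class CustomGitHubOAuthenticator(GitHubOAuthenticator):
        @gen.coroutine
        def pre_spawn_start(self, user, spawner):
            auth_state = yield user.get_auth_state()
            self.log.info('pre_spawn_start auth_state:%s' % auth_state)
            if not auth_state:
                return

            # set up environment: pass the GitHub username to the container
            # (assumes auth_state['github_user'] is the GitHub user model,
            # a dict whose 'login' field holds the username)
            spawner.environment['NB_USER'] = auth_state['github_user']['login']

    # register the subclass defined above
    c.JupyterHub.authenticator_class = CustomGitHubOAuthenticator

auth:
  state:
    enabled: True
    cryptoKey: SECRET-KEY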

PS ping me in Gitter if you want
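The username lookup in `pre_spawn_start` can be pulled out into a small function and tried without a running hub. This is a sketch only: the helper names are hypothetical, and it assumes `auth_state['github_user']` is either the username itself or the GitHub user model (a dict with a `login` field), which should be checked against the oauthenticator version in use.

```python
def github_username(auth_state):
    """Extract a GitHub username from `auth_state`, or return None."""
    if not auth_state:
        return None
    user = auth_state.get("github_user")
    if isinstance(user, dict):
        # Assumed shape: the GitHub user model, with the name under 'login'.
        return user.get("login")
    return user or None


def apply_github_username(environment, auth_state):
    """Set NB_USER in `environment`; return True if it was set."""
    name = github_username(auth_state)
    if not name:
        return False
    environment["NB_USER"] = str(name)
    return True
```

Inside the authenticator subclass this would be called as `apply_github_username(spawner.environment, auth_state)`.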

ryanlovett commented 6 years ago

@ablekh No problem, I'm happy the extension has been useful for others. :)

If you need the container to have NB_USER set to be the same as what your authenticator provides, @manics solution looks like the right approach to me. We do the same when we need to slightly alter hub behavior.

Also, what do you think about my original Rocker-based approach?

I'm not too familiar with the Rocker images but you can either start with an R/RStudio based image and add Jupyter+JupyterHub support or the other way around. In general I think images should be tailored to your use case and have less extraneous components (unless the point of your user environment is to expose people to a broad set of packages). We create our own images for data science courses at Cal so that they have just the right mix of packages.

ablekh commented 6 years ago

@ryanlovett Thank you for additional clarifications. I will further try Rocker approach when I get a chance.

As for the solution for the username suggested by @manics, I have implemented it earlier today, but I am still getting a 500 error. His initial advice was producing some errors, which I was able to figure out (missing import statement and --allow-root parameter for the spawner). However, after all those issues were fixed, the most recent error message looks like the following (note the rsession-related lines). Any thoughts?

[W 2018-10-24 12:05:01.773 SingleUserNotebookApp configurable:168] Config option `open_browser` not recognized by `SingleUserNotebookApp`.  Did you mean `browser`?
[I 2018-10-24 12:05:02.093 SingleUserNotebookApp extension:59] JupyterLab extension loaded from /opt/conda/lib/python3.6/site-packages/jupyterlab
[I 2018-10-24 12:05:02.093 SingleUserNotebookApp extension:60] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2018-10-24 12:05:02.103 SingleUserNotebookApp singleuser:406] Starting jupyterhub-singleuser server version 0.9.4
[I 2018-10-24 12:05:02.112 SingleUserNotebookApp notebookapp:1712] Serving notebooks from local directory: /home/jovyan
[I 2018-10-24 12:05:02.112 SingleUserNotebookApp notebookapp:1712] The Jupyter Notebook is running at:
[I 2018-10-24 12:05:02.113 SingleUserNotebookApp notebookapp:1712] http://(jupyter-ablekh or 127.0.0.1):8888/user/ablekh/
[I 2018-10-24 12:05:02.113 SingleUserNotebookApp notebookapp:1713] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2018-10-24 12:05:04.629 SingleUserNotebookApp log:158] 302 GET /user/ablekh/ -> /user/ablekh/rstudio? (@10.244.0.43) 0.97ms
[I 2018-10-24 12:05:04.864 SingleUserNotebookApp log:158] 302 GET /user/ablekh/?redirects=1 -> /user/ablekh/rstudio?redirects=1 (@10.244.0.1) 0.65ms
[I 2018-10-24 12:05:04.945 SingleUserNotebookApp log:158] 302 GET /user/ablekh/rstudio?redirects=1 -> /hub/api/oauth2/authorize?client_id=jupyterhub-user-ablekh&redirect_uri=%2Fuser%2Fablekh%2Foauth_callback&response_type=code&state=[secret] (@10.244.0.1) 2.49ms
[I 2018-10-24 12:05:05.519 SingleUserNotebookApp auth:875] Logged-in user {'kind': 'user', 'name': 'ablekh', 'admin': True, 'groups': [], 'server': '/user/ablekh/', 'pending': None, 'created': '2018-10-23T17:44:50.775270Z', 'last_activity': '2018-10-24T12:05:05.467083Z', 'servers': None}
[I 2018-10-24 12:05:05.522 SingleUserNotebookApp log:158] 302 GET /user/ablekh/oauth_callback?code=[secret]&state=[secret] -> /user/ablekh/rstudio?redirects=1 (@10.244.0.1) 325.18ms
[I 2018-10-24 12:05:05.584 SingleUserNotebookApp log:158] 302 GET /user/ablekh/rstudio?redirects=1 -> /user/ablekh/rstudio/?redirects=1 (ablekh@10.244.0.1) 1.00ms
[I 2018-10-24 12:05:05.644 SingleUserNotebookApp handlers:439] No existing rsession found
[I 2018-10-24 12:05:05.645 SingleUserNotebookApp handlers:391] Starting process...
[I 2018-10-24 12:05:05.657 SingleUserNotebookApp handlers:385] rsession died with code 0
[I 2018-10-24 12:05:06.656 SingleUserNotebookApp handlers:330] Process exited: rsession
[I 2018-10-24 12:05:08.059 SingleUserNotebookApp handlers:330] Process exited: rsession
[I 2018-10-24 12:05:10.019 SingleUserNotebookApp handlers:330] Process exited: rsession
[I 2018-10-24 12:05:12.767 SingleUserNotebookApp handlers:330] Process exited: rsession
[I 2018-10-24 12:05:16.613 SingleUserNotebookApp handlers:330] Process exited: rsession
[I 2018-10-24 12:05:21.998 SingleUserNotebookApp handlers:330] Process exited: rsession
[I 2018-10-24 12:05:29.532 SingleUserNotebookApp handlers:330] Process exited: rsession
[E 2018-10-24 12:05:40.085 SingleUserNotebookApp web:1670] Uncaught exception GET /user/ablekh/rstudio/?redirects=1 (10.244.0.1)
    HTTPServerRequest(protocol='https', host='<FQDN>', method='GET', uri='/user/ablekh/rstudio/?redirects=1', version='HTTP/1.1', remote_ip='10.244.0.1')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/tornado/web.py", line 1592, in _execute
        result = yield result
      File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
        value = future.result()
      File "/opt/conda/lib/python3.6/site-packages/nbserverproxy/handlers.py", line 96, in get
        return await self.http_get(*args, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/nbserverproxy/handlers.py", line 443, in http_get
        return await self.proxy(self.port, path)
      File "/opt/conda/lib/python3.6/site-packages/nbserverproxy/handlers.py", line 420, in proxy
        await self.conditional_start()
      File "/opt/conda/lib/python3.6/site-packages/nbserverproxy/handlers.py", line 440, in conditional_start
        await self.start_process()
      File "/opt/conda/lib/python3.6/site-packages/nbserverproxy/handlers.py", line 407, in start_process
        proc.terminate()
    AttributeError: 'Subprocess' object has no attribute 'terminate'
...

ryanlovett commented 6 years ago

rsession died with code 0

It is hard to tell what happened other than rsession died. You can try to debug this by dropping to a Jupyter terminal and running rserver --www-port={some_num} where some_num is a random TCP port, e.g. 50000. It could fail for any number of reasons depending on how the image was created. The simplest might be if rsession is not in your PATH in which case the terminal would complain command not found.
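The debugging steps above can be sketched as a small script to run from a Jupyter terminal. It only reports whether `rserver` and `rsession` are on PATH and prints a command line with an unused port; the helper names are illustrative assumptions, and the script does not start RStudio itself.

```python
import shutil
import socket


def free_tcp_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def rserver_command(port):
    """Build the rserver command line suggested above for a given port."""
    return ["rserver", f"--www-port={port}"]


if __name__ == "__main__":
    for tool in ("rserver", "rsession"):
        print(tool, "->", shutil.which(tool) or "not found on PATH")
    print("try running:", " ".join(rserver_command(free_tcp_port())))
```

If either tool is reported as not found, the `ENV PATH=$PATH:/usr/lib/rstudio-server/bin` line from the Dockerfile is the first thing to check.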

ablekh commented 6 years ago

@ryanlovett I appreciate your advice. I will try to figure this out after the 2-day workshop (ending today), for which I was creating this particular cluster. The more I think about it, the stranger this issue looks to me. The only differences between the failing environment and the working one are the changes described above, based on advice by @manics: subclassing GitHubOAuthenticator, adding some missing import statements, and setting KubeSpawner's allow-root parameter. All of them are singularly focused on correcting the username and IMO highly unlikely to affect the R environment. Of course, I realize that side effects do happen, but I just don't see them in this case. Or I'm completely missing something ... @minrk Any ideas?

consideRatio commented 5 years ago

An update on how to install RStudio: nbrsessionproxy is now called jupyter-rsession-proxy, and you may want to install that instead of nbrsessionproxy. I'm not sure. Note that if you do, you should uninstall the old nbrsessionproxy first, according to their instructions.

cboettig commented 5 years ago

@ryanlovett et al Should we switch future https://github.com/rocker-org/binder images to jupyter-rsession-proxy ?

choldgraf commented 5 years ago

Hey all - since we have a few cases where RStudio works in JupyterHub, can we focus this issue on "where / how to document this" and then close it once the documentation is in place?

If I do a site-search for "RStudio" in the z2jh guide, I don't see any actionable content about how to install RStudio, or links to other guides to install it. Can we insert that in there somewhere? And if so, where would be the best place? Maybe @ryanlovett or @consideRatio know of which resources were the most helpful?

ryanlovett commented 5 years ago

@cboettig Sorry, I missed your earlier mention. Yes, jupyter-rsession-proxy is the way to go and all new development will take place there rather than nbrsessionproxy.

@choldgraf Though jupyter-*-proxy are most useful in a JupyterHub context, getting them into JupyterHub is mostly a matter of getting it into the single user environment. Do you think this should be documented in z2jh's "Customizing User Environment" ?

Fwiw, I think:

choldgraf commented 5 years ago

@ryanlovett yep, basically just what you said + some links to the rsessionproxy docs would probably work. Just enough information so that somebody that searches RStudio would know where to go next

ablekh commented 5 years ago

Do I understand correctly that enabling RStudio - JupyterHub integration via jupyter-rsession-proxy is currently possible only by manually producing a custom single-user Docker image (as in the example Dockerfile)?

I think that it would be nice to have Helm chart functionality (correct me if it's already there) that would allow admins to specify arbitrary extra commands (in this case, pip install etc.), allowing to automatically build relevant custom image (on a master node) & push to a target registry for further use by a spawner.

choldgraf commented 5 years ago

@ablekh why not use something like repo2docker for this?

I agree that building in more environment building into JupyterHub could be helpful...though I feel like that might be a complex-enough topic that it'd warrant its own issue separate from RStudio-specifically. What do you think?

ablekh commented 5 years ago

@choldgraf Certainly, repo2docker approach is good, however, it requires some separate manual steps. On the other hand, what I'm suggesting would allow complete automation (assuming that the master node has enough resources for building relevant Docker images once in a while).

As for discussing this in a separate issue - I agree. I just mentioned this idea here to see if I'm not missing something obvious or such functionality already exists (can be achieved via existing Helm chart features).

ryanlovett commented 5 years ago

@ablekh Yes, enabling RStudio integration in JupyterHub does require that jupyter-rsession-proxy or nbrsessionproxy be present in the user environment, along with RStudio itself. I've a tendency to roll my own images, but I'm sure there are or will be public images you can extend or re-use without having to manually produce your own.

@choldgraf I think a separate issue for environment building / repo2docker integration into JupyterHub makes sense. I know you and @yuvipanda have been contemplating the concept for a bit.

ablekh commented 5 years ago

@ryanlovett I understand and agree. In many cases, public images might not be suitable for one reason or another. My thoughts/suggestions above are focused on building and deploying custom images as well. However, the core idea is to make the process more/fully automated (in the CI/CD fashion) in the context of Zero-to-JupyterHub workflow. It would not only save our time, but reduce the amount of potential mistakes.

consideRatio commented 5 years ago

@choldgraf @minrk described how to install it here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/990#issuecomment-432269851

Notes:

consideRatio commented 5 years ago

@ryanlovett I think it makes sense to have this in customizing user environ of z2jh, but perhaps also / or within the docker-stacks repo as a "Recipe": https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html

mathematicalmichael commented 5 years ago

@consideRatio yes please. this has been one of the hardest things to track down. a working dockerfile no longer cut it for some reason (updates?). Trying out the suggestions in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/990#issuecomment-470299123 (Dockerfile) and if that doesn't work, I'll try minrk's https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/990#issuecomment-432269851

UPDATE: finally got it to work. Weird mixture of issues. This was what did it:

FROM jupyter/r-notebook

RUN python3 -m pip install jupyter-rsession-proxy
RUN cd /tmp/ && \
    git clone --depth 1 https://github.com/jupyterhub/jupyter-server-proxy && \
    cd jupyter-server-proxy/jupyterlab-server-proxy && \
    npm install && npm run build && jupyter labextension link . && \
    npm run build && jupyter lab build

# install rstudio-server
USER root
RUN apt-get update && \
    curl --silent -L --fail https://download2.rstudio.org/rstudio-server-1.1.419-amd64.deb > /tmp/rstudio.deb && \
    echo '24cd11f0405d8372b4168fc9956e0386 /tmp/rstudio.deb' | md5sum -c - && \
    apt-get install -y /tmp/rstudio.deb && \
    rm /tmp/rstudio.deb && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
ENV PATH=$PATH:/usr/lib/rstudio-server/bin
USER $NB_USER

trallard commented 5 years ago

Heya, sorry for invading this issue (since I have no clue where to post here, or at jupyter-rsession-proxy)

Anyway, I have a JupyterHub deployed on an AKS cluster and I decided to add RStudio through jupyter-rsession-proxy. I created a Docker image that works just fine. But when I use this image on the AKS JupyterHub and try to access user/{myuser}/rstudio/, I keep getting a 500: internal server error could not start rstudio on time.

I am sure I am missing something obvious but I cannot figure out what it is 🤔

Also, when looking at the pod logs I only get this, so not super helpful:

[Screenshot of pod logs; image not preserved]

Any help would be massively appreciated 🙏🏼

Montereytony commented 5 years ago

A big thank you to all the developers! I have had a few researchers at HaasBerkeley ask for this. I tried for a few days and could not get it to work. I then copied the Dockerfile Eric published above and it worked for me with a few minor changes.

Here is the Dockerfile I used:

FROM jupyter/r-notebook

RUN python3 -m pip install jupyter-rsession-proxy
RUN cd /tmp/ && \
    git clone --depth 1 https://github.com/jupyterhub/jupyter-server-proxy && \
    cd jupyter-server-proxy/jupyterlab-server-proxy && \
    npm install && npm run build && jupyter labextension link . && \
    npm run build && jupyter lab build

USER root
RUN apt-get update && \
    apt-get -y install libssl1.0.0 libssl-dev && \
    cd /lib/x86_64-linux-gnu && ln -s libssl.so.1.0.0 libssl.so.10 &&  ln -s libcrypto.so.1.0.0 libcrypto.so.10  && \
    cd /tmp/ && wget https://download2.rstudio.org/server/trusty/amd64/rstudio-server-1.2.5019-amd64.deb &&\
    apt-get install -y /tmp/rstudio-server-1.2.5019-amd64.deb && \
    rm /tmp/rstudio-server-1.2.5019-amd64.deb && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
ENV PATH=$PATH:/usr/lib/rstudio-server/bin
USER $NB_USER

mathematicalmichael commented 4 years ago

update: This is a minimal set-up Dockerfile that seems to allow for RStudio install. Uses nbrsessionproxy and jupyter-server-proxy (apparently the same pre-reqs allowed streamlit to run via jupyterlab): https://discuss.streamlit.io/t/jupyterhub-streamlit/1238/2

From minimal-notebook, this appeared to be enough: RUN pip install jupyter-server-proxy jupyter-rsession-proxy

(I used to have the development version, as evidenced earlier in the thread, but as of time-of-writing, it appears the pypi version works just fine out of the box).

Note: trailing-slash is important with proxy address, see thread.
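To make the trailing-slash point concrete, here is a minimal sketch of a helper that always builds the proxied RStudio address with the final slash. The function name and the example hub address are hypothetical, and JupyterHub may escape unusual usernames differently from `urllib.parse.quote`, so treat the quoting as an assumption.

```python
from urllib.parse import quote


def rstudio_url(hub_base, username):
    """Build the per-user RStudio proxy URL, always ending in a slash."""
    base = hub_base.rstrip("/")
    return f"{base}/user/{quote(username, safe='')}/rstudio/"


# Example with a made-up hub address:
print(rstudio_url("https://hub.example.org/", "ablekh"))
```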

scivm commented 4 years ago

Any reason why you are using rstudio-server-1.1.419? I am using the latest rstudio-server-1.2.5033.

mathematicalmichael commented 4 years ago

Any reason why you are using rstudio-server-1.1.419? I am using the latest rstudio-server-1.2.5033.

Just was the working version when I tried it. I’ve since figured out that the pip install version works just fine, no need to install from source

scivm commented 4 years ago

I noticed that after installing RStudio Server from the binary and running it in JupyterHub, trying to run a one-line Python script makes it demand to download unrelated Python binaries. The RStudio interface also has a pulldown that shows out-of-date R packages and lets the user update them in the Docker image; those updates would be lost after the user's server restarts. I'm hoping to keep users happy while still blocking outbound internet and CRAN connections. I'm wondering if others have had a similar experience.

When running an image built from this Dockerfile, I also get a Shiny entry in the Jupyter pulldown and an icon in JupyterLab that gives a 500 error. I'm not sure how to remove that yet.

consideRatio commented 4 years ago

@scivm How did you notice this? I want to make sure it doesn't happen to me as well, so it would help to know whether this was a background process or something obvious; that way I can tell whether I may be affected too.

koners commented 4 years ago

@ablekh: Did you by any chance ever figure out how to use a Rocker image through JupyterHub? I am in the same boat as you and looking for some help.

ablekh commented 4 years ago

@koners No, I haven't had a chance to further explore the Rocker images route - the priorities have been too dynamic :-). However, I have successfully used the approach suggested by @minrk above.

cboettig commented 4 years ago

@koners @ablekh For folks looking to use JupyterHub with Rocker images (to access RStudio or Jupyter notebook instances), we recommend the rocker/binder image (not the rocker/verse image mentioned above); see https://github.com/rocker-org/binder. (It's based on rocker/verse but does the setup for you.)

Also, open source RStudio Server (e.g. in rocker/rstudio) supports multiple users just fine as separate Linux account users. (The commercial product, I think, supports multiple users on the same R session, Google Docs style.)

(Apologies, I must have missed this thread when I was originally tagged, so I'm catching up a bit now!)

ablekh commented 4 years ago

@cboettig Thank you very much for clarifying this (and please don't worry about missing the thread). I assume that the pro of this approach (vs. the one suggested above by @minrk) is that, in this case, we don't have to manually maintain RStudio versions and the con is that the resulting image would contain some geo-focused packages. Correct?

BTW, have you tested this approach in a z2jh environment? I'm somewhat suspicious that z2jh, as a K8s-based JupyterHub setup, might introduce some potential issues. Unfortunately, I currently don't have any cloud resources available to confirm or refute my suspicion, but I would be curious to hear the results of relevant testing, should you and/or @koners be able to run it.
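
For anyone who wants to try that test, here is an untested sketch of what pointing a Kubernetes-based hub at the Rocker image could look like. In z2jh the usual route is `singleuser.image.name` and `singleuser.image.tag` in config.yaml; the Python below expresses the same idea as JupyterHub configuration, as it might appear under `hub.extraConfig`. The helper name `use_rocker_binder`, the `latest` tag, and the `/rstudio/` landing path are assumptions, not something confirmed in this thread.

```python
from types import SimpleNamespace


def use_rocker_binder(c, image="rocker/binder:latest", default_url="/rstudio/"):
    """Point spawned user pods at a Rocker image and land users in RStudio."""
    c.KubeSpawner.image = image          # image used for single-user pods
    c.Spawner.default_url = default_url  # page opened once the server is up
    return c


# Stand-in for the config object that JupyterHub normally provides as `c`.
# It exists only so this sketch runs on its own; do not copy it into a
# real configuration file.
demo_config = SimpleNamespace(KubeSpawner=SimpleNamespace(), Spawner=SimpleNamespace())
use_rocker_binder(demo_config)
print(demo_config.KubeSpawner.image, demo_config.Spawner.default_url)
```

In a real configuration only the function and a call such as `use_rocker_binder(c)` would be needed, with `c` supplied by JupyterHub.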