jupyterhub / jupyter-rsession-proxy

Jupyter extensions for running an RStudio rsession proxy
BSD 3-Clause "New" or "Revised" License
118 stars 87 forks source link

Add --www-root-path option to rserver #95

Closed ryanlovett closed 3 years ago

ryanlovett commented 3 years ago

RStudio devs have suggested in https://github.com/rstudio/rstudio/pull/7628 that RStudio 1.4 requires that proxies need rserver's --www-root-path. We can do this via a configurable or and environment variable.

cboettig commented 3 years ago

Thanks Ryan!

On our end, we're just adding binder to our Docker images via this script: https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_binder.sh (currently it is installing an outdated RStudio to work around the issue), and then the binder images set the default CMD as:

jupyter notebook --ip 0.0.0.0 

https://github.com/rocker-org/rocker-versioned2/blob/master/dockerfiles/Dockerfile_binder_4.0.3#L15.

It would be great if we could set something like RSERVER_WWW_ROOT_PATH or such in the image and have this all magically work, but I'm not quite clear on what the syntax would be for configurable to pick that up?

ryanlovett commented 3 years ago

It seems that setting the option to just /rstudio/ works on localhost, but not on jupyterhub or binder. We need to set the full server path, e.g. /user/{username}/rstudio/.

ryanlovett commented 3 years ago

With www root path set and when visiting rserver directly via curl, it issues a Location header to the full server URL with /auth-sign-in?appUri=%2F at the end, e.g. /user/{username}/rstudio/auth-sign-in?appUri=%2F. However when visiting via proxy, we see a Location header with only /auth-sign-in?appUri=%2F. Not sure why the rest of the path is not present. I think I need to debug jupyter-server-proxy.

If one manually visits the correct auth sign in URL (again with the root path set), we receive the cookie and are properly redirected to a working RStudio.

ryanlovett commented 3 years ago

I've tracked this down to the header X-Forwarded-Proto: https,http.

All of this is when we set --www-root-path.

I need to find out what the header is for before possibly reporting it upstream.

manics commented 3 years ago

The X-Forwarded-* headers are often added by load-balancers to indicate the original request when proxying through to a backend. For example, you might have https for external traffic but http for internal:

user <-> https://external-load-balancer <-> http://internal-jupyter

In this example external-load-balancer would add the header X-Forwarded-Proto: https when forwarding to internal-jupyter.

If you have multiple proxies you can end up with multiple protocols as the header value https,http.

This has caused problems in JupyterHub due to limitations in Tornado which doesn't pass through the full header value, instead it picks just one of the protocols which may not be the one Jupyter wants. For example, see:

ryanlovett commented 3 years ago

Thanks @manics ! I was testing this on a hub behind an ssl-terminating nginx instance, and it turns out that nginx is setting that header.

I think in this case, rserver should preserve the full value of the Location header regardless of what X-Forwarded-Proto is set to. Ideally it would actually not insert the protocol or server name either, e.g. /user/{username}/rstudio/auth-sign-in?appUri=%2F. But you've got me thinking that I need to double-check all of the headers I'm seeing before I report this.

manics commented 3 years ago

Good luck figuring it out! I've spend many hours digging into these sorts of problems in the past....

vnijs commented 3 years ago

@cboettig What do you think are the changes that Rstudio might be willing to provide some support to figure this out? I would be really great to be able to use Rstudio 1.4 with Jupyter and jupyterhub.

ryanlovett commented 3 years ago

@vnijs Even after adding --www-root-path (and path) and a couple of other options, one of the proxied http headers causes rstudio to send the client to the wrong place. I almost filed an issue with rstudio but wanted to triple-check that either the proxy wasn't doing something wrong or that it could work around the problem. Sorry for letting this drop. I'll try to pick it up again soon.

blairdrummond commented 3 years ago

Hey @ryanlovett @vnijs ; funny timing. I got 1.4 running here:

https://github.com/StatCan/kubeflow-containers/pull/195

I added the www-root-path to the conf file directly in the dockerfile, and also had to deal with an ownership issue on some sqlite database. The changes on the proxy side are here:

https://github.com/blairdrummond/jupyter-rsession-proxy/commit/eb5c4b7023f82ab7841d55fef54c56ebe7a51705

And then the dockerfile side is here:

https://github.com/StatCan/kubeflow-containers/pull/195/commits/aa2000f5f2a814bdb783377efb2798d1b21a221a

ryanlovett commented 3 years ago

Thanks @blairdrummond! I think if you try that on mybinder or any service where a proxy is setting X-Forwarded-Proto, you may run into trouble. At least that's what I found with something similar to your patch and 1.4. Are you able to reproduce?

blairdrummond commented 3 years ago

Ah crap, you're totally right

Screenshot from 2021-01-24 08-12-29

manics commented 3 years ago

Have you tried playing with the absolute_url option in jupyter-server-proxy? https://jupyter-server-proxy.readthedocs.io/en/latest/server-process.html

If you want to check the underlying code: https://github.com/jupyterhub/jupyter-server-proxy/blob/81a8e75ace301d562c997916ef3694d57c49d365/jupyter_server_proxy/handlers.py#L133-L138 https://github.com/jupyterhub/jupyter-server-proxy/blob/81a8e75ace301d562c997916ef3694d57c49d365/jupyter_server_proxy/handlers.py#L172-L179

blairdrummond commented 3 years ago

@ryanlovett I'm not sure what the best mechanism is for this, but hacked this together and it does seem to be behaving:

https://github.com/blairdrummond/jupyter-rsession-proxy/commit/a65a984cad5b9b250eeafd20803d9eb60f81429d

Where RSTUDIO_WWW_ROOT_PATH=/notebook/blair-drummond/rstudiotest/rstudio, for example. ($NB_PREFIX/rstudio on Kubeflow)

Screenshot from 2021-01-24 13-52-50

blairdrummond commented 3 years ago

@ryanlovett have you seen the header mentioned here?

https://github.com/kubeflow/kubeflow/issues/2208#issuecomment-702696927

ryanlovett commented 3 years ago

@blairdrummond I believe setting --www-root-path and the other options is not sufficient for 1.4 to work. However I did try your patch in a local docker container and on binder with a recent 1.4 build and it didn't work for me. Do you have a reproducible environment I could test?

Here is what I tried: https://github.com/ryanlovett/binder-rsession https://mybinder.org/v2/gh/ryanlovett/binder-rsession/HEAD

ryanlovett commented 3 years ago

Regarding X-Forwarded-Proto in https://github.com/jupyterhub/jupyter-rsession-proxy/issues/95#issuecomment-730646393, I considered altering this header value to avoid the problematic behavior, but this could turn into whack-a-mole. For example we could have jupyter-server-proxy set the header to http (since that is how it is being served here), but a later version of RStudio my decide it doesn't want to be served that way. If X-F-P was always http, RStudio would think that the client wasn't communicating over https and issue a redirect to the https endpoint. This would cause a loop.

So I'm thinking that the right way forward is to discuss the X-F-P behavior with RStudio.

Also, I'm aware that I have turned my issue into "fix all 1.4 related proxy things" which I shouldn't have. Sorry about that. I'll commit either my or @blairdrummond's root path patches and then create a new issue for X-Forwarded-Proto.

cboettig commented 3 years ago

Thanks much @ryanlovett . Definitely support touching base with the RStudio team on this. As you know they've always been very responsive in the past!

ryanlovett commented 3 years ago

Created #96 to address the root path issue. But again, I don't think this is enough for the extension to work with RStudio 1.4 in all places, based on the X-Forwarded-Proto issue.

ryanlovett commented 3 years ago

I've created https://github.com/rstudio/rstudio/issues/8888 which is a very appropriate number for a jupyter-related issue!

ryanlovett commented 3 years ago

I'm wondering if it'd be simpler to just immediately send the user to /rstudio/auth-sign-in?appUri=%2F without waiting for RStudio to do it for us. I don't know what the downsides are of visiting this more than once, but it doesn't seem to hurt.

Robinlovelace commented 3 years ago

Just catching up on this and wanted to say huge thanks @ryanlovett and everyone involved in getting Binder/RStudio to play. If you want any testing or help in any way I'm keen, although this is a bit over my head. Will greatly help people trying to organise teaching sessions in reproducible and accessible environments for people with diverse backgrounds.

ryanlovett commented 3 years ago

Fixed by #96, but we still have the X-Forward-Proto issue in #97.

ryanlovett commented 3 years ago

Oh, and just to underscore a new note in the docs:

If you want to use RStudio 1.4.x, you need to set the RSESSION_PROXY_RSTUDIO_1_4 environment variable. This adds additional parameters to the execution of rserver.

vnijs commented 3 years ago

Thanks @ryanlovett! Question: What should you set RSESSION_PROXY_RSTUDIO_1_4 to? A path? True?

ryanlovett commented 3 years ago

@vnijs Sorry, you can just set it to be anything, e.g. 1, yes, no, false. It just can't be the empty string.

ryanlovett commented 3 years ago

@manics absolute_url=True has no effect on the auth issue unfortunately. :(

ShichenXie commented 3 years ago

@vnijs Sorry, you can just set it to be anything, e.g. 1, yes, no, false. It just can't be the empty string.

Thanks. I have a question, where can we set the environment variable RSESSION_PROXY_RSTUDIO_1_4? Insert the following line into the dockerfile? ENV RSESSION_PROXY_RSTUDIO_1_4=yes

ryanlovett commented 3 years ago

@ShichenXie Yep, that'd work.