rocker-org / rocker-versioned2

Run current & prior versions of R using docker. rocker/r-ver, rocker/rstudio, rocker/shiny, rocker/tidyverse, and so on.
https://rocker-project.org
GNU General Public License v2.0
390 stars 163 forks

Running RStudio behind an nginx proxy in a Kubernetes cluster #757

Closed perllaghu closed 5 months ago

perllaghu commented 5 months ago

Question

I think this is a variation of the "Path prefix does not work as expected with RStudio server and Traefik" issue.

We are running services in a k8s cluster, and use an nginx proxy to route users: each user gets a UUID-based path, and the router proxy-passes them to their individual server.

The normal routing path is external url -> k8s ingress rule catches /user/ -> nginx service proxy-pass based on location -> specific pod

I cannot get a clean response from an RStudio server when accessing it through the router.

All user-servers are named jupyter-<uuid>

The nginx has a Location definition thus:

  location ~ /user/([a-z0-9]+)/rocker {
    proxy_hide_header Content-Security-Policy;
    proxy_pass http://jupyter-$1.${POD_NAMESPACE}.svc.cluster.local:8787;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host jupyter-$1:8787;
    client_max_body_size 0;

    proxy_set_header X-Forwarded-Proto $proto;
    proxy_set_header X-Forwarded-Port $port;
    add_header X-Clacks-Overhead "Rocker" always;
    proxy_set_header User-Agent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0";
  }

[ ${POD_NAMESPACE} ensures that it references the pod in my namespace and doesn't magically connect to someone else's service; $proto & $port are for http/https & 443/80; $1 is the UUID specific to the user. X-Clacks-Overhead lets me confirm that this location definition is used ]
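
For readers following along: $connection_upgrade, $proto and $port are not built-in nginx variables, so they have to be defined somewhere outside the location shown. The thread does not show those definitions, so the following is only a sketch of how they might look. It assumes TLS is terminated at the ingress, which then passes the original scheme along in X-Forwarded-Proto.

```nginx
# Sketch only: these maps belong in the http {} context, not inside a
# server {} or location {} block.

# Common WebSocket pattern: send "Connection: upgrade" to the backend only
# when the client actually requested an upgrade.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Assumption: the ingress reports the original scheme in X-Forwarded-Proto;
# otherwise fall back to the scheme nginx itself saw.
map $http_x_forwarded_proto $proto {
    default $scheme;
    https   https;
    http    http;
}

# Assumption: the external port follows directly from the scheme.
map $proto $port {
    default 80;
    https   443;
}
```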

If I work from the router container in the cluster [so bypassing any possible contamination from external factors] - I can get a named file:

I've tried various combinations of X-RStudio-Root-Path & WWW_ROOT_PATH.... and I've essentially run myself in circles - and frazzled the brain of a colleague who tried to help :)

As far as I can tell, the RStudio server dynamically creates pages [my witness is the page not found page], and creates hrefs to images & the like... based on www-root-path - however the page not found page seems to double-up:

var baseUri = "http://<container-service-name>:8787/user/<uuid>/rocker/<www-data-route>/rstudio.css"

... which is obviously wrong!
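
One possible reading of the doubled path is that the prefix reaches RStudio twice: once inside the forwarded request URI (a proxy_pass without a URI part forwards the original path unchanged) and once through the root-path setting. Below is an untested sketch of a location that strips the prefix before proxying and declares it only via the X-RStudio-Root-Path header. The named captures uuid and rest are illustrative names. The sketch also assumes that a resolver is already configured, that $connection_upgrade is defined by a map in the http {} context, and that ${POD_NAMESPACE} is substituted before nginx loads the file, as in the config above.

```nginx
# Sketch only, not a verified working configuration.
location ~ ^/user/(?<uuid>[a-z0-9]+)/rocker/(?<rest>.*)$ {
    # proxy_pass contains variables and a URI part, so nginx sends exactly
    # this URI upstream: the backend sees /<rest>, without the prefix.
    # Caveat: $rest is taken from the normalized (decoded) request path.
    proxy_pass http://jupyter-$uuid.${POD_NAMESPACE}.svc.cluster.local:8787/$rest$is_args$args;

    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;

    # Assumption: passing the external host lets RStudio build absolute URLs
    # against the public name rather than the in-cluster service name.
    proxy_set_header Host $http_host;

    # Tell RStudio which public prefix it is served under. If this header is
    # used, www-root-path should probably not also be set to the same prefix.
    proxy_set_header X-RStudio-Root-Path /user/$uuid/rocker;

    client_max_body_size 0;
}
```

A request to /user/<uuid>/rocker without a trailing slash does not match this pattern and would need its own redirect rule.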

So - has anyone got a working RStudio-behind-nginx config?

cboettig commented 5 months ago

@perllaghu apologies as I don't completely follow your setup here so not sure how to debug.

A bit of a tangent, maybe worth mentioning: on k8s, I have recently started running rocker images via jupyterhub (this requires the jupyterhub python module to be installed, e.g. via rocker/binder or a RUN /rocker_scripts/install_jupyter.sh in any custom Dockerfile). It is easy to adjust your helm chart values to have users launch directly into the RStudio server instance or go to the JupyterHub launcher, from which they can select either RStudio or other interfaces (JupyterLab, VSCode, desktop interfaces, etc). The docs & community support are excellent, so it's easy to configure things like https with traefik or authentication via github or other providers. https://z2jh.jupyter.org/en/stable/ . It's possible to configure it so that users can bring their own rocker-derived images as well. Not sure if that's an option for you, but it might be an easy way to get going with Rocker images on K8s.

perllaghu commented 5 months ago

@cboettig you absolutely can suggest the jupyter plugins... in effect the rocker/binder docker image. I actually started there, however I couldn't figure out how to make it start directly in the RStudio UI.

We switched away from JupyterHub several years ago: when it hiccups, it loses track of its routing & takes ages to figure it out.... and that's a problem with 400+ simultaneous connections [hence people switching to path-based routing]

When it comes to starting these user-services, we dynamically create the pod/service/deployment specs - so tips on getting the binder image to start directly in RStudio would be gratefully received.

cboettig commented 5 months ago

@perllaghu to start jupyterhub directly into RStudio, just set the default_url to /rstudio as per https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner.default_url . I think this trick also works in the binder image: even if you're not using the k8s-based jupyterhub, you can set the ?urlpath=rstudio parameter (as per https://github.com/rocker-org/binder?tab=readme-ov-file#3-modify-the-binder-badge-in-the-readmemd).

At Berkeley we regularly have ~ 10,000 simultaneous connections on the jupyterhub, so I don't think 400 should be a scaling problem for the software; but I'm not involved directly in any of that.

perllaghu commented 5 months ago

@cboettig thankee for this. The jupyterhub route is, as you say, an alternative.... however, for clarity - it would be good to get an answer that doesn't require adding another technology layer [for those not already in the jupyter ecosystem]

cboettig commented 5 months ago

yup, 100%, maybe others have experience with this here, but I'd also reach out to the RStudio team.