rstudio / helm

Helm Resources for RStudio Products
MIT License

RSW: OIDC auth with replica >=2 #165

Closed pat-s closed 2 years ago

pat-s commented 2 years ago

When increasing replicas, we face issues with OIDC auth (which works fine with replica = 1):

2022-03-02T15:37:10.117201Z [rserver] ERROR OpenID failed with error: attempt to use invalid state: 041fc65df50a6e8b7390213efac495b3f125b9e1d8b43df1b1df8c9b163da162; LOGGED FROM: void rstudio::server::openid_auth::{anonymous}::writeResponse(rstudio_boost::shared_ptr<rstudio::core::http::AsyncConnection>, const rstudio::core::http::Response&) src/cpp/server/openid_auth/ServerOpenIDAuth.cpp:145

We're using a Postgres DB and, besides the DB settings, we have set:

config:
  server:
    rserver.conf:
      server-shared-storage-path: /home/shared-storage
secureCookieKey: <some key>

ingress:
  enabled: true
  ingressClassName: "nginx"
  annotation:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "RSW-SESSION-COOKIE"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    nginx.ingress.kubernetes.io/affinity-mode: persistent
    nginx.ingress.kubernetes.io/session-cookie-hash: sha1

Is there anything additional which needs to be configured to get OIDC auth working in HA configuration?

colearendt commented 2 years ago

There shouldn't be! The key there is the sticky sessions / cookies.

Basically, the node that initiates the OIDC handshake also needs to be the node that finishes it, because our handshake state is stored in memory. So the place where the user starts needs to be the place where they end. This is something we have on the backlog to improve, but that should be enough to get things working for you!
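
To illustrate why the handshake breaks when requests are not pinned to one replica, here is a toy Python model. It is a sketch only, not RStudio's actual code: the `Replica` class and its methods are hypothetical, and the only assumption taken from the thread is that each node keeps its pending handshake state in its own memory.

```python
import secrets


class Replica:
    """Toy stand-in for one rserver pod; hypothetical, not RStudio's code."""

    def __init__(self, name):
        self.name = name
        # Handshake state lives only in this process's memory.
        self.pending_states = set()

    def start_login(self):
        # Issue a random state value and remember it locally.
        state = secrets.token_hex(32)
        self.pending_states.add(state)
        return state

    def finish_login(self, state):
        # The callback succeeds only on the replica that issued the state.
        if state not in self.pending_states:
            raise ValueError(f"attempt to use invalid state: {state}")
        self.pending_states.remove(state)
        return True


replica_a, replica_b = Replica("rsw-0"), Replica("rsw-1")

# With sticky sessions, the callback returns to the replica that started the login.
sticky_ok = replica_a.finish_login(replica_a.start_login())

# Without them, the callback may land on a replica that never saw the state.
state = replica_a.start_login()
try:
    replica_b.finish_login(state)
    cross_replica_ok = True
except ValueError as err:
    cross_replica_ok = False
    error_message = str(err)
```

In this model the second attempt fails with the same kind of "invalid state" message as in the original post, which is why cookie affinity on the ingress matters once there is more than one replica.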

I take it you have that YAML above applied and it's still not working?

pat-s commented 2 years ago

> There shouldn't be! The key there is the sticky sessions / cookies.

Sounds promising!

> I take it you have that YAML above applied and it's still not working?

Yup, with this configuration, switching between replicas: 2 and replicas: 1 yields the issue described in the OP. I'll have another look though and will focus on the cookie; maybe there is/was a glitch somewhere.

PS: The same setup works for RSC when using two or more replicas, so it's not unlikely that the issue is on our side.

pat-s commented 2 years ago

So first I needed to recreate the ingress manually as it did not pick up the cookie changes from the terraform deployment.

Now I am facing a new issue, but it seems different. I also saw #137 and checked that /mnt/load-balancer/rstudio/load-balancer exists (which the logs also confirm). The line ERROR asio.netdb error 1 (Host not found (authoritative)) does not look so healthy though. Could this be an issue?

Are you maybe able to spot any other potentially problematic log messages?

2022-03-04T21:07:59.764354Z [rserver] INFO Creating database connection pool of size 2 (source: logical CPU count)
2022-03-04T21:07:59.764604Z [rserver] WARNING A plain text value is potentially being used for the PostgreSQL password, or an encrypted password could not be decrypted. The RStudio Server documentation for PostgreSQL shows how to encrypt this value.; LOGGED FROM: rstudio::core::Error rstudio::core::database::ConnectVisitor::getPassword(const rstudio::core::database::PostgresqlConnectionOptions&, std::__cxx11::string&) const src/cpp/core/Database.cpp:410
2022-03-04T21:07:59.820262Z [rserver] INFO Database schema version is up to date.
2022-03-04T21:07:59.821030Z [rserver] INFO No environment variables 'env-vars' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:07:59.821881Z [rserver] INFO No secure key 'session-rpc-key' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:07:59.822927Z [rserver] INFO Starting floating license manager license-manager
2022-03-04T21:07:59.824041Z [rserver] INFO Reading load balancing configuration from '/mnt/load-balancer/rstudio/load-balancer'
2022-03-04T21:07:59.840470Z [rserver] INFO www-host-name was not provided - attempting to determine the host for this node.
2022-03-04T21:07:59.854775Z [rserver] INFO Retrieved default address from hostname syscall at rsw-74498dfbf7-tkc29:8787
2022-03-04T21:07:59.863799Z [rserver] INFO www-host-name was not provided - attempting to determine the host for this node.
2022-03-04T21:07:59.863950Z [rserver] INFO Retrieved default address from hostname syscall at rsw-74498dfbf7-tkc29:8787
2022-03-04T21:07:59.868436Z [rserver] INFO Reading secure cookie key from '/mnt/secret-configmap/rstudio/secure-cookie-key'
2022-03-04T21:07:59.877483Z [rserver] INFO No secure key 'session-rpc-key' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:07:59.991309Z [rserver] INFO No IP access restrictions 'ip-rules' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:07:59.996434Z [rserver] INFO Reading OpenID secret from '/mnt/secret-configmap/rstudio/openid-client-secret'
2022-03-04T21:07:59.996898Z [rserver] WARNING An unencrypted value is potentially being used for the OpenID client secret. The RStudio Server documentation for OpenID shows how to encrypt this value.; LOGGED FROM: rstudio::server::openid_auth::{anonymous}::ClientSecret& rstudio::server::openid_auth::{anonymous}::clientSecret() src/cpp/server/openid_auth/ServerOpenIDAuth.cpp:80
2022-03-04T21:08:00.061229Z [rserver] INFO rserver-openid OpenID Version 0.4.1
2022-03-04T21:08:00.061461Z [rserver] INFO Reading server user profiles from '/mnt/configmap/rstudio/profiles'
2022-03-04T21:08:02.654680Z [rserver] INFO No R version metadata 'r-versions' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:08:13.160525Z [rserver] INFO Reading launcher private key from '/mnt/secret-configmap/rstudio/launcher.pem'
2022-03-04T21:08:13.160582Z [rserver] INFO Reading launcher public key from '/mnt/dynamic/rstudio/launcher.pub'
2022-03-04T21:08:13.160759Z [rserver] INFO Reading job launcher mounts from '/mnt/configmap/rstudio/launcher-mounts'
2022-03-04T21:08:13.160957Z [rserver] INFO No job launcher environment 'launcher-env' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:08:13.161029Z [rserver] INFO No job launcher ports 'launcher-ports' found in XDG_CONFIG_DIRS, expected in an 'rstudio' folder in one of '/mnt/dynamic:/mnt/session-configmap:/mnt/secret-configmap:/mnt/configmap:/mnt/load-balancer/'
2022-03-04T21:08:13.162672Z [rserver] INFO Reading Jupyter configuration from '/mnt/configmap/rstudio/jupyter.conf'
2022-03-04T21:08:16.255106Z [rserver] INFO Detected JupyterLab version 3.2.9
2022-03-04T21:08:19.356939Z [rserver] INFO Detected Jupyter Notebook version 6.4.8
2022-03-04T21:08:19.357196Z [rserver] INFO Reading VS Code configuration from '/mnt/configmap/rstudio/vscode.conf'
2022-03-04T21:08:21.475915Z [rserver] INFO Detected VSCode code-server version 4.0.2
2022-03-04T21:08:21.555552Z [rserver] ERROR system error 2 (No such file or directory); OCCURRED AT void rstudio::core::http::LocalStreamAsyncClient::handleConnect(const rstudio_boost::system::error_code&) src/cpp/server/ServerSessionProxy.cpp:124

2022-03-04T21:08:22.061294Z [rserver] INFO Emit node update distributed event from 9 (rsw-74498dfbf7-tkc29): Online

colearendt commented 2 years ago

That all looks good to me! Do things seem to be functioning properly?

There are a handful of "ERROR"s that are more transient and not indicative of underlying issues (and which we should do a better job of softening 😅 ). I think that "No such file or directory" you're seeing and the "Host not found" can both fall into that category (i.e. if they were issued while the cluster was stabilizing between nodes exiting the cluster, etc.). You can check that nodes are all there and recognized (IIRC) from inside the container with rstudio-server list-nodes or something along those lines.

pat-s commented 2 years ago

Before I went to bed, it did not work and I was stuck after trying to clear any caches, session states etc. (also in our self-hosted auth provider).

While drinking my morning ☕ I tried again, and it worked straight away 🙈 Classic...

Thanks for your help!

PS: My takeaway here is that changes to an ingress are not necessarily deployed instantly (which was the initial issue, I guess) and that manual deletion is needed (which might be a terraform thing and is surely out of scope WRT this chart).

pat-s commented 2 years ago

Just a quick follow-up on this.

The reason the ingress was not updated automatically is that I had a typo. It should be annotations instead of annotation 🤦 The latter is just silently ignored.
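
Because a misspelled key like this is silently ignored, one way to catch it before deploying is to compare the keys in the values against the keys the chart is expected to accept. The following is a minimal sketch under stated assumptions: `KNOWN_INGRESS_KEYS` is a hypothetical list made up for illustration (not the chart's actual schema), `find_unknown_keys` is a hypothetical helper, and the values are written as a Python dict mirroring part of the ingress block from the original post.

```python
import difflib

# Hypothetical set of accepted keys, for illustration only; consult the
# chart's own values.yaml for the real list.
KNOWN_INGRESS_KEYS = {"enabled", "ingressClassName", "annotations", "hosts", "tls"}


def find_unknown_keys(section, known_keys):
    """Return (key, suggestion) pairs for keys that are not in known_keys."""
    problems = []
    for key in section:
        if key not in known_keys:
            # Suggest the closest known key, if any is similar enough.
            matches = difflib.get_close_matches(key, sorted(known_keys), n=1)
            problems.append((key, matches[0] if matches else None))
    return problems


ingress_values = {
    "enabled": True,
    "ingressClassName": "nginx",
    "annotation": {  # the typo from the original post
        "nginx.ingress.kubernetes.io/affinity": "cookie",
    },
}

problems = find_unknown_keys(ingress_values, KNOWN_INGRESS_KEYS)
for key, suggestion in problems:
    print(f"unknown key {key!r}; did you mean {suggestion!r}?")
```

Run against the values above, the check flags `annotation` and suggests `annotations`.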