rstudio / helm

Helm Resources for RStudio Products
MIT License

[CONNECT] run as non-root: launcher service account not found #385

Open mr-miles opened 1 year ago

mr-miles commented 1 year ago

Our Kubernetes environment only permits non-root, read-only containers, and I was experimenting to see whether I could get Connect with the launcher to run successfully under those constraints. It looks like I mostly succeeded, but the launcher complains that it cannot find the default service account in the job namespace (it definitely exists). Can you shed any light on what it needs? Is there some part of the setup that I am missing? I am very excited about getting off-host execution working.

Also, it would be great if the non-root setup could be incorporated into the Helm chart! I had to bypass the prestart script (which tries to run update-ca-certificates and requires root), but it could just read the root CA from the filesystem and set the KUBERNETES_CA_CERT_DATA environment variable. Even better, since the cert is already on the filesystem, maybe it could be picked up from there rather than having to be specified.
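To make the suggestion concrete, here is a rough sketch of what a root-free replacement for that step might look like. It assumes the launcher wants the Base64 body of the PEM certificate without the BEGIN/END marker lines; that encoding is my assumption, not something confirmed in this thread. `pem_body` and `CA_FILE` are hypothetical names, and the default path is the in-pod service account CA that I also use with curl further down.

```shell
# Hypothetical helper: print the Base64 body of a PEM certificate on one line,
# dropping the BEGIN/END marker lines (assumed to be the format the launcher wants).
pem_body() {
  grep -v -e '-----BEGIN' -e '-----END' "$1" | tr -d '\r\n'
}

# Default to the CA that Kubernetes mounts into every pod; override CA_FILE if needed.
CA_FILE="${CA_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/ca.crt}"

# Only export when the file is readable, so the script is harmless elsewhere.
if [ -r "$CA_FILE" ]; then
  KUBERNETES_CA_CERT_DATA="$(pem_body "$CA_FILE")"
  export KUBERNETES_CA_CERT_DATA
fi
```

Nothing here writes to the root filesystem or needs root, which is the point; if the launcher actually expects the whole PEM file Base64-encoded, the helper would need to change accordingly.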

The Helm chart values file looks like this:

sharedStorage:
  mount: true
  name: rsc-pvc

prometheusExporter:
  enabled: false

launcher:
  enabled: true
  namespace: rsc-job

args:
  - /usr/local/bin/startup.sh

config:

  Database:
    Provider: "Postgres"
  Postgres:
    URL: "postgres://<host>/connectdb"

  Launcher:
    DataDirPVCName: rsc-pvc
    KubernetesCACertificateData: <base-64-encoded-cert-data>

  Metrics:
    Enabled: false

pod:
  volumeMounts:
    - name: tmp
      mountPath: "/tmp"
    - name: bind-mount
      mountPath: "/opt/rstudio-connect/mnt"
    - name: launcher-conf
      mountPath: "/etc/rstudio-connect/launcher"
    - name: launcher-scratch
      mountPath: "/var/lib/rstudio-connect-launcher"
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 1Gi
    - name: bind-mount
      emptyDir:
        sizeLimit: 1Gi
    - name: launcher-conf
      emptyDir:
        sizeLimit: 1Gi
    - name: launcher-scratch
      emptyDir:
        sizeLimit: 1Gi
  securityContext:
    seccompProfile:
      type: RuntimeDefault

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  privileged: false
  runAsNonRoot: true
  capabilities:
    drop:
    - ALL  
  runAsUser: 999
  runAsGroup: 999

The console log runs through the config tasks fine and ends like this:

time="2023-06-01T19:59:39.852Z" level=info msg="Configuration tasks: complete"
time="2023-06-01T20:00:12.988Z" level=info msg="Using the normalized Server.Address: http://localhost:3939/"
time="2023-06-01T20:00:13.366Z" level=info msg="Waiting until launcher is started..."
time="2023-06-01T20:01:12.366Z" level=info msg="Starting the launcher..."
time="2023-06-01T20:01:12.377Z" level=info msg="2023-06-01T20:01:12.371167Z [rstudio-launcher] ERROR system error 1 (Operation not permitted) [path: /var/lib/rstudio-connect-launcher, description: Could not change permissions on path /var/lib/rstudio-connect-launcher. Is root squash enabled?]; OCCURRED AT rstudio::core::Error rstudio::core::{anonymous}::changeFileModeImpl(const string&, mode_t) src/cpp/shared_core/FilePath.cpp:317; LOGGED FROM: rstudio::core::Error rstudio::job_launcher::Config::impl::validate() src/cpp/job_launcher/LauncherConfig.cpp:406" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.377Z" level=info msg="2023-06-01T20:01:12.371299Z [rstudio-launcher] INFO Running in unprivileged single-user mode" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.377Z" level=info msg="2023-06-01T20:01:12.372208Z [rstudio-launcher] INFO Bootstrapping plugin Kubernetes" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.384Z" level=info msg="2023-06-01T20:01:12.384261Z [rstudio-kubernetes-launcher, log-source: Kubernetes] WARNING Could not set permissions on scratch path (/var/lib/rstudio-connect-launcher/Kubernetes)- it is recommended to set them to rwxr-xr-x; LOGGED FROM: rstudio::core::ProgramStatus rstudio::job_launcher::impls::entry_point::initialize(int, char**, const string&, rstudio::job_launcher::impls::OptionsBase*, rstudio_boost::shared_ptr<rstudio::job_launcher::impls::FrameworkCommunicator>&) src/cpp/job_launcher/impls/EntryPoint.cpp:207" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.431Z" level=info msg="2023-06-01T20:01:12.431274Z [rstudio-kubernetes-launcher, log-source: Kubernetes] INFO Permission to list nodes denied by the Kubernetes API server. This is fine in most cases, but IP addresses for jobs with a NodePort will reflect only the cluster's own IP range -- which may not be visible to an external service. Consider adding a nodes' ClusterRole for the ServiceAccount if network connectivity from outside the cluster is an issue." stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.488Z" level=info msg="2023-06-01T20:01:12.482557Z [rstudio-kubernetes-launcher, log-source: Kubernetes] INFO Pruning 0 jobs" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.493Z" level=info msg="2023-06-01T20:01:12.492615Z [rstudio-launcher] INFO Initializing SslAsyncServer [address: 0.0.0.0, port: 5559]" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.504Z" level=info msg="2023-06-01T20:01:12.504260Z [rstudio-launcher] INFO Running server" stream=stderr subprocess=rstudio-launcher
time="2023-06-01T20:01:12.515Z" level=info msg="Launcher started."
time="2023-06-01T20:01:12.522Z" level=fatal msg="Error: Cannot setup launcher service accounts: Launcher is unable to find configured global default service account: default in namespace: rsc-job"

colearendt commented 1 year ago

Interesting!! Unfortunately, we do not expect that running Connect in "rootless, read-only filesystem" mode will work today 😞 That is definitely a direction that we are heading though! Just lots of work to resolve before we get there.

However, it looks like you have definitely made some meaningful progress! Well done!

Are you using the release-candidate chart with the --devel flag? This SA error message should be resolved by the RBAC that the release candidate produces. I'm surprised to see it cropping up here if you are!

mr-miles commented 1 year ago

Thanks! I have done some more digging, and even if I run it as a privileged container, it complains if I set the launcher namespace to anything other than the Connect namespace.

With Connect and the launcher in the same namespace, though, it appears to be working!

Deploying the Connect server to the namespace "rsc" via Helm, this works:

launcher:
  enabled: true

but this does not:

launcher:
  enabled: true
  namespace: rsc-job

The namespace appears to be set correctly in the config at /var/lib/rstudio-connect-launcher/Kubernetes:

server-user=rstudio-connect
scratch-path=/var/lib/rstudio-connect-launcher/Kubernetes
profile-config=/etc/rstudio-connect/launcher/launcher.kubernetes.profiles.conf
kubernetes-namespace=rsc-job
verify-ssl-certs=1
certificate-authority=<b64-cert>
use-templating=1
job-expiry-hours=0.25
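For anyone wanting to repeat this check, a small sketch for pulling a single setting out of a key=value launcher config like the one above. `conf_value` is a hypothetical helper, and the file path is passed in rather than assumed.

```shell
# Hypothetical helper: print the value of a key from a key=value config file.
# $1 = key (for example kubernetes-namespace), $2 = path to the config file.
# If the key appears more than once, the last occurrence wins (an assumption).
conf_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Example usage (path is a placeholder):
# conf_value kubernetes-namespace /path/to/launcher.kubernetes.conf
```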

In the second (non-working) case, I opened a shell in the container and tested that the RBAC for the pod was set up to enable listing service accounts:

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl https://172.20.0.1/api/v1/namespaces/rsc-job/serviceaccounts -H "Authorization: Bearer $TOKEN" --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

{
  "kind": "ServiceAccountList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "21703890"
  },
  "items": [
    {
      "metadata": {
        "name": "default",
        "namespace": "rsc-job",
        "uid": "2b6d7eb0-b9d8-4f93-947d-f10ccd2a8c0b",
        "resourceVersion": "17399132",
        "creationTimestamp": "2023-05-31T16:47:53Z"
      }
    }
  ]
}

Is it possible there's been a regression for the case where the jobs are launched in a different namespace?
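
One more check that might narrow this down: my curl test only proves that the pod can list service accounts in rsc-job, while the launcher may be doing a get on the named account instead (which verb it uses is a guess on my part). A sketch using `kubectl auth can-i` with impersonation, run from a workstation with permission to impersonate; `can_i_sa` is a hypothetical helper and the Connect service account name is a placeholder, since I have not confirmed what the chart calls it.

```shell
# Hypothetical helper: ask the API server whether a service account may
# perform a verb on the "default" ServiceAccount in the job namespace.
# $1 = verb, $2 = job namespace, $3 = Connect namespace, $4 = Connect service account.
can_i_sa() {
  kubectl auth can-i "$1" serviceaccounts/default \
    --namespace "$2" \
    --as "system:serviceaccount:$3:$4"
}

# Example usage (the service account name is a placeholder):
# can_i_sa get  rsc-job rsc <connect-service-account>
# can_i_sa list rsc-job rsc <connect-service-account>
```

If get is denied while list is allowed, that would point at the RBAC generated for the cross-namespace case rather than at the launcher itself.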