nextcloud / helm

A community maintained helm chart for deploying Nextcloud on Kubernetes.

nextcloud-nginx container crashlooping after securityContext update; `/var/www/html/config` always owned by root #335

Open jessebot opened 1 year ago

jessebot commented 1 year ago

Description

I've edited this description to give the full context of how we got here, as this issue is getting kind of long; it needed to be tested in a lot of different ways, which led me in several directions.

This issue is a continuation of the conversation started after #269 was merged. I was originally trying to change podSecurityContext.runAsUser and podSecurityContext.runAsGroup to 33 while diagnosing why the /var/www/html/config directory was always owned by root. I am deploying the nextcloud helm chart using persistent volumes on k3s with the default local path provisioner.

I learned that podSecurityContext.fsGroup was always being set to 82 any time nginx.enabled was true and podSecurityContext.fsGroup wasn't set explicitly, so I submitted a draft PR to fix it so that it checks image.flavor for alpine: https://github.com/nextcloud/helm/pull/379
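In the meantime, the workaround implied by that finding is to set fsGroup explicitly in your own values.yaml so the chart's nginx-driven default of 82 never applies. A minimal sketch, assuming a Debian-based (non-alpine) nextcloud image where www-data is 33:

nextcloud:
  podSecurityContext:
    # set explicitly so the chart does not default this to 82 when nginx.enabled is true
    fsGroup: 33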

Through the comments here you can see other things I'm currently testing, because I'm still not sure if it's just the local path provisioner on k3s, or k3s itself, or something else; the best answer I can give so far is 🀷. I'll update this issue description with more clarity as it comes.

Original Issue that was opened on Jan 23

The nginx container in the nextcloud pod won't start and complains of a read-only file system, even if I only set nextcloud.securityContext.

I have created a new cluster and deployed nextcloud with the securityContext parameters from the values.yaml of this repo, including the nginx security context. My entire values.yaml is here, but the parts that matter are:

securityContext parameters in my old `values.yaml`:

```yaml
nextcloud:
  # securityContext parameters. For example you may need to define runAsNonRoot directive
  securityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
  # securityContext parameters. For example you may need to define runAsNonRoot directive
  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
...
nginx:
  ## You need to set an fpm version of the image for nextcloud if you want to use nginx!
  enabled: true
  image:
    repository: nginx
    tag: alpine
    pullPolicy: Always
  # this is copied almost directly from the values.yaml, but I changed readOnlyRootFilesystem to false while testing
  securityContext:
    runAsUser: 82
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
```

The nextcloud pod is in a CrashLoopBackOff state; the offending container is nginx, and these are its logs:

2023-01-23T15:44:59.428798413+01:00 /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
2023-01-23T15:44:59.428820874+01:00 /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
2023-01-23T15:44:59.429173908+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
2023-01-23T15:44:59.429979412+01:00 10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)
2023-01-23T15:44:59.430071429+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
2023-01-23T15:44:59.431167356+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
2023-01-23T15:44:59.431715519+01:00 /docker-entrypoint.sh: Configuration complete; ready for start up
2023-01-23T15:44:59.433513935+01:00 2023/01/23 14:44:59 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
2023-01-23T15:44:59.433519229+01:00 nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
2023-01-23T14:45:24.296336176Z Stream closed EOF for nextcloud/nextcloud-web-app-66fc5dfcb7-kxlnp (nextcloud-nginx)

This is the resulting deployment.yaml when I run `kubectl get deployment -n nextcloud nextcloud-web-app -o yaml > deployment.yaml`:

The full nextcloud deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2023-01-23T14:43:58Z"
  generation: 52
  labels:
    app.kubernetes.io/component: app
    app.kubernetes.io/instance: nextcloud-web-app
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: nextcloud
    argocd.argoproj.io/instance: nextcloud-web-app
    helm.sh/chart: nextcloud-3.4.1
  name: nextcloud-web-app
  namespace: nextcloud
  resourceVersion: "3340"
  uid: cde1dd07-103a-4c04-931d-071ab3c5b448
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: app
      app.kubernetes.io/instance: nextcloud-web-app
      app.kubernetes.io/name: nextcloud
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        nextcloud-config-hash: d1d9ac6f86f643b460f8e4e8e886b65382ad49aede8762f8ea74ccd86b7e3f28
        nginx-config-hash: 16c61772d9e74de7322870fd3a045598ea01f6e16be155d116423e6a246dcddc
        php-config-hash: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: app
        app.kubernetes.io/instance: nextcloud-web-app
        app.kubernetes.io/name: nextcloud
    spec:
      containers:
      - env:
        - name: POSTGRES_HOST
          value: nextcloud-web-app-postgresql
        - name: POSTGRES_DB
          value: nextcloud
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-pgsql-credentials
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: nextcloudPassword
              name: nextcloud-pgsql-credentials
        - name: NEXTCLOUD_ADMIN_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-admin-credentials
        - name: NEXTCLOUD_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: nextcloud-admin-credentials
        - name: NEXTCLOUD_TRUSTED_DOMAINS
          value: nextcloud.vleermuis.tech
        - name: NEXTCLOUD_DATA_DIR
          value: /var/www/html/data
        image: nextcloud:25.0.3-fpm
        imagePullPolicy: Always
        name: nextcloud
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/
          name: nextcloud-main
          subPath: root
        - mountPath: /var/www/html
          name: nextcloud-main
          subPath: html
        - mountPath: /var/www/html/data
          name: nextcloud-main
          subPath: data
        - mountPath: /var/www/html/config
          name: nextcloud-main
          subPath: config
        - mountPath: /var/www/html/custom_apps
          name: nextcloud-main
          subPath: custom_apps
        - mountPath: /var/www/tmp
          name: nextcloud-main
          subPath: tmp
        - mountPath: /var/www/html/themes
          name: nextcloud-main
          subPath: themes
        - mountPath: /var/www/html/config/logging.config.php
          name: nextcloud-config
          subPath: logging.config.php
        - mountPath: /var/www/html/config/proxy.config.php
          name: nextcloud-config
          subPath: proxy.config.php
        - mountPath: /var/www/html/config/.htaccess
          name: nextcloud-config
          subPath: .htaccess
        - mountPath: /var/www/html/config/apache-pretty-urls.config.php
          name: nextcloud-config
          subPath: apache-pretty-urls.config.php
        - mountPath: /var/www/html/config/apcu.config.php
          name: nextcloud-config
          subPath: apcu.config.php
        - mountPath: /var/www/html/config/apps.config.php
          name: nextcloud-config
          subPath: apps.config.php
        - mountPath: /var/www/html/config/autoconfig.php
          name: nextcloud-config
          subPath: autoconfig.php
        - mountPath: /var/www/html/config/redis.config.php
          name: nextcloud-config
          subPath: redis.config.php
        - mountPath: /var/www/html/config/smtp.config.php
          name: nextcloud-config
          subPath: smtp.config.php
      - image: nginx:alpine
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: nextcloud.vleermuis.tech
            path: /status.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        name: nextcloud-nginx
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: nextcloud.vleermuis.tech
            path: /status.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        securityContext:
          readOnlyRootFilesystem: false
          runAsGroup: 33
          runAsNonRoot: true
          runAsUser: 82
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/
          name: nextcloud-main
          subPath: root
        - mountPath: /var/www/html
          name: nextcloud-main
          subPath: html
        - mountPath: /var/www/html/data
          name: nextcloud-main
          subPath: data
        - mountPath: /var/www/html/config
          name: nextcloud-main
          subPath: config
        - mountPath: /var/www/html/custom_apps
          name: nextcloud-main
          subPath: custom_apps
        - mountPath: /var/www/tmp
          name: nextcloud-main
          subPath: tmp
        - mountPath: /var/www/html/themes
          name: nextcloud-main
          subPath: themes
        - mountPath: /etc/nginx/nginx.conf
          name: nextcloud-nginx-config
          subPath: nginx.conf
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - until pg_isready -h nextcloud-web-app-postgresql -U ${POSTGRES_USER} ; do sleep 2 ; done
        env:
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-pgsql-credentials
        image: bitnami/postgresql:14.4.0-debian-11-r23
        imagePullPolicy: IfNotPresent
        name: postgresql-isready
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 82
        runAsGroup: 33
        runAsNonRoot: true
        runAsUser: 33
      serviceAccount: nextcloud-serviceaccount
      serviceAccountName: nextcloud-serviceaccount
      terminationGracePeriodSeconds: 30
      volumes:
      - name: nextcloud-main
        persistentVolumeClaim:
          claimName: nextcloud-files
      - configMap:
          defaultMode: 420
          name: nextcloud-web-app-config
        name: nextcloud-config
      - configMap:
          defaultMode: 420
          name: nextcloud-web-app-nginxconfig
        name: nextcloud-nginx-config
status:
  conditions:
  - lastTransitionTime: "2023-01-23T14:43:58Z"
    lastUpdateTime: "2023-01-23T14:43:58Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-01-23T14:43:58Z"
    lastUpdateTime: "2023-01-23T14:43:58Z"
    message: ReplicaSet "nextcloud-web-app-66fc5dfcb7" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 52
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
```

Where does the UID 82 come from? (Edit: it comes from the alpine nextcloud and nginx images - that's www-data)

I set that to 33 (nextcloud's www-data user) to test, but it didn't seem to make a difference. To be clear: without editing any of the security contexts, everything works. But I would like the security contexts to work, because otherwise my restores from backups fail: the /var/www/html/config directory is always created with root ownership, which means that if the restores run as www-data, they can't restore that particular directory, which is important. I'm hoping the security context fixes that, so that nothing has to run as root in this stack.

I'm deploying the 3.4.1 nextcloud helm chart via Argo CD onto k3s on Ubuntu 22.04. Update: the problem is still present in the 3.5.7 helm chart.

FrankelJb commented 1 year ago

Adding my experience: I just moved my directory from a hostPath to NFS and then started encountering permission issues. I ran chown -R 33:33 on the whole directory and set the security context. This is my error now:

Configuring Redis as session handler
/entrypoint.sh: 78: cannot create /usr/local/etc/php/conf.d/redis-session.ini: Permission denied
Initializing nextcloud 25.0.3.2 ...
touch: cannot touch '/var/www/html/nextcloud-init-sync.lock': Permission denied

jessebot commented 1 year ago

@FrankelJb are you also using nginx? Which security contexts are you setting? There are a few you can set. If we could get the security context settings from your values.yaml, that would help in comparing states. Thank you for sharing!

FrankelJb commented 1 year ago

@jessebot I'm not using Nginx. I'm almost ready to give up on NC in Kubernetes (I can't upgrade now). I've managed to solve this issue. I was trying to use a single Redis cluster for all my services, but I had to give up on that dream because NC refused to connect without a password. I'm not sure if that's caused by a config in the helm chart or by a configuration error on my end. Thanks for being so responsive :)

jessebot commented 1 year ago

I'm sorry you're having a bad time with this. I also had a bad time with this at first and then became sort of obsessed with trying to fix it for others too πŸ˜…

If you can post your values.yaml (after removing sensitive info) I can help troubleshoot it for you :)

Jeroen0494 commented 1 year ago

UID 82 comes from the Nextcloud fpm-alpine image. If you use a non-alpine image, I believe the user is 33. The NGINX container you use is an alpine-based image, so you have to make sure the group and fsGroup match for both containers.

For example my (abbreviated) deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud
  namespace: nextcloud
  labels:
    app: nextcloud
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nextcloud
  template:
    metadata:
      annotations:
        container.apparmor.security.beta.kubernetes.io/nextcloud: localhost/container-nextcloud
        container.apparmor.security.beta.kubernetes.io/nginx: localhost/container-nginx
      labels:
        app: nextcloud
    spec:
      automountServiceAccountToken: false

      containers:
      - name: nextcloud
        image: "nextcloud:24.0.9-fpm-alpine"

        securityContext:
          runAsUser: 82
          allowPrivilegeEscalation: false
          privileged: false
          runAsNonRoot: true
          capabilities:
            drop:
              - ALL
          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json

      - name: nginx
        image: cgr.dev/chainguard/nginx:1.23.3
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          capabilities:
            add:
              - NET_BIND_SERVICE
            drop:
              - ALL
          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nginx-seccomp-profile.json

      # Will mount configuration files as www-data (id: 82) for nextcloud
      securityContext:
        fsGroup: 82
      serviceAccountName: nextcloud-serviceaccount

You can see I use a distroless NGINX container image, but the principle is the same.

FrankelJb commented 1 year ago

@jessebot here is a link to my values.yaml. I've just tried to recreate it with Flux, moving from Argo CD, and it just waits on "Initializing nextcloud 25.0.4.1 ..." for minutes. It was working with the same yaml before; the deployment took 45 minutes last time.

jessebot commented 1 year ago

@FrankelJb , for Argo CD, I detailed some of my trials in https://github.com/nextcloud/helm/issues/336#issuecomment-1509829893 if that's at all helpful.

For this config-directory-owned-by-root issue, also discussed in #114, I finally got around to testing it (after battling argo πŸ˜…), and I've noted that all of the securityContext parameters I've tested (nextcloud, nginx, and the nextcloud pod) seem to mostly work, but the following directories are always owned by root on the nextcloud container:

[Screenshot: everything in /var/www/html/config on the nextcloud pod is owned by root with GID 82 group ownership, except config.php, which is owned by www-data]
[Screenshot: everything in /var/www/html is owned by www-data with group ownership GID 82, EXCEPT for config, custom_apps, data, and themes]

I don't know why, though. At first I thought it was a persistence thing, but then I disabled persistence and it's still an issue. You can kind of see me live-testing with the 3.5.7 nextcloud chart here, but each thing I test leads me further toward believing there's something going on with our volume mounts. I've been using the 26.0.0-fpm image, but I haven't tested the regular image or the alpine image like @Jeroen0494 suggested, yet.

Note: this /var/www/html/config directory owned by root doesn't happen when using the nextcloud docker container directly and setting it to run as non-root. It only happens with the helm chart.

@provokateurin or @tvories have you been able to get this to work? I can get every other directory to be created as any other user, but the directories from the screenshot seem to always be owned by root. You can see my values.yaml here, but I don't know what else we need to set πŸ€” Are there security contexts for persistent volumes? Or perhaps mount options we need to set for the configmap when it gets mounted? It's been months, albeit in my off hours, but I'm still so confused.

jessebot commented 1 year ago

@Jeroen0494, I switched to the 26.0.0-fpm-alpine tag and also added most of the options you'd added, and /var/www/html/config is still owned by root when deploying with this helm chart. You can see the full values.yaml I tried here, but the important parts are this:

image:
  repository: nextcloud
  flavor: fpm-alpine
  pullPolicy: Always

nextcloud:
  # Set securityContext parameters. For example, you may need to define runAsNonRoot directive
  securityContext:
    runAsUser: 82
    runAsGroup: 82
    runAsNonRoot: true
    readOnlyRootFilesystem: false
    allowPrivilegeEscalation: false
    privileged: false
    capabilities:
      drop:
        - ALL

  podSecurityContext:
    fsGroup: 82
...
# this is deprecated, but I figured why not, anything to change that one config directory from root (but it didn't work)
securityContext:
  fsGroup: 82

I can't figure out what else it would be. Maybe a script in the container itself? πŸ€” Are you using the helm chart and using persistence? Is your /var/www/html/config owned by root? Are you using k3s or another k8s on metal by chance? The only thing I didn't try from your output was this, because I'm not sure where that file comes from or what should go in it:

          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json

I see it described here in the k8s API docs, but they don't link further to what goes in localhostProfile.

tomasodehnal commented 1 year ago

@jessebot Not sure if it is the same issue, but maybe it will help. I'm using 25-alpine with a hostPath PV, and even though I set the securityContext in the pod and used the same ID for the ownership of the path on the host, the mapped subdirectories of the PV were owned by root:root and the container was stuck on:

/entrypoint.sh: 104: cannot create /var/www/html/nextcloud-init-sync.lock: Permission denied

I resolved it by manually changing the ownership of the subdirs on the host to the same UID.
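A minimal sketch of that host-side fix, assuming the PV maps to /data/nextcloud on the node and the pod runs as UID/GID 33 (both the path and the IDs are placeholders for your own setup):

# on the k8s node: give the PV's subdirectories the same owner the pod runs as
sudo chown -R 33:33 /data/nextcloud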

jessebot commented 1 year ago

@tomasodehnal, thanks for popping in to help (in fact, thank you to everyone who has tried to pop in and help with this weird issue 😁). I will take a peek at that. A few questions: are you using k3s or another k8s on metal? Could you post your full PV/PVC manifests or the section of your values.yaml with that info?

The reason I'm asking is that I'm wondering if it's actually a storage driver problem that has nothing to do with nextcloud. It only seems to be happening consistently for a few directories, and those seem to be mounts from persistent volumes.

Here's one of my PVCs which is using the local path provisioner, since I'm using k3s:

# Dynamic persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
    volumeType: local
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Still looking into whether there's anything that can be done here, but from my research, this might just be something that needs to be solved in an init container, which I might have to make a PR for :(
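If it does come to that, here's a rough sketch of what such an init container could look like via the chart's nextcloud.extraInitContainers list (the busybox image, the mount path, and the hard-coded 33:33 are illustrative assumptions; nextcloud-main is the volume name from the deployment dump above):

nextcloud:
  extraInitContainers:
    - name: volume-permissions
      image: busybox:1.36
      # must run as root so it is allowed to chown the volume
      securityContext:
        runAsUser: 0
      command: ["sh", "-c", "chown -R 33:33 /nextcloud-data"]
      volumeMounts:
        - name: nextcloud-main
          mountPath: /nextcloud-data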

Update: I tested without any values.yaml at all, using the default settings on k3s with chart version 3.5.8, and only nextcloud-init-sync.lock is owned by root, like this:

-rw-r--r--  1 root     www-data    0 Apr 16 09:01 nextcloud-init-sync.lock

but that's without any persistence or configurations enabled πŸ€”


Re: nextcloud-init-sync.lock

That file is actually owned by root by default in all the nextcloud docker containers, but only that one file (it occurs in both the docker container directly and in the helm chart).

Example default permissions on the nextcloud:fpm-alpine Docker container:

```bash
$ docker run -d nextcloud:fpm-alpine
Unable to find image 'nextcloud:fpm-alpine' locally
fpm-alpine: Pulling from library/nextcloud
f56be85fc22e: Pull complete
ace8de9a4ff5: Pull complete
ac818333da4c: Pull complete
10f4138fad9a: Pull complete
04049f99cb8d: Pull complete
93231f0bdcb6: Pull complete
ab266ad8891c: Pull complete
552295b4d6d8: Pull complete
cffafb46943d: Pull complete
4964abd498c6: Pull complete
a05442d246e3: Pull complete
42633b5b39c2: Pull complete
6f8014cbce5e: Pull complete
18729ba22f88: Pull complete
9eedd0061e2b: Pull complete
97d1b1593a77: Pull complete
Digest: sha256:9a08c42558cda7d48de2cc3da9f5150eeed81e7595aa4c2c5ace6612c3923240
Status: Downloaded newer image for nextcloud:fpm-alpine
688a243c0388ca26541b0d39cc5ebe3c83ad41df617aa601e28e08a258319dfa

$ docker exec -it frosty_mendel /bin/sh
/var/www/html # ls -hal
total 180K
drwxrwxrwt 15 www-data www-data  4.0K Apr 16 08:42 .
drwxrwxr-x  1 www-data root      4.0K Apr 14 20:46 ..
-rw-r--r--  1 www-data www-data  3.2K Apr 16 08:42 .htaccess
-rw-r--r--  1 www-data www-data   101 Apr 16 08:42 .user.ini
drwxr-xr-x 45 www-data www-data  4.0K Apr 16 08:42 3rdparty
-rw-r--r--  1 www-data www-data 18.9K Apr 16 08:42 AUTHORS
-rw-r--r--  1 www-data www-data 33.7K Apr 16 08:42 COPYING
drwxr-xr-x 50 www-data www-data  4.0K Apr 16 08:42 apps
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:42 config
-rw-r--r--  1 www-data www-data  4.0K Apr 16 08:42 console.php
drwxr-xr-x 24 www-data www-data  4.0K Apr 16 08:42 core
-rw-r--r--  1 www-data www-data  6.2K Apr 16 08:42 cron.php
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:42 custom_apps
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:42 data
drwxr-xr-x  2 www-data www-data 12.0K Apr 16 08:42 dist
-rw-r--r--  1 www-data www-data   156 Apr 16 08:42 index.html
-rw-r--r--  1 www-data www-data  3.4K Apr 16 08:42 index.php
drwxr-xr-x  6 www-data www-data  4.0K Apr 16 08:42 lib
-rw-r--r--  1 root     root         0 Apr 16 08:42 nextcloud-init-sync.lock
-rwxr-xr-x  1 www-data www-data   283 Apr 16 08:42 occ
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:42 ocm-provider
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:42 ocs
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:42 ocs-provider
-rw-r--r--  1 www-data www-data  3.1K Apr 16 08:42 public.php
-rw-r--r--  1 www-data www-data  5.4K Apr 16 08:42 remote.php
drwxr-xr-x  4 www-data www-data  4.0K Apr 16 08:42 resources
-rw-r--r--  1 www-data www-data    26 Apr 16 08:42 robots.txt
-rw-r--r--  1 www-data www-data  2.4K Apr 16 08:42 status.php
drwxr-xr-x  3 www-data www-data  4.0K Apr 16 08:42 themes
-rw-r--r--  1 www-data www-data   384 Apr 16 08:42 version.php
```

Running docker with --user 82:82 fixes the issue on the alpine image (you'd use 33 for the non-alpine images), as you can see here (but that's not helpful for k8s itself 😞, since this was using docker directly):

Example fixed permissions on the nextcloud:fpm-alpine Docker container:

```bash
$ docker run -d --user 82:82 nextcloud:fpm-alpine
9761e3ff869b3ad026ef5bf10b333d5c52c2ec0ad6b5dd212016d083c8888dd3

$ docker exec -it eager_buck /bin/sh
/var/www/html $ ls -hal
total 180K
drwxrwxrwt 15 www-data root      4.0K Apr 16 08:48 .
drwxrwxr-x  1 www-data root      4.0K Apr 14 20:46 ..
-rw-r--r--  1 www-data www-data  3.2K Apr 16 08:48 .htaccess
-rw-r--r--  1 www-data www-data   101 Apr 16 08:48 .user.ini
drwxr-xr-x 45 www-data www-data  4.0K Apr 16 08:48 3rdparty
-rw-r--r--  1 www-data www-data 18.9K Apr 16 08:48 AUTHORS
-rw-r--r--  1 www-data www-data 33.7K Apr 16 08:48 COPYING
drwxr-xr-x 50 www-data www-data  4.0K Apr 16 08:48 apps
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:48 config
-rw-r--r--  1 www-data www-data  4.0K Apr 16 08:48 console.php
drwxr-xr-x 24 www-data www-data  4.0K Apr 16 08:48 core
-rw-r--r--  1 www-data www-data  6.2K Apr 16 08:48 cron.php
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:48 custom_apps
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:48 data
drwxr-xr-x  2 www-data www-data 12.0K Apr 16 08:48 dist
-rw-r--r--  1 www-data www-data   156 Apr 16 08:48 index.html
-rw-r--r--  1 www-data www-data  3.4K Apr 16 08:48 index.php
drwxr-xr-x  6 www-data www-data  4.0K Apr 16 08:48 lib
-rw-r--r--  1 www-data www-data     0 Apr 16 08:48 nextcloud-init-sync.lock
-rwxr-xr-x  1 www-data www-data   283 Apr 16 08:48 occ
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:48 ocm-provider
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:48 ocs
drwxr-xr-x  2 www-data www-data  4.0K Apr 16 08:48 ocs-provider
-rw-r--r--  1 www-data www-data  3.1K Apr 16 08:48 public.php
-rw-r--r--  1 www-data www-data  5.4K Apr 16 08:48 remote.php
drwxr-xr-x  4 www-data www-data  4.0K Apr 16 08:48 resources
-rw-r--r--  1 www-data www-data    26 Apr 16 08:48 robots.txt
-rw-r--r--  1 www-data www-data  2.4K Apr 16 08:48 status.php
drwxr-xr-x  3 www-data www-data  4.0K Apr 16 08:48 themes
-rw-r--r--  1 www-data www-data   384 Apr 16 08:48 version.php
```

Jeroen0494 commented 1 year ago

@jessebot are you experiencing these storage permission errors only on already existing storage or also when using an emptyDir for example?

When using existing storage where the owner of the files is root, a non-root container wouldn't be able to change the owner after switching. You'd have to change the owner on the storage medium itself with a chown.

Does the issue exist when using no attached storage? And when using emptyDir? And when using a PVC template with the local-path-provisioner?

I can't figure out what else it would be. Maybe a script in the container itself? πŸ€” Are you using the helm chart and using persistence? Is your /var/www/html/config owned by root? Are you using k3s or another k8s on metal by chance? The only thing I didn't try from your output was this, because I'm not sure where that file comes from or what should go in it:

          seccompProfile:
            type: Localhost
            localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json

I see it described here in the k8s API docs, but they don't link further to what goes in localhostProfile.

I'm using the security profiles operator and have written my own seccomp profile. You may ignore this line, or switch type to RuntimeDefault.
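For reference, the RuntimeDefault variant mentioned above is plain Kubernetes API and needs no operator; in a container securityContext it looks like this:

securityContext:
  seccompProfile:
    type: RuntimeDefault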

Currently I'm not using the Helm chart, because I require certain changes (that I've created a PR for). But all my YAMLs are based on the Helm chart.

jessebot commented 1 year ago

Thanks for getting back to me, @Jeroen0494 πŸ™

Currently I'm not using the Helm chart, because I require certain changes (that I've created a PR for). But all my YAMLs are based on the Helm chart.

Commented on that PR and will take another look after the conflicts are resolved :) I will still probably ping Kate, though, as the PR is large.

@jessebot are you experiencing these storage permission errors only on already existing storage or also when using an emptyDir for example?

Let me try with emptyDir, actually. πŸ€” I've been doing this on a fresh k3s cluster each time: I completely destroy the cluster and its storage before testing a new cluster. I checked /var/lib/rancher after removing k3s and there isn't anything in that directory; the directory itself is owned by root, but the directories within it should not be. I use smol-k8s-lab for deploying and destroying local k3s clusters. Let me spin up a new cluster and check the ownership of the directory after that.

Does the issue exist when using no attached storage?

No, the issue doesn't exist when I don't use any persistence. Well, except for the nextcloud-init-sync.lock file, which is always owned by root, but that's not what I'm after right now; I'm after the /var/www/html/config dir. More info on that lock file is detailed here: https://github.com/nextcloud/helm/issues/335#issuecomment-1510203221

Jeroen0494 commented 1 year ago

Could you also try with a local mount, instead of using the local path provisioner?

For example, my PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-data
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nextcloud-data
    namespace: nextcloud
  local:
    path: /data/crypt/nextcloud/data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - mediaserver.fritz.box
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem

jessebot commented 1 year ago

Here's what else I tried recently:

I do not know how to set an emptyDir with the current values.yaml πŸ€”
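(If I'm reading the chart templates right, the chart falls back to an emptyDir for its main volume whenever persistence is disabled, so the closest equivalent test may simply be the snippet below. That's an assumption on my part, not something the chart documents explicitly.)

persistence:
  enabled: false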

Creating a Persistent Volume with spec.hostPath.path

I was previously using a dynamic PVC, but here's the new setup I tried, using the 26.0.0-fpm tag again this time. I only changed the securityContext for the nextcloud container, since nginx isn't what I'm troubleshooting, so I didn't set nextcloud.podSecurityContext. Here's the PV and existing PVC for nextcloud files:

PV and PVC yaml:

```yaml
---
kind: PersistentVolume
apiVersion: v1
metadata:
  namespace: nextcloud
  name: nextcloud
spec:
  storageClassName: local-path
  capacity:
    storage: 11Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: '/data/nextcloud'
---
# Dynamic persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
spec:
  volumeName: nextcloud
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

The above still failed, so I'm beginning to think this is k3s related... because I created the directory I specified as user 33:33, which is also www-data on the host machine.

[Screenshot 2023-04-16 at 15 50 34]

I found this k3s issue, #3704, and whatever the fix was just didn't seem to work? There's another PR opened here, #7217, which may fix it, but 🀷

Creating a Persistent Volume with spec.local.path

Next I tried the second thing you suggested, @Jeroen0494, with a PV that has spec.local.path like this, also making sure that /data/nextcloud was cleaned between runs and was owned by www-data, which is UID 33 both in the securityContext for the nextcloud container and on the host node:

PV and PVC yaml:

```yaml
---
# using local path instead of local-path provisioner directly
apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: nextcloud
  name: nextcloud
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nextcloud-files
    namespace: nextcloud
  local:
    path: /data/nextcloud/
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - compufam
---
# persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
spec:
  volumeName: nextcloud
  # tried with AND *without* storageClassName set in the PVC
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

This also fails, and what's weird is that I'm not using the alpine container for nextcloud, but it still changed the group ownership to GID 82 and left the user as root for all the same directories as before 🀷:

[Screenshot 2023-04-16 at 16 29 06]

Edit: Just realized I left spec.storageClassName: local-path in the persistent volume claim, so I tried again without it, with the same GID 82 result as above. I think we need to fix that, because it's coming from the deployment.yaml here, which always sets the fsGroup for the nextcloud container to 82 if nginx is enabled. But using the nginx-alpine container doesn't mean a user is using an alpine nextcloud container, so setting the fsGroup to 82 there doesn't make sense: https://github.com/nextcloud/helm/blob/3ad31c7461c4c3b58e0662ff6b4bdd1754dff7f2/charts/nextcloud/templates/deployment.yaml#L332-L344 Submitted a PR here: https://github.com/nextcloud/helm/pull/379 (but that would only fix the group ownership, not the user ownership)

Current thoughts...

Since bitnami's postgres chart provides an init container to get around this, perhaps we should just provide one as well, given that k3s and rancher are pretty popular. It's not pretty, but I don't really see a way around this so far? (There is a beta rootless mode for k3s, but I haven't dug into that yet.)

tomasodehnal commented 1 year ago

@tomasodehnal, thanks for popping in to help (in fact, thank you to everyone who has tried to pop in and help with this weird issue 😁). I will take a peek at that. A few questions: are you using k3s or another k8s on metal? Could you post your full PV/PVC manifests or the section of your values.yaml with that info?

The reason I'm asking is that I'm wondering if it's actually a storage driver problem that has nothing to do with nextcloud. It only seems to be happening consistently for a few directories, and those seem to be mounts from persistent volumes.

@jessebot It's K3s on a Ubuntu VM on ESXi.

This is the manifest I use for the persistence.nextcloud volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud
  labels:
    type: local
spec:
  storageClassName: nextcloud
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/nextcloud"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud
  namespace: nextcloud
spec:
  storageClassName: nextcloud
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

And the respective excerpt from the values.yaml:

nextcloud:
  podSecurityContext:
    runAsUser: 1003
    runAsGroup: 1003
    runAsNonRoot: true
    fsGroup: 1003
persistence:
  enabled: true
  existingClaim: nextcloud

I was testing with a fresh install without existing claims, and I would say it works as expected.

Looking into your manifest, there is one thing I noticed. You say you use the local-path provisioner, but I believe that might not be the case. The reason is that you are creating the PV on your own, and the name local-path is then just used by the claim to refer to the existing PV; it is not the actual storage class. You can easily find out from the annotations of the PVC - can you see volume.kubernetes.io/storage-provisioner: rancher.io/local-path? The PV should be created by the provisioner, so you might remove the PV definition from the manifest and keep only the PVC.
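A quick way to run that check, using the namespace and claim name from the manifests earlier in this thread (adjust both to your setup):

kubectl get pvc -n nextcloud nextcloud-files \
  -o jsonpath='{.metadata.annotations.volume\.kubernetes\.io/storage-provisioner}'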

I think the issue lies in the storage provisioner one uses and is not related to nextcloud.

If you want to resolve it regardless of the storage used, I would say an init container is the safe bet, but it will need to have privileged permissions.

One other observation: fsGroup was not respected in my test, as I'm on 1.25.3. It looks like it might be supported only since 1.25.4: https://github.com/k3s-io/k3s/issues/6401.
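For what it's worth, a throwaway pod like the sketch below can check whether a given StorageClass/version combination honors fsGroup at all, independent of Nextcloud (all names are illustrative, and fsgroup-test-pvc is assumed to be a scratch PVC on the class under test):

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    fsGroup: 33
  containers:
    - name: test
      image: busybox:1.36
      # prints the numeric owner/group of the mount point; the group should be 33 if fsGroup is honored
      command: ["sh", "-c", "ls -ldn /mnt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /mnt
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fsgroup-test-pvc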

jessebot commented 1 year ago

Popping in very quickly to say I tested this on GKE with the kubernetes.io/gce-pd provisioner, and the same issue happens :( :

root@nextcloud-web-app-68f6bb8fb6-nblkq:/var/www/html# ls -hal
total 196K
drwxrwsr-x 15 www-data www-data 4.0K Apr 23 14:52 .
drwxrwsr-x  4 root           82 4.0K Apr 23 14:52 ..
-rw-r--r--  1 www-data www-data 3.2K Apr 23 14:52 .htaccess
-rw-r--r--  1 www-data www-data  101 Apr 23 14:52 .user.ini
drwxr-sr-x 45 www-data www-data 4.0K Apr 23 14:52 3rdparty
-rw-r--r--  1 www-data www-data  19K Apr 23 14:52 AUTHORS
-rw-r--r--  1 www-data www-data  34K Apr 23 14:52 COPYING
drwxr-sr-x 50 www-data www-data 4.0K Apr 23 14:52 apps
drwxrwsr-x  2 root           82 4.0K Apr 23 14:52 config
-rw-r--r--  1 www-data www-data 4.0K Apr 23 14:52 console.php
drwxr-sr-x 24 www-data www-data 4.0K Apr 23 14:52 core
-rw-r--r--  1 www-data www-data 6.2K Apr 23 14:52 cron.php
drwxrwsr-x  2 www-data www-data 4.0K Apr 23 14:52 custom_apps
drwxrwsr-x  2 www-data www-data 4.0K Apr 23 14:52 data
drwxr-sr-x  2 www-data www-data  12K Apr 23 14:52 dist
-rw-r--r--  1 www-data www-data  156 Apr 23 14:52 index.html
-rw-r--r--  1 www-data www-data 3.4K Apr 23 14:52 index.php
drwxr-sr-x  6 www-data www-data 4.0K Apr 23 14:52 lib
-rw-r--r--  1 root           82    0 Apr 23 14:52 nextcloud-init-sync.lock
-rw-r-----  1 www-data www-data  14K Apr 23 14:54 nextcloud.log
-rwxr-xr-x  1 www-data www-data  283 Apr 23 14:52 occ
drwxr-sr-x  2 www-data www-data 4.0K Apr 23 14:52 ocm-provider
drwxr-sr-x  2 www-data www-data 4.0K Apr 23 14:52 ocs
drwxr-sr-x  2 www-data www-data 4.0K Apr 23 14:52 ocs-provider
-rw-r--r--  1 www-data www-data 3.1K Apr 23 14:52 public.php
-rw-r--r--  1 www-data www-data 5.5K Apr 23 14:52 remote.php
drwxr-sr-x  4 www-data www-data 4.0K Apr 23 14:52 resources
-rw-r--r--  1 www-data www-data   26 Apr 23 14:52 robots.txt
-rw-r--r--  1 www-data www-data 2.4K Apr 23 14:52 status.php
drwxrwsr-x  3 www-data www-data 4.0K Apr 23 14:52 themes
-rw-r--r--  1 www-data www-data  384 Apr 23 14:52 version.php

I don't think this is specific to k3s anymore πŸ€”

oliverhu commented 10 months ago

@jessebot I think I ran into the same issue (https://github.com/nextcloud/helm/issues/504), and I saw your perseverance tackling this... The config folder is owned by root:root and thus the folder is empty. Were you able to find a fix for this issue?

MrFishFinger commented 2 months ago

I also stumbled onto this situation, where mounting a rancher.io/local-path PVC into k3s results in the directory being owned by root. Setting securityContext.fsGroup does change the directory group - just not the owner.

I also observed the same behaviour with the kubernetes.io/aws-ebs provisioner on EKS. I am not sure if this is actually a bug, or if this is just working as expected? At least from these discussions, it seems like this is known behaviour:

...

Anyway, at least for my use case, I was able to get a non-root nextcloud container running by setting the php config check_data_directory_permissions. I also got non-root nginx running by using the image nginxinc/nginx-unprivileged:alpine.

Below is a partial extract from my values.yaml file. Maybe this can help someone in the same boat?

image:
  flavor: fpm

persistence:
  enabled: true
  existingClaim: nextcloud-pvc

nextcloud:
  ...
  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
  configs:
    custom.config.php: |
      <?php
        $CONFIG = array(
          'check_data_directory_permissions' => false, # https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/
        );

nginx:
  enabled: true
  image:
    repository: nginxinc/nginx-unprivileged
    tag: alpine
    pullPolicy: IfNotPresent
  securityContext:
    runAsUser: 101
    runAsGroup: 101
    runAsNonRoot: true
    readOnlyRootFilesystem: false
...

PVC definition:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-pvc
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
  storageClassName: local-path
  volumeMode: Filesystem

QuinnBast commented 3 weeks ago

I am also having this issue, and it's a pretty big showstopper for me. I'm using a RWX persistent volume that needs one group set in another pod and a different group set in nextcloud. But nextcloud REQUIRES all of the files in the container to be owned by group 33 in order for nextcloud to be able to see them.

As soon as I set the fsGroup or runAsGroup to something else, Nextcloud crashes with the SIGWINCH error, which gives me absolutely no information about why Nextcloud is crashing.

Nextcloud is frustrating to say the least. I'm not sure why so many people like this thing, I've only experienced headache and problems with Nextcloud so far...
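For the shared RWX volume described above, one possible avenue (an untested assumption, not a confirmed fix) is to keep the files group-owned by 33 for nextcloud and give the other workload access through a supplemental group instead of changing its primary group:

# in the *other* workload's pod spec (not the nextcloud chart)
securityContext:
  # join group 33 (www-data) so files group-owned by 33 stay readable
  supplementalGroups: [33]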

steled commented 1 week ago

Hi,

I had a similar problem, and for me the fix was to manually create all the folders that are needed, owned by www-data:www-data, with the needed permissions.

I'm using pre-created persistent volumes with hostPath (I will migrate to local in the future).

See below for the commands I'm running on my node (using /ext/persistent/nextcloud-staging/server as my root path for nextcloud):

sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server
sudo chown 1000:1000 -R /ext/persistent/nextcloud-staging/server/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/config
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/config/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/custom_apps
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/custom_apps/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/data
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/data/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/html
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/html/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/root
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/root/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/themes
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/themes/
sudo mkdir --mode 0755 -p /ext/persistent/nextcloud-staging/server/tmp
sudo chown www-data:www-data -R /ext/persistent/nextcloud-staging/server/tmp/

It might be a problem that hostPath and local volumes can "only be used as a statically created PersistentVolume. Dynamic provisioning is not supported."

Maybe this helps someone.