nextcloud / helm

A community maintained helm chart for deploying Nextcloud on Kubernetes.
GNU Affero General Public License v3.0

Stuck at "Initializing Nextcloud..." when attached to NFS PVC #10

Open somerandow opened 4 years ago

somerandow commented 4 years ago

Doing my best to duplicate helm/charts#22920 over to the new repo, as I am experiencing this issue as well. I have refined the details a bit, since this issue appears to be specific to NFS-based storage.

Describe the bug

When bringing up the nextcloud pod via the helm chart, the logs show the pod as being stuck at:

2020-08-31T19:00:42.054297154Z Configuring Redis as session handler
2020-08-31T19:00:42.098305129Z Initializing nextcloud 19.0.1.1 ...

Even backing the liveness/readiness probes out to over 5 minutes does not give the pod enough time to finish. If I instead switch the PVC to my storageClass for Rancher Longhorn (iSCSI), for example, the Nextcloud install initializes in seconds.
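The symptom is consistent with the image entrypoint copying tens of thousands of small files onto the volume, a workload NFS handles poorly. A rough way to gauge small-file write throughput on a given mount (a sketch; point TARGET at a directory on the NFS mount to compare it against local disk):

```shell
#!/bin/sh
# Rough small-file write benchmark: create 2000 tiny files and time it.
# TARGET defaults to a local temp dir; set it to a path on the mount under test.
TARGET=${TARGET:-$(mktemp -d)}
start=$(date +%s)
i=0
while [ $i -lt 2000 ]; do
  echo "x" > "$TARGET/f$i"
  i=$((i+1))
done
end=$(date +%s)
echo "wrote 2000 files in $((end-start))s to $TARGET"
```

On local disk this typically finishes in a second or two; on a slow NFS export it can take orders of magnitude longer, which is the same gap seen between Longhorn and NFS initialization times.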

Version of Helm and Kubernetes:

helm: v3.3.0
kubernetes: v1.18.6

Which chart:

nextcloud/helm

What happened:

What you expected to happen:

Nextcloud finishes initialization. Nextcloud files appear with correct permissions on the NFS volume.

How to reproduce it (as minimally and precisely as possible):

Set up an NFS provisioner:

helm install nfs stable/nfs-client-provisioner \
  --set nfs.server=x.x.x.x --set nfs.path=<path>

OR Configure an NFS PV and PVC manually

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-data
  labels:
    app: cloud
    type: data
spec:
  capacity:
    storage: 100Ti
  nfs:
    path: <path>
    server: <server>
  mountOptions:
    - async
    - nfsvers=4.2
    - noatime
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-manual
  volumeMode: Filesystem
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nextcloud-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Ti
  storageClassName: nfs-manual
  volumeMode: Filesystem
  selector:
    matchLabels:
      app: cloud
      type: data

Install nextcloud: helm install nextcloud nextcloud/helm -f values.yaml --namespace=nextcloud

values.yaml:

image:
  repository: nextcloud
  tag: 19
readinessProbe:
  initialDelaySeconds: 560
livenessProbe:
  initialDelaySeconds: 560
resources:
  requests:
    cpu: 200m
    memory: 500Mi
  limits:
    cpu: 2
    memory: 1Gi
ingress:
  enabled: true
  annotations:
    cert-manager.io/cluster-issuer: acme
    kubernetes.io/ingress.class: nginx
    # nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  hosts:
    - "cloud.myhost.com"
  tls:
    - hosts:
        - "cloud.myhost.com"
      secretName: prod-cert
  path: /
nextcloud:
  username: admin
  password: admin1
  # datadir: /mnt/data
  host: "cloud.myhost.com"
internalDatabase:
  enabled: true
externalDatabase:
  enabled: false
persistence:
  enabled: true
  # accessMode: ReadWriteMany
  # storageClass: nfs-client if creating via provisioner
  existingClaim: nextcloud-data # comment out if creating new PVC via provisioner
elisaado commented 1 year ago

@mddeff did you have to do anything special to get it to work? (like changing UIDs or security context parameters)

pishangujeniya commented 10 months ago

🚀 following setup worked for me with NFS 🙌 ❤️

NFS Server Setup

  1. Ubuntu Server 22.04
  2. sudo apt install -y nfs-common
  3. sudo apt install nfs-kernel-server
  4. sudo systemctl start nfs-kernel-server.service
  5. Add the following line to the /etc/exports file
  6. /srv/nfs <IP>(rw,sync,no_subtree_check,no_root_squash)
  7. Then run the following command to re-export the shares
  8. sudo exportfs -a
  9. sudo reboot
  10. sudo mkdir -p /srv/nfs/nextcloud
  11. sudo chown -R nobody:nogroup /srv/nfs/nextcloud
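For reference, the export line from step 6 with each option's effect spelled out (a sketch; the network range is an assumed placeholder for your client IP or subnet):

```
# /etc/exports -- one export per line
# rw                read-write access for the listed clients
# sync              reply only after writes reach stable storage
# no_subtree_check  disable subtree checking (recommended by exports(5))
# no_root_squash    do not map root to nobody, so the container's entrypoint
#                   can chown files on the share as root
/srv/nfs 10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)
```

After editing, `sudo exportfs -a` (step 8) re-reads the file and applies the exports.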

Kubernetes Setup

  1. Ubuntu Server 22.04
  2. sudo snap install microk8s --classic --channel=1.29/stable
  3. sudo microk8s enable rbac
  4. sudo microk8s enable dns
  5. sudo microk8s enable ingress
  6. sudo microk8s enable helm3
  7. sudo apt install -y nfs-common
  8. sudo microk8s helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
  9. sudo microk8s helm repo update
  10. sudo microk8s helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.6.0 --set kubeletDir="/var/snap/microk8s/common/var/lib/kubelet"
  11. Create the following StorageClass YAML and apply it
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: nfs-csi
    provisioner: nfs.csi.k8s.io
    parameters:
      server: YOUR_NFS_IP_OR_DOMAIN
      share: /srv/nfs
    reclaimPolicy: Retain
    volumeBindingMode: Immediate
    allowVolumeExpansion: true
    mountOptions:
      - hard
      - nfsvers=4.1
  12. kubectl apply -f ./nfs.storageclass.yaml

Creating PV & PVC

PersistentVolume Sample Yaml


apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-csi
  mountOptions:
    - hard
    - nfsvers=4.1
  csi:
    driver: nfs.csi.k8s.io
    readOnly: false
    volumeHandle: NFS_IP_OR_DOMAIN#srv/nfs#nextcloud##
    volumeAttributes:
      server: NFS_IP_OR_DOMAIN
      share: srv/nfs
      subdir: nextcloud

PersistentVolumeClaim Sample Yaml


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: YOUR_NAMESPACE
  name: nextcloud-pvc
spec:
  storageClassName: nfs-csi
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 200Gi
  volumeName: nextcloud
  1. Apply both the persistent volume and the persistent volume claim

Following is values.yaml for helm chart of nextcloud

# Number of replicas to be deployed
  replicaCount: 1

  ## Allowing use of ingress controllers
  ## ref: https://kubernetes.io/docs/concepts/services-networking/ingress/
  ##
  ingress:
    enabled: true
    className: public
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-production"
      kubernetes.io/ingress.class: public
      nginx.ingress.kubernetes.io/proxy-body-size: "5000000m"
      # kubernetes.io/tls-acme: "true"
      # # Keep this in sync with the README.md:
      nginx.ingress.kubernetes.io/server-snippet: |-
        server_tokens off;
        proxy_hide_header X-Powered-By;
        rewrite ^/.well-known/webfinger /index.php/.well-known/webfinger last;
        rewrite ^/.well-known/nodeinfo /index.php/.well-known/nodeinfo last;
        rewrite ^/.well-known/host-meta /public.php?service=host-meta last;
        rewrite ^/.well-known/host-meta.json /public.php?service=host-meta-json;
        location = /.well-known/carddav {
          return 301 $scheme://$host/remote.php/dav;
        }
        location = /.well-known/caldav {
          return 301 $scheme://$host/remote.php/dav;
        }
        location = /robots.txt {
          allow all;
          log_not_found off;
          access_log off;
        }
        location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)/ {
          deny all;
        }
        location ~ ^/(?:autotest|occ|issue|indie|db_|console) {
          deny all;
        }
    tls:
      - secretName: nextcloud-tls
        hosts:
          - nextcloud.domain.com
    hosts:
      - host: nextcloud.domain.com

  nextcloud:
    update: 0
    containerPort: 80
    host: nextcloud.domain.com
    username: admin
    password: YOUR_PASSWORD
    mail:
      enabled: true
      fromAddress: YOUR_GMAIL_ID
      domain: smtp.gmail.com
      smtp:
        host: smtp.gmail.com
        secure: ssl
        port: 465
        authtype: LOGIN
        name: YOUR_GMAIL_ID
        password: "YOUR_GMAIL_APP_PASSWORD"

  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false

  # internalDatabase:
  #   enabled: true
  #   name: nextcloud

  persistence:
    enabled: true
    existingClaim: nextcloud-pvc
    accessMode: ReadWriteOnce
    size: 200Gi
  resources:
    {}
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #  cpu: 100m
    #  memory: 128Mi
    # requests:
    #  cpu: 100m
    #  memory: 128Mi

  livenessProbe:
    enabled: true
    initialDelaySeconds: 100
    periodSeconds: 100
    timeoutSeconds: 10
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 100
    periodSeconds: 100
    timeoutSeconds: 10
    failureThreshold: 5
    successThreshold: 1
  startupProbe:
    enabled: true
    initialDelaySeconds: 600
    periodSeconds: 100
    timeoutSeconds: 10
    failureThreshold: 5
    successThreshold: 1
  1. helm upgrade --install nextcloudserver --namespace "nextcloudserver" --create-namespace nextcloud/nextcloud -f ./values.yaml
  2. Now wait at least 10 minutes for the pod to initialize; it will work.
simon-b64 commented 9 months ago

Thanks, this works, but it's honestly a workaround. I would like to see the root of the issue fixed. ^^ I also did this now and had to wait an hour for the installation to finish.

Funny thing is that when using an iSCSI connection it works perfectly fine and fast.

MohammedNoureldin commented 9 months ago

Did anyone find any reasonable and consistent solution?

This is kind of a blocker for anyone who wants HA.

simon-b64 commented 9 months ago

I mean you can disable the probes, wait for initialisation and then enable them again.

It worked for me, but I had poor performance.
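In chart values, disabling the probes for the first install looks like the sketch below (the `enabled` keys match the probe blocks in this chart's values.yaml; verify against your chart version). Re-enable them and `helm upgrade` once initialization has finished:

```yaml
# First install only: disable all probes so the long NFS copy can complete
# without the kubelet killing the pod mid-initialization.
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
startupProbe:
  enabled: false
```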

MohammedNoureldin commented 9 months ago

@simon-b64 thank you for replying.

Poor performance in the whole Nextcloud, or in specific use-cases?

Actually, even after getting it booted, I get this issue:

It looks like you are trying to reinstall your Nextcloud. However the file CAN_INSTALL is missing from your config directory. Please create the file CAN_INSTALL in your config folder to continue.

Even if I made the livenessProbe wait for 120 seconds at the beginning. Does it take longer, and thus reset my pod in the middle of initialization? I am going to try disabling it now. It is cumbersome to test, because every time I need to wait about 30 minutes, which is really annoying.

Is the root cause still a mystery?

Disabling and then re-enabling is kind of a pain for automation, while the main issue is still unknown.

simon-b64 commented 9 months ago

@MohammedNoureldin I especially noticed the poor performance in the webui and sync speeds. It took twice the time to upload things.

I didn't even manage to get it running without disabling the probes. :)

MohammedNoureldin commented 6 months ago

Nextcloud 29, with Longhorn and NFS, PVC RWX takes about 30 minutes to initialize.

Did anyone manage to get it running with multiple replicas (so basically with RWX) by fine tuning some magic variables?

asosnovsky-sumologic commented 5 months ago

@MohammedNoureldin NFS is really annoying with Nextcloud. You basically need a sidecar or some process that continuously makes sure the ownership of the files belongs to Nextcloud (i.e. chown -R 33:33 /var/www/html), or try to tweak your NFS host to map the permissions to the nextcloud user.
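A minimal sketch of such a sidecar in chart values, assuming the chart exposes a hook like `nextcloud.extraSidecarContainers` and that the main data volume is named `nextcloud-main` (both names should be checked against your chart version):

```yaml
nextcloud:
  extraSidecarContainers:
    - name: fix-owner
      image: busybox
      # Re-assert www-data (33:33) ownership every 5 minutes; a blunt but
      # effective guard against ownership drift on the NFS export.
      command: ["sh", "-c", "while true; do chown -R 33:33 /var/www/html; sleep 300; done"]
      securityContext:
        runAsUser: 0
      volumeMounts:
        - name: nextcloud-main
          mountPath: /var/www/html
```

Mapping permissions on the NFS host side (e.g. all-squash with anonuid=33,anongid=33 in the export options) avoids the recurring chown entirely, at the cost of flattening all ownership on the share.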

tcoupin commented 2 months ago

I'm using Longhorn too, with an RWX PVC. I've managed to speed up initialization/upgrade by sharing only the folders that need to be writable by several pods, using symlinks:

drwxrwxrwt 10 www-data www-data    4096 Sep 24 11:25 .
drwxrwxr-x  1 www-data root        4096 Sep 24 11:30 ..
-rw-r--r--  1 www-data www-data    3954 Sep 24 11:25 .htaccess
-rw-r--r--  1 www-data www-data     101 Sep 24 11:25 .user.ini
drwxr-xr-x 44 www-data www-data    4096 Sep 24 11:25 3rdparty
-rw-r--r--  1 www-data www-data   23796 Sep 24 11:25 AUTHORS
-rw-r--r--  1 www-data www-data   34520 Sep 24 11:25 COPYING
drwxr-xr-x 51 www-data www-data    4096 Sep 24 11:25 apps
-rw-r--r--  1 www-data www-data    1283 Sep 24 11:25 composer.json
-rw-r--r--  1 www-data www-data    3140 Sep 24 11:25 composer.lock
lrwxrwxrwx  1 www-data www-data      27 Sep 24 11:25 config -> /var/www/html-shared/config
-rw-r--r--  1 www-data www-data    4095 Sep 24 11:25 console.php
drwxr-xr-x 24 www-data www-data    4096 Sep 24 11:25 core
-rw-r--r--  1 www-data www-data    7061 Sep 24 11:25 cron.php
lrwxrwxrwx  1 www-data www-data      32 Sep 24 11:25 custom_apps -> /var/www/html-shared/custom_apps
lrwxrwxrwx  1 www-data www-data      25 Sep 24 11:25 data -> /var/www/html-shared/data
drwxr-xr-x  2 www-data www-data   12288 Sep 24 11:25 dist
-rw-r--r--  1 www-data www-data     156 Sep 24 11:25 index.html
-rw-r--r--  1 www-data www-data    4103 Sep 24 11:25 index.php
drwxr-xr-x  6 www-data www-data    4096 Sep 24 11:25 lib
lrwxrwxrwx  1 www-data www-data      25 Sep 24 11:25 logs -> /var/www/html-shared/logs
lrwxrwxrwx  1 www-data www-data      45 Sep 24 11:25 nextcloud-init-sync.lock -> /var/www/html-shared/nextcloud-init-sync.lock
-rwxr-xr-x  1 www-data www-data     283 Sep 24 11:25 occ
drwxr-xr-x  2 www-data www-data    4096 Sep 24 11:25 ocs
drwxr-xr-x  2 www-data www-data    4096 Sep 24 11:25 ocs-provider
-rw-r--r--  1 www-data www-data 2202660 Sep 24 11:25 package-lock.json
-rw-r--r--  1 www-data www-data    6341 Sep 24 11:25 package.json
-rw-r--r--  1 www-data www-data    3187 Sep 24 11:25 public.php
-rw-r--r--  1 www-data www-data    5597 Sep 24 11:25 remote.php
drwxr-xr-x  4 www-data www-data    4096 Sep 24 11:25 resources
-rw-r--r--  1 www-data www-data      26 Sep 24 11:25 robots.txt
-rw-r--r--  1 www-data www-data    2452 Sep 24 11:25 status.php
lrwxrwxrwx  1 www-data www-data      27 Sep 24 11:25 themes -> /var/www/html-shared/themes
lrwxrwxrwx  1 www-data www-data      32 Sep 24 11:25 version.php -> /var/www/html-shared/version.php

My RWX PVC is mounted at /var/www/html-shared.

The load average has decreased, but I still have php-fpm processes in uninterruptible sleep.

I plan to preinstall apps in the image in a "custom_apps_ro" folder to further decrease the load on the NFS volume.
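The symlink layout above can be produced with a small script, e.g. from an init container. This is a sketch under the assumptions in that listing: the RWX volume is mounted at /var/www/html-shared, and only config, custom_apps, data, logs, and themes need to be shared (version.php and the lock file are handled the same way in the listing but omitted here):

```shell
#!/bin/sh
# Replace per-pod directories under $html with symlinks into the shared
# RWX mount $shared, creating the shared targets if they don't exist yet.
link_shared() {
  html=$1
  shared=$2
  for d in config custom_apps data logs themes; do
    mkdir -p "$shared/$d"
    rm -rf "$html/$d"
    ln -s "$shared/$d" "$html/$d"
  done
}

# Demo on temporary directories (in a pod this would be /var/www/html and
# the RWX mount /var/www/html-shared):
demo_html=$(mktemp -d)
demo_shared=$(mktemp -d)
mkdir -p "$demo_html/config"   # an existing directory is replaced by a symlink
link_shared "$demo_html" "$demo_shared"
ls -l "$demo_html"
```

Note that rm -rf discards any per-pod copy of those directories, so this should only run before the entrypoint populates /var/www/html, or after the shared copy is known to be authoritative.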