Nextcloud initializing takes too much time

urbaman commented 2 years ago

Hi,

I'm trying to setup nextcloud through the Helm chart, backed by some PVCs on shared glusterfs. Glusterfs works, as I see all of the paths (nextcloud, mariadb, redis master/replica) getting written. Thing is that the nextcloud pod's logs keeps staying in the "initializing nextcloud" part for more than 7 minutes, with the probes getting faults and restarting the pod.

Also, if I enable croinjob, it fails with something liek "wrong image" message.

Here's my values.yaml file:

## Official nextcloud image version
## ref: https://hub.docker.com/r/library/nextcloud/tags/
##
image:
  repository: nextcloud
  flavor: apache
  # If unset, uses flavor + .Chart.AppVersion to create tag
  # tag:
  pullPolicy: Always
  # pullSecrets:
  #   - myRegistrKeySecretName

nameOverride: ""
fullnameOverride: ""
podAnnotations: {}
deploymentAnnotations: {}

# Number of replicas to be deployed
replicaCount: 1

## Allowing use of ingress controllers
## ref: https://kubernetes.io/docs/concepts/services-networking/ingress/
##
ingress:
  enabled: false
  # className: nginx
  annotations: {}
  #  nginx.ingress.kubernetes.io/proxy-body-size: 4G
  #  kubernetes.io/tls-acme: "true"
  #  cert-manager.io/cluster-issuer: letsencrypt-prod
  #  nginx.ingress.kubernetes.io/server-snippet: |-
  #    server_tokens off;
  #    proxy_hide_header X-Powered-By;

  #    rewrite ^/.well-known/webfinger /public.php?service=webfinger last;
  #    rewrite ^/.well-known/host-meta /public.php?service=host-meta last;
  #    rewrite ^/.well-known/host-meta.json /public.php?service=host-meta-json;
  #    location = /.well-known/carddav {
  #      return 301 $scheme://$host/remote.php/dav;
  #    }
  #    location = /.well-known/caldav {
  #      return 301 $scheme://$host/remote.php/dav;
  #    }
  #    location = /robots.txt {
  #      allow all;
  #      log_not_found off;
  #      access_log off;
  #    }
  #    location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)/ {
  #      deny all;
  #    }
  #    location ~ ^/(?:autotest|occ|issue|indie|db_|console) {
  #      deny all;
  #    }
  # tls:
  #   - secretName: nextcloud-tls
  #     hosts:
  #       - nextcloud.kube.home
  labels: {}
  path: /
  pathType: Prefix

# Allow configuration of lifecycle hooks
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
lifecycle: {}
  # postStartCommand: []
  # preStopCommand: []

phpClientHttpsFix:
  enabled: false
  protocol: https

nextcloud:
  host: nextcloud.domain.com
  username: admin
  password: password
  ## Use an existing secret
  existingSecret:
    enabled: false
    # secretName: nameofsecret
    # usernameKey: username
    # passwordKey: password
    # tokenKey: serverinfo_token
    # smtpUsernameKey: smtp_username
    # smtpPasswordKey: smtp_password
  update: 0
  # If web server is not binding default port, you can define it
  # containerPort: 8080
  datadir: /var/www/html/data
  persistence:
    subPath:
  mail:
    enabled: true
    fromAddress: nextcloud
    domain: domain.com
    smtp:
      host: smtp.domain.com
      secure: ssl
      port: 587
      authtype: LOGIN
      name: nextcloud@domain.com
      password: password
  # PHP Configuration files
  # Will be injected in /usr/local/etc/php/conf.d for apache image and in /usr/local/etc/php-fpm.d when nginx.enabled: true
  phpConfigs: {}
  # Default config files
  # IMPORTANT: Will be used only if you put extra configs, otherwise default will come from nextcloud itself
  # Default confgurations can be found here: https://github.com/nextcloud/docker/tree/master/16.0/apache/config
  defaultConfigs:
    # To protect /var/www/html/config
    .htaccess: true
    # Redis default configuration
    redis.config.php: false
    # Apache configuration for rewrite urls
    apache-pretty-urls.config.php: true
    # Define APCu as local cache
    apcu.config.php: true
    # Apps directory configs
    apps.config.php: true
    # Used for auto configure database
    autoconfig.php: false
    # SMTP default configuration
    smtp.config.php: true
  # Extra config files created in /var/www/html/config/
  # ref: https://docs.nextcloud.com/server/15/admin_manual/configuration_server/config_sample_php_parameters.html#multiple-config-php-file
  configs:
    redis.config.php: |-
      <?php
      $CONFIG = array (
        'memcache.local' => '\\OC\\Memcache\\Redis',
        'memcache.distributed' => '\OC\Memcache\Redis',
        'memcache.locking' => '\OC\Memcache\Redis',
        'redis' => array(
          'host' => getenv('REDIS_HOST'),
          'port' => getenv('REDIS_HOST_PORT') ?: 6379,
          'password' => getenv('REDIS_HOST_PASSWORD')
        )
      );
    custom.config.php: |-
      <?php
      $CONFIG = array (
        'overwriteprotocol' => 'https',
        'overwrite.cli.url' => 'https://drive.example.com',
        'filelocking.enabled' => 'true',
        'loglevel' => '2',
        'enable_previews' => true,
        'trusted_domains' =>
           [
            'nextcloud',
            'drive.example.com'
           ]
      );
  # For example, to use S3 as primary storage
  # ref: https://docs.nextcloud.com/server/13/admin_manual/configuration_files/primary_storage.html#simple-storage-service-s3
  #
  #  configs:
  #    s3.config.php: |-
  #      <?php
  #      $CONFIG = array (
  #        'objectstore' => array(
  #          'class' => '\\OC\\Files\\ObjectStore\\S3',
  #          'arguments' => array(
  #            'bucket'     => 'my-bucket',
  #            'autocreate' => true,
  #            'key'        => 'xxx',
  #            'secret'     => 'xxx',
  #            'region'     => 'us-east-1',
  #            'use_ssl'    => true
  #          )
  #        )
  #      );

  ## Strategy used to replace old pods
  ## IMPORTANT: use with care, it is suggested to leave as that for upgrade purposes
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
  strategy:
    type: Recreate
    # type: RollingUpdate
    # rollingUpdate:
    #   maxSurge: 1
    #   maxUnavailable: 0

  ##
  ## Extra environment variables
  extraEnv:
  #  - name: SOME_SECRET_ENV
  #    valueFrom:
  #      secretKeyRef:
  #        name: nextcloud
  #        key: secret_key

  # Extra init containers that runs before pods start.
  extraInitContainers: []
  #  - name: do-something
  #    image: busybox
  #    command: ['do', 'something']

  # Extra mounts for the pods. Example shown is for connecting a legacy NFS volume
  # to NextCloud pods in Kubernetes. This can then be configured in External Storage
  extraVolumes:
  #  - name: nfs
  #    nfs:
  #      server: "10.0.0.1"
  #      path: "/nextcloud_data"
  #      readOnly: false
  extraVolumeMounts:
  #  - name: nfs
  #    mountPath: "/legacy_data"

  # Extra secuurityContext parameters. For example you may need to define runAsNonRoot directive
  # extraSecurityContext:
  #   runAsUser: "33"
  #   runAsGroup: "33"
  #   runAsNonRoot: true
  #   readOnlyRootFilesystem: true

nginx:
  ## You need to set an fpm version of the image for nextcloud if you want to use nginx!
  enabled: false
  image:
    repository: nginx
    tag: alpine
    pullPolicy: Always

  config:
    # This generates the default nginx config as per the nextcloud documentation
    default: true
    # custom: |-
    #     worker_processes  1;..

  resources: {}

internalDatabase:
  enabled: false
  name: nextcloud

##
## External database configuration
##
externalDatabase:
  enabled: false

  ## Supported database engines: mysql or postgresql
  type: mysql

  ## Database host
  host:

  ## Database user
  user: nextcloud

  ## Database password
  password:

  ## Database name
  database: nextcloud

  ## Use a existing secret
  existingSecret:
    enabled: false
    # secretName: nameofsecret
    # usernameKey: username
    # passwordKey: password

##
## MariaDB chart configuration
##
mariadb:
  ## Whether to deploy a mariadb server to satisfy the applications database requirements. To use an external database set this to false and configure the externalDatabase parameters
  enabled: true

  auth:
    database: nextcloud
    username: admin
    password: password

  architecture: standalone

  ## Enable persistence using Persistent Volume Claims
  ## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
  ##
  primary:
    persistence:
      enabled: true
      existingClaim: nextcloud-mariadb-pvc
      accessMode: ReadWriteOnce
      size: 8Gi

##
## PostgreSQL chart configuration
## for more options see https://github.com/bitnami/charts/tree/master/bitnami/postgresql
##
postgresql:
  enabled: false
  global:
    postgresql:
      auth:
        username: nextcloud
        password: changeme
        database: nextcloud
  primary:
    persistence:
      enabled: false
      # storageClass: ""

##
## Redis chart configuration
## for more options see https://github.com/bitnami/charts/tree/master/bitnami/redis
##

redis:
  enabled: true
  auth:
    enabled: true
    password: password
  master:
    persistence:
      existingClaim: redis-master-pvc
  replica:
    persistence:
      existingClaim: redis-replica-pvc

## Cronjob to execute Nextcloud background tasks
## ref: https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/background_jobs_configuration.html#webcron
##
cronjob:
  enabled: false
  # Nexcloud image is used as default but only curl is needed
  image: {}
    #repository: nextcloud
    #tag: apache
    #pullPolicy: Always
    # pullSecrets:
    #   - myRegistrKeySecretName
  # Every 5 minutes
  # Note: Setting this to any any other value than 5 minutes might
  #  cause issues with how nextcloud background jobs are executed
  schedule: "*/5 * * * *"
  annotations: {}
  # Set curl's insecure option if you use e.g. self-signed certificates
  curlInsecure: false
  failedJobsHistoryLimit: 5
  successfulJobsHistoryLimit: 2
  # If not set, nextcloud deployment one will be set
  # resources:
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #  cpu: 100m
    #  memory: 128Mi
    # requests:
    #  cpu: 100m
    #  memory: 128Mi

  # If not set, nextcloud deployment one will be set
  # nodeSelector: {}

  # If not set, nextcloud deployment one will be set
  # tolerations: []

  # If not set, nextcloud deployment one will be set
  # affinity: {}

service:
  type: ClusterIP
  port: 8080
  loadBalancerIP: nil
  nodePort: nil

## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
##
persistence:
  # Nextcloud Data (/var/www/html)
  enabled: true
  annotations: {}
  ## nextcloud data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  # storageClass: "-"

  ## A manually managed Persistent Volume and Claim
  ## Requires persistence.enabled: true
  ## If defined, PVC must be created manually before volume will be bound
  # existingClaim:
  existingClaim: nextcloud-pvc
  accessMode: ReadWriteMany
  size: 8Gi

  ## Use an additional pvc for the data directory rather than a subpath of the default PVC
  ## Useful to store data on a different storageClass (e.g. on slower disks)
  nextcloudData:
    enabled: true
    subPath:
    annotations: {}
    # storageClass: "-"
    existingClaim: nextcloud-data-pvc
    accessMode: ReadWriteMany
    size: 5Ti

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #  cpu: 100m
  #  memory: 128Mi
  # requests:
  #  cpu: 100m
  #  memory: 128Mi

## Liveness and readiness probe values
## Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
##
livenessProbe:
  enabled: true
  initialDelaySeconds: 500
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 500
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
startupProbe:
  enabled: false
  initialDelaySeconds: 500
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30
  successThreshold: 1

## Enable pod autoscaling using HorizontalPodAutoscaler
## ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
##
hpa:
  enabled: true
  cputhreshold: 60
  minPods: 1
  maxPods: 10

nodeSelector: {}

tolerations: []

affinity: {}

## Prometheus Exporter / Metrics
##
metrics:
  enabled: true

  replicaCount: 1
  # The metrics exporter needs to know how you serve Nextcloud either http or https
  https: false
  # Use API token if set, otherwise fall back to password authentication
  # https://github.com/xperimental/nextcloud-exporter#token-authentication
  # Currently you still need to set the token manually in your nextcloud install
  token: ""
  timeout: 5s

  image:
    repository: xperimental/nextcloud-exporter
    tag: 0.5.1
    pullPolicy: IfNotPresent

  ## Metrics exporter resource requests and limits
  ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  # resources: {}

  ## Metrics exporter pod Annotation and Labels
  # podAnnotations: {}

  # podLabels: {}

  service:
    type: ClusterIP
    ## Use serviceLoadBalancerIP to request a specific static IP,
    ## otherwise leave blank
    # loadBalancerIP:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9205"
    labels:
      app: nextcloud
      release: kube-prometheus-stack

  ## Prometheus Operator ServiceMonitor configuration
  ##
  serviceMonitor:
    ## @param metrics.serviceMonitor.enabled Create ServiceMonitor Resource for scraping metrics using PrometheusOperator
      ##
    enabled: false

    ## @param metrics.serviceMonitor.namespace Namespace in which Prometheus is running
    ##
    namespace: monitoring

    ## @param metrics.serviceMonitor.jobLabel The name of the label on the target service to use as the job name in prometheus.
    ##
    jobLabel: nextcloud

    ## @param metrics.serviceMonitor.interval Interval at which metrics should be scraped
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
    ##
    interval: 30s

    ## @param metrics.serviceMonitor.scrapeTimeout Specify the timeout after which the scrape is ended
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
    ##
    scrapeTimeout: ""

    ## @param metrics.serviceMonitor.labels Extra labels for the ServiceMonitor
    ##
    labels:
      release: kube-prometheus-stack
      app: nextcloud

rbac:
  enabled: false
  serviceaccount:
    create: true
    name: nextcloud-serviceaccount
    annotations: {}

urbaman commented 2 years ago

Finally working, with a proper DNS added and working. I had to add the DNS and a Traefik IngressRoute to use it.

Opening a new issue because now Nexcloud asks me if I'm reinstalling, and I'm actually not, right?

TheShadowBanned commented 2 years ago

Hey urbaman!

Initialization takes a long time and I had similar issues.

If you still have probs, try first init with cronjob disabled and set liveness, readiness and startup probes to 'false'.

Watch the logs of the nextcloud container for init and install to finish.

Adjust your values.yml:

Set pull policy of nextcloud to IfNotPresent
enable cronjob
enable the probes again

Then try to helm upgrade your installation. If that doesn't work, Helm uninstall an then install again, of course without deleting persistence.

Hope that helps.

Greetz

tstivers1990 commented 2 years ago

In my case, startupProbe.enabled is set to false by default. This causes the liveness and readiness probes to run too soon, which causes the container to be restarted before it can actually do the first time setup. The logs don't show any errors because the container is being restarted externally, not because the container itself had a problem. Thus why you just see the initializing message endlessly.

The fix for me was to set startupProbe.enabled to true, which allows the container to initialize before the readiness and liveness probes kick off and start rebooting it. If your cluster is slower, you may need to adjust the values under the startupProbe configuration to give it more time to start up.

lknite commented 2 years ago

The startup probe needs to be enabled by default and modified from using an httpGet to running a command that:

checks for initialization step is complete, such as checking if 'rsync' is running, and wait if not finished, long time first run
after it has verified it isn't initializing (which should be a quick check after first run), then it can run some test that makes sense for every startup (or maybe the startup probe is just that, it waits for the initialization to complete if there is one)
maybe when nextcloud first starts up it could have a flag set somewhere that it is initializing by default, then after the initialization completes it could set that flag to initialized, and the startupProbe could just check if that flag is showing as initialized

For me the rsync step took 3 minutes 30 seconds.

The alternative would just be to keep restarting and restarting the rync over and over again till the rsync has a chance to finally complete. Setting the startup probe with an initial delay of 4 minutes works ... but we don't really want to change that for first run then go back and change it everytime afterwards.

Note: 4 minutes worked, but it wasn't enough to complete the initial installation, 4 minutes was enough for the rsync to complete but it crashed and restarted 10 more times before the readiness and liveliness probes were happy ... I'm betting the initialization wasn't complete.)

A note in the log that its performing its first time initialization and that it may take a long time, up to 5 minutes, would be good to have in the logs as admins check to see why its taking so long.

schroedermatthias commented 1 year ago

It's not apparent to me, why nextcloud copies application code in the persistent storage location: /mnt/nfs/services/nextcloud-nextcloud/html/3rdparty/aws/aws-sdk-php In my eyes, this is not a valid approach for containerized applications.

My current configuration:

persistence:
      enabled: true
      storageClass: nfs
      accessMode: ReadWriteMany

jessebot commented 1 year ago

I'm actually not sure if that's something we're doing in the helm chart or if this is being done by something in the docker container, which would be a nextcloud/docker issue 🤔 I haven't had a moment to check though.

provokateurin commented 1 year ago

That's how the docker image works. I also don't like but afaik there is no way around it due to the architecture of Nextcloud (which is not very container friendly)

thefirstofthe300 commented 9 months ago

The way the chart currently works, thankfully, means that using a properly immutable image is possible. I did fork the nextcloud image and build a POC image which can run and perform basic upgrades without any rsyncing: https://github.com/thefirstofthe300/nextcloud-docker

The image isn't fully tested so there's likely some pieces that are broken, but that's mostly a matter of adding things back that I may have not reworked properly after gutting the install/upgrade logic. I welcome testing of the image, but please DON'T use it on production data yet.

In the name of testing, I do have a PR open to add removing the /var/www and /var/www/html mounts: #496

The lack of rsyncing almost certainly breaks folks using Docker Compose so submitting the image upstream may not be the easiest.

nextcloud / helm

Nextcloud initializing takes too much time #259