nextcloud / helm

A community maintained helm chart for deploying Nextcloud on Kubernetes.
GNU Affero General Public License v3.0

nextcloud pod never becomes healthy: "connect: connection refused" #283

Closed: kindrowboat closed this issue 2 years ago

kindrowboat commented 2 years ago

I'm attempting to run Nextcloud using this helm chart on a self-hosted microk8s cluster. I have an external load balancer (running Caddy) pointing the desired Nextcloud URL at the cluster. After I run helm install and try to access Nextcloud through the load balancer, I get a 503. When I look at the status of the nextcloud pod, I see that it never becomes healthy and its health checks fail with "connect: connection refused". I imagine that I have my ingress or service options set up incorrectly, but I'm not sure. Any help is appreciated.

helm install output

kindrobot@ku001:~/spacework/k2dk8s/nextcloud$ helm install nextcloud -f values.yaml nextcloud/nextcloud
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/3597/credentials/client.config
NAME: nextcloud
LAST DEPLOYED: Mon Sep 12 22:46:13 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the nextcloud URL by running:

  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=nextcloud" -o jsonpath="{.items[0].metadata.name}")
  echo http://127.0.0.1:8080/
  kubectl port-forward $POD_NAME 8080:80

2. Get your nextcloud login credentials by running:

  echo User:     admin
  echo Password: $(kubectl get secret --namespace default nextcloud -o jsonpath="{.data.nextcloud-password}" | base64 --decode)

kubectl describe output

kindrobot@ku001:~$ kubectl describe pod nextcloud-cfcc5b8db-6nrwx
Name:         nextcloud-cfcc5b8db-6nrwx
Namespace:    default
Priority:     0
Node:         ku002/192.168.1.11
Start Time:   Mon, 12 Sep 2022 22:46:16 +0000
Labels:       app.kubernetes.io/component=app
              app.kubernetes.io/instance=nextcloud
              app.kubernetes.io/name=nextcloud
              pod-template-hash=cfcc5b8db
Annotations:  cni.projectcalico.org/containerID: fd051d5529da5b7b12ca1e1674a30f938a3a934312b44d31fb3c6dcbce692870
              cni.projectcalico.org/podIP: 10.1.118.239/32
              cni.projectcalico.org/podIPs: 10.1.118.239/32
              nextcloud-config-hash: a5aae02b1b8278a9c8a2dc143e82d3737fc295f62c34afd617207f37d1b2b438
              php-config-hash: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
Status:       Running
IP:           10.1.118.239
IPs:
  IP:           10.1.118.239
Controlled By:  ReplicaSet/nextcloud-cfcc5b8db
Init Containers:
  mariadb-isalive:
    Container ID:  containerd://5659426faf50d794f304e32caf5350852bc78dcff47ce5cc4f130a4936b6f6b2
    Image:         docker.io/bitnami/mariadb:10.6.8-debian-11-r3
    Image ID:      docker.io/bitnami/mariadb@sha256:4f861cfda5f1883b0554a271e8b25e1c4c4cfd1c02bc516c99996b233a8b9502
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until mysql --host=nextcloud-mariadb --user=${MYSQL_USER} --password=${MYSQL_PASSWORD} --execute="SELECT 1;"; do echo waiting for mysql; sleep 2; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 12 Sep 2022 22:46:17 +0000
      Finished:     Mon, 12 Sep 2022 22:46:58 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      MYSQL_USER:      <set to the key 'db-username' in secret 'nextcloud-db'>  Optional: false
      MYSQL_PASSWORD:  <set to the key 'db-password' in secret 'nextcloud-db'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-767zp (ro)
Containers:
  nextcloud:
    Container ID:   containerd://454e64aded488304843baa2e56d08e230acb5a9401bc9fec4fd01a0270ef7060
    Image:          nextcloud:24.0.4-apache
    Image ID:       docker.io/library/nextcloud@sha256:d94c52ae3b1ba10a72cff9d44cf7615e44b13b07d07dbadbd7619131d6904cfc
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 12 Sep 2022 22:46:59 +0000
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:http/status.php delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/status.php delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      MYSQL_HOST:                 nextcloud-mariadb
      MYSQL_DATABASE:             nextcloud
      MYSQL_USER:                 <set to the key 'db-username' in secret 'nextcloud-db'>      Optional: false
      MYSQL_PASSWORD:             <set to the key 'db-password' in secret 'nextcloud-db'>      Optional: false
      NEXTCLOUD_ADMIN_USER:       <set to the key 'nextcloud-username' in secret 'nextcloud'>  Optional: false
      NEXTCLOUD_ADMIN_PASSWORD:   <set to the key 'nextcloud-password' in secret 'nextcloud'>  Optional: false
      NEXTCLOUD_TRUSTED_DOMAINS:  knc.kindrobot.ca
      NEXTCLOUD_DATA_DIR:         /var/www/html/data
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-767zp (ro)
      /var/www/ from nextcloud-main (rw,path="root")
      /var/www/html from nextcloud-main (rw,path="html")
      /var/www/html/config from nextcloud-main (rw,path="config")
      /var/www/html/custom_apps from nextcloud-main (rw,path="custom_apps")
      /var/www/html/data from nextcloud-main (rw,path="data")
      /var/www/html/themes from nextcloud-main (rw,path="themes")
      /var/www/tmp from nextcloud-main (rw,path="tmp")
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  nextcloud-main:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nextcloud-nextcloud
    ReadOnly:   false
  kube-api-access-767zp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  103s               default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   Scheduled         102s               default-scheduler  Successfully assigned default/nextcloud-cfcc5b8db-6nrwx to ku002
  Normal   Pulled            100s               kubelet            Container image "docker.io/bitnami/mariadb:10.6.8-debian-11-r3" already present on machine
  Normal   Created           100s               kubelet            Created container mariadb-isalive
  Normal   Started           100s               kubelet            Started container mariadb-isalive
  Normal   Pulled            58s                kubelet            Container image "nextcloud:24.0.4-apache" already present on machine
  Normal   Created           58s                kubelet            Created container nextcloud
  Normal   Started           58s                kubelet            Started container nextcloud
  Warning  Unhealthy         21s (x3 over 41s)  kubelet            Liveness probe failed: Get "http://10.1.118.239:80/status.php": dial tcp 10.1.118.239:80: connect: connection refused
  Normal   Killing           21s                kubelet            Container nextcloud failed liveness probe, will be restarted
  Warning  Unhealthy         1s (x6 over 41s)   kubelet            Readiness probe failed: Get "http://10.1.118.239:80/status.php": dial tcp 10.1.118.239:80: connect: connection refused

values.yaml

Most of these values are the defaults, with the exception of enabling ingress, setting the public URL, and enabling persistent storage.

## Official nextcloud image version
## ref: https://hub.docker.com/r/library/nextcloud/tags/
##
image:
  repository: nextcloud
  # tag: 24.0.3-apache
  pullPolicy: IfNotPresent
  # pullSecrets:
  #   - myRegistrKeySecretName

nameOverride: ""
fullnameOverride: ""
podAnnotations: {}
deploymentAnnotations: {}

# Number of replicas to be deployed
replicaCount: 1

## Allowing use of ingress controllers
## ref: https://kubernetes.io/docs/concepts/services-networking/ingress/
##
ingress:
  enabled: true
  # className: nginx
  annotations: {}
  #  nginx.ingress.kubernetes.io/proxy-body-size: 4G
  #  kubernetes.io/tls-acme: "true"
  #  cert-manager.io/cluster-issuer: letsencrypt-prod
  #  nginx.ingress.kubernetes.io/server-snippet: |-
  #    server_tokens off;
  #    proxy_hide_header X-Powered-By;

  #    rewrite ^/.well-known/webfinger /public.php?service=webfinger last;
  #    rewrite ^/.well-known/host-meta /public.php?service=host-meta last;
  #    rewrite ^/.well-known/host-meta.json /public.php?service=host-meta-json;
  #    location = /.well-known/carddav {
  #      return 301 $scheme://$host/remote.php/dav;
  #    }
  #    location = /.well-known/caldav {
  #      return 301 $scheme://$host/remote.php/dav;
  #    }
  #    location = /robots.txt {
  #      allow all;
  #      log_not_found off;
  #      access_log off;
  #    }
  #    location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)/ {
  #      deny all;
  #    }
  #    location ~ ^/(?:autotest|occ|issue|indie|db_|console) {
  #      deny all;
  #    }
  # tls:
  #   - secretName: nextcloud-tls
  #     hosts:
  #       - nextcloud.kube.home
  labels: {}
  path: /
  pathType: Prefix

# Allow configuration of lifecycle hooks
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
lifecycle: {}
  # postStartCommand: []
  # preStopCommand: []

phpClientHttpsFix:
  enabled: false
  protocol: https

nextcloud:
  host: knc.kindrobot.ca
  username: admin
  password: REDACTEDADMINPASSWORD
  ## Use an existing secret
  existingSecret:
    enabled: false
    # secretName: nameofsecret
    # usernameKey: username
    # passwordKey: password
    # tokenKey: serverinfo_token
    # smtpUsernameKey: smtp_username
    # smtpPasswordKey: smtp_password
  update: 0
  # If web server is not binding default port, you can define it
  # containerPort: 8080
  datadir: /var/www/html/data
  persistence:
    subPath:
  mail:
    enabled: false
    fromAddress: user
    domain: domain.com
    smtp:
      host: domain.com
      secure: ssl
      port: 465
      authtype: LOGIN
      name: user
      password: pass
  # PHP Configuration files
  # Will be injected in /usr/local/etc/php/conf.d for apache image and in /usr/local/etc/php-fpm.d when nginx.enabled: true
  phpConfigs: {}
  # Default config files
  # IMPORTANT: Will be used only if you put extra configs, otherwise default will come from nextcloud itself
  # Default configurations can be found here: https://github.com/nextcloud/docker/tree/master/16.0/apache/config
  defaultConfigs:
    # To protect /var/www/html/config
    .htaccess: true
    # Redis default configuration
    redis.config.php: true
    # Apache configuration for rewrite urls
    apache-pretty-urls.config.php: true
    # Define APCu as local cache
    apcu.config.php: true
    # Apps directory configs
    apps.config.php: true
    # Used for auto configure database
    autoconfig.php: true
    # SMTP default configuration
    smtp.config.php: true
  # Extra config files created in /var/www/html/config/
  # ref: https://docs.nextcloud.com/server/15/admin_manual/configuration_server/config_sample_php_parameters.html#multiple-config-php-file
  configs: {}

  # For example, to use S3 as primary storage
  # ref: https://docs.nextcloud.com/server/13/admin_manual/configuration_files/primary_storage.html#simple-storage-service-s3
  #
  #  configs:
  #    s3.config.php: |-
  #      <?php
  #      $CONFIG = array (
  #        'objectstore' => array(
  #          'class' => '\\OC\\Files\\ObjectStore\\S3',
  #          'arguments' => array(
  #            'bucket'     => 'my-bucket',
  #            'autocreate' => true,
  #            'key'        => 'xxx',
  #            'secret'     => 'xxx',
  #            'region'     => 'us-east-1',
  #            'use_ssl'    => true
  #          )
  #        )
  #      );

  ## Strategy used to replace old pods
  ## IMPORTANT: use with care, it is suggested to leave as that for upgrade purposes
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
  strategy:
    type: Recreate
    # type: RollingUpdate
    # rollingUpdate:
    #   maxSurge: 1
    #   maxUnavailable: 0

  ##
  ## Extra environment variables
  extraEnv:
  #  - name: SOME_SECRET_ENV
  #    valueFrom:
  #      secretKeyRef:
  #        name: nextcloud
  #        key: secret_key

  # Extra init containers that run before pods start.
  extraInitContainers: []
  #  - name: do-something
  #    image: busybox
  #    command: ['do', 'something']

  # Extra mounts for the pods. Example shown is for connecting a legacy NFS volume
  # to NextCloud pods in Kubernetes. This can then be configured in External Storage
  extraVolumes:
  #  - name: nfs
  #    nfs:
  #      server: "10.0.0.1"
  #      path: "/nextcloud_data"
  #      readOnly: false
  extraVolumeMounts:
  #  - name: nfs
  #    mountPath: "/legacy_data"

  # Extra securityContext parameters. For example, you may need to define the runAsNonRoot directive
  # extraSecurityContext:
  #   runAsUser: "33"
  #   runAsGroup: "33"
  #   runAsNonRoot: true
  #   readOnlyRootFilesystem: true

nginx:
  ## You need to set an fpm version of the image for nextcloud if you want to use nginx!
  enabled: false
  image:
    repository: nginx
    tag: alpine
    pullPolicy: IfNotPresent

  config:
    # This generates the default nginx config as per the nextcloud documentation
    default: true
    # custom: |-
    #     worker_processes  1;..

  resources: {}

internalDatabase:
  enabled: false
  name: nextcloud

##
## External database configuration
##
externalDatabase:
  enabled: false

  ## Supported database engines: mysql or postgresql
  type: mysql

  ## Database host
  host:

  ## Database user
  user: nextcloud

  ## Database password
  password:

  ## Database name
  database: nextcloud

  ## Use an existing secret
  existingSecret:
    enabled: false
    # secretName: nameofsecret
    # usernameKey: username
    # passwordKey: password

##
## MariaDB chart configuration
##
mariadb:
  ## Whether to deploy a mariadb server to satisfy the application's database requirements. To use an external database, set this to false and configure the externalDatabase parameters
  enabled: true

  auth:
    database: nextcloud
    username: nextcloud
    password: REDACTEDDBPASSWORD

  architecture: standalone

  ## Enable persistence using Persistent Volume Claims
  ## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
  ##
  primary:
    persistence:
      enabled: true
      # storageClass: ""
      accessMode: ReadWriteOnce
      size: 8Gi

##
## PostgreSQL chart configuration
## for more options see https://github.com/bitnami/charts/tree/master/bitnami/postgresql
##
postgresql:
  enabled: false
  global:
    postgresql:
      auth:
        username: nextcloud
        password: changeme
        database: nextcloud
  primary:
    persistence:
      enabled: false
      # storageClass: ""

##
## Redis chart configuration
## for more options see https://github.com/bitnami/charts/tree/master/bitnami/redis
##

redis:
  enabled: false
  auth:
    enabled: true
    password: 'changeme'

## Cronjob to execute Nextcloud background tasks
## ref: https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/background_jobs_configuration.html#webcron
##
cronjob:
  enabled: false
  # Nextcloud image is used as default but only curl is needed
  image: {}
    # repository: nextcloud
    # tag: 16.0.3-apache
    # pullPolicy: IfNotPresent
    # pullSecrets:
    #   - myRegistrKeySecretName
  # Every 5 minutes
  # Note: Setting this to any other value than 5 minutes might
  #  cause issues with how nextcloud background jobs are executed
  schedule: "*/5 * * * *"
  annotations: {}
  # Set curl's insecure option if you use e.g. self-signed certificates
  curlInsecure: false
  failedJobsHistoryLimit: 5
  successfulJobsHistoryLimit: 2
  # If not set, nextcloud deployment one will be set
  # resources:
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #  cpu: 100m
    #  memory: 128Mi
    # requests:
    #  cpu: 100m
    #  memory: 128Mi

  # If not set, nextcloud deployment one will be set
  # nodeSelector: {}

  # If not set, nextcloud deployment one will be set
  # tolerations: []

  # If not set, nextcloud deployment one will be set
  # affinity: {}

service:
  type: ClusterIP
  port: 8080
  loadBalancerIP: nil
  nodePort: nil

## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
##
persistence:
  # Nextcloud Data (/var/www/html)
  enabled: true
  annotations: {}
  ## nextcloud data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  # storageClass: "-"

  ## A manually managed Persistent Volume and Claim
  ## Requires persistence.enabled: true
  ## If defined, PVC must be created manually before volume will be bound
  # existingClaim:

  accessMode: ReadWriteOnce
  size: 8Gi

  ## Use an additional pvc for the data directory rather than a subpath of the default PVC
  ## Useful to store data on a different storageClass (e.g. on slower disks)
  nextcloudData:
    enabled: false
    subPath:
    annotations: {}
    # storageClass: "-"
    # existingClaim:
    accessMode: ReadWriteOnce
    size: 8Gi

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #  cpu: 100m
  #  memory: 128Mi
  # requests:
  #  cpu: 100m
  #  memory: 128Mi

## Liveness and readiness probe values
## Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
##
livenessProbe:
  enabled: true
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
startupProbe:
  enabled: false
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30
  successThreshold: 1

## Enable pod autoscaling using HorizontalPodAutoscaler
## ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
##
hpa:
  enabled: false
  cputhreshold: 60
  minPods: 1
  maxPods: 10

nodeSelector: {}

tolerations: []

affinity: {}

## Prometheus Exporter / Metrics
##
metrics:
  enabled: false

  replicaCount: 1
  # The metrics exporter needs to know how you serve Nextcloud: either http or https
  https: false
  # Use API token if set, otherwise fall back to password authentication
  # https://github.com/xperimental/nextcloud-exporter#token-authentication
  # Currently you still need to set the token manually in your nextcloud install
  token: ""
  timeout: 5s

  image:
    repository: xperimental/nextcloud-exporter
    tag: 0.5.1
    pullPolicy: IfNotPresent

  ## Metrics exporter resource requests and limits
  ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  # resources: {}

  ## Metrics exporter pod Annotation and Labels
  # podAnnotations: {}

  # podLabels: {}

  service:
    type: ClusterIP
    ## Use serviceLoadBalancerIP to request a specific static IP,
    ## otherwise leave blank
    # loadBalancerIP:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9205"
    labels: {}

  ## Prometheus Operator ServiceMonitor configuration
  ##
  serviceMonitor:
    ## @param metrics.serviceMonitor.enabled Create ServiceMonitor Resource for scraping metrics using PrometheusOperator
    ##
    enabled: false

    ## @param metrics.serviceMonitor.namespace Namespace in which Prometheus is running
    ##
    namespace: ""

    ## @param metrics.serviceMonitor.jobLabel The name of the label on the target service to use as the job name in prometheus.
    ##
    jobLabel: ""

    ## @param metrics.serviceMonitor.interval Interval at which metrics should be scraped
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
    ##
    interval: 30s

    ## @param metrics.serviceMonitor.scrapeTimeout Specify the timeout after which the scrape is ended
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
    ##
    scrapeTimeout: ""

    ## @param metrics.serviceMonitor.labels Extra labels for the ServiceMonitor
    ##
    labels: {}

rbac:
  enabled: false
  serviceaccount:
    create: true
    name: nextcloud-serviceaccount
    annotations: {}

StephenLasseter commented 2 years ago

What are the logs from the pod telling you?

kubectl logs -n default deployment/nextcloud

You might try enabling startupProbe and raising its thresholds, or disabling all three probes, in case the container simply does not have enough time to initialize on the first run before the probes restart the pod and it starts all over again. Judging by the issues logged here, this is a common problem. This comment describes a good workaround for a slow init: https://github.com/nextcloud/helm/issues/259#issuecomment-1203159235

As for ingress, it is not involved in the probes as configured. The kubelet sends them directly to the pod (the events above show requests to http://10.1.118.239:80/status.php), so they never pass through the ingress or the load balancer. Once you have the pod up and stable, you can troubleshoot ingress if you still can't connect.
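
For reference, the "disable all three probes" option could look roughly like the fragment below. It is only a sketch that reuses the probe keys from the values file posted above; whether it is appropriate depends on the setup, and it is probably best treated as a temporary debugging aid rather than a permanent configuration.

```yaml
# Sketch of a debugging override: turn off every probe so the kubelet
# never restarts the container while Nextcloud finishes its first-run setup.
# Key names are taken from the values file in this issue.
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
startupProbe:
  enabled: false
```

With no readiness probe, Kubernetes treats the container as ready as soon as it starts, so requests may reach Nextcloud before it can answer them. Re-enabling the probes once the first initialization has completed avoids that.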

kindrowboat commented 2 years ago

Hey, thank you so much @StephenLasseter for responding. Setting startupProbe.enabled to true and startupProbe.failureThreshold to 60 did the trick. I wonder if something like that should be in the provided values.yaml.
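
As a sketch of the change described above, the startupProbe section of the values file would look something like this. Only enabled and failureThreshold differ from the values posted earlier; the remaining fields are assumed to stay at the defaults shown there.

```yaml
# Sketch of the fix: give Nextcloud more time to finish its first-run setup.
startupProbe:
  enabled: true            # was false
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 60     # was 30
  successThreshold: 1
```

Kubernetes holds back the liveness and readiness probes until the startup probe has succeeded, so these settings allow roughly 30 s + 60 × 10 s = 630 s for the container to start answering on /status.php. By contrast, with only the default liveness probe (delay=10s, period=10s, #failure=3), the events above show the container being killed about 37 seconds after it started.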