nextcloud / helm

A community maintained helm chart for deploying Nextcloud on Kubernetes.
GNU Affero General Public License v3.0

Enabling `.persistence.enabled` leads to `/var/www/html/lib` and probably other files not being created #583

Closed v1nsai closed 5 days ago

v1nsai commented 1 week ago

Describe your Issue

Whenever persistence is enabled, `/var/www/html/lib` (and probably other files) are not created, which leads to crashes and errors. I'm using NFS for all the volumes, which could be making things slower and losing a data race somewhere.

Logs and Errors

```
Warning: require_once(/var/www/html/lib/versioncheck.php): Failed to open stream: No such file or directory in /var/www/html/cron.php on line 40

Fatal error: Uncaught Error: Failed opening required '/var/www/html/lib/versioncheck.php' (include_path='.:/usr/local/lib/php') in /var/www/html/cron.php:40
Stack trace:
#0 {main}
  thrown in /var/www/html/cron.php on line 40
```

Describe your Environment


```yaml
cronjob:
  enabled: true
  # lifecycle:
  #   postStartCommand: ["chown -R www-data:www-data /var/www/html"]
  # securityContext:
  #   runAsNonRoot: false
externalDatabase:
  enabled: true
  existingSecret:
    enabled: true
    passwordKey: mariadb-password
    secretName: mariadb-passwords
    usernameKey: mariadb-username
internalDatabase:
  enabled: false
mariadb:
  auth:
    existingSecret: mariadb-passwords
  enabled: true
  primary:
    persistence:
      enabled: true
nextcloud:
  configs:
    # add k8s pod namespace to trusted proxies, set phone region and maintenance window start time to 1am
    mycustom.config.php: |
      <?php
      $CONFIG = array(
        'trusted_proxies' => array('10.0.0.0/8'),
        'default_phone_region' => 'US',
        'maintenance_window_start' => 1,
        );
  existingSecret:
    secretName: nextcloud-admin
persistence:
  enabled: true # disabling this allows all files to be created and container boots successfully
  # existingClaim: nextcloud-data
  nextcloudData:
    enabled: true
redis:
  auth:
    enabled: false
    existingSecret: redis-password
    existingSecretKey: redis-password
  enabled: true
service:
  type: LoadBalancer
image:
  pullPolicy: Always
  tag: stable-fpm
```

tvories commented 1 week ago

I think I'm having this issue as well. Whenever I try to do an upgrade, my `occ` binary doesn't always get updated to the latest version.

provokateurin commented 1 week ago

https://github.com/nextcloud/helm/issues/584 this issue just got transferred here, maybe it is related?

luna-xenia commented 1 week ago

Same issue here, meaning I cannot access the Nextcloud instance at all, as the pod goes into a CrashLoopBackOff status.

v1nsai commented 6 days ago

In my case I'm pretty sure it's my NFS StorageClass that is at fault. I'm getting abysmally slow read times (but not writes) for some reason I cannot replicate outside the container. Running `ps aux | grep entrypoint.sh` inside the nextcloud container shows that the entrypoint script is running, but it gets interrupted when the container crashes due to the liveness probe.
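A crude `dd` throughput check can help confirm whether the NFS mount is the bottleneck. This is only a sketch: `VOL` defaults to `/tmp` for a local baseline, so run it inside the pod (via `kubectl exec`) with `VOL=/var/www/html` to measure the NFS-backed volume and compare:

```shell
# Crude sequential write/read throughput check for a mounted volume.
# VOL defaults to /tmp (local baseline); set VOL=/var/www/html inside
# the pod to measure the NFS-backed volume instead.
VOL="${VOL:-/tmp}"
dd if=/dev/zero of="$VOL/ddtest" bs=1M count=64 conv=fsync   # write speed
dd if="$VOL/ddtest" of=/dev/null bs=1M                       # read speed
rm -f "$VOL/ddtest"
```

If reads from the NFS mount come in far below the baseline, the entrypoint script simply can't finish copying/checking files before the probes fire.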

I was able to let the entrypoint script finish by creating another Nextcloud container without the liveness probes, so it couldn't time out before (eventually) finishing. There were a lot of volume mounts, so I just pulled the rendered Deployment YAML from my helm install using `helm install nextcloud nextcloud/nextcloud --dry-run --debug -f values.yaml` and removed the liveness probes to prevent it from crashing. I deployed it after deploying the nextcloud helm chart and left it for a few hours. When I came back, everything was up.

EDIT: There are switches to disable the probes already available in values.yaml 💀
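Given those switches, a sketch of disabling the probes directly in values.yaml rather than patching the rendered Deployment (the chart exposes an `enabled` flag per probe; verify the exact keys against your chart version):

```yaml
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
startupProbe:
  enabled: false
```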

v1nsai commented 5 days ago

Adding the following to my values.yaml allowed everything to boot normally (eventually). If your StorageClass is too slow for the entrypoint.sh script to complete before the probes kill the pod, try this:

```yaml
livenessProbe:
  initialDelaySeconds: 7200
readinessProbe:
  initialDelaySeconds: 7200
startupProbe:
  initialDelaySeconds: 7200
```
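An alternative sketch, assuming the chart passes standard Kubernetes probe fields through: instead of a fixed two-hour delay, enable the startup probe with a generous `failureThreshold`, so the pod becomes ready as soon as entrypoint.sh actually finishes rather than always waiting the full window:

```yaml
startupProbe:
  enabled: true
  periodSeconds: 30
  failureThreshold: 240   # 240 checks x 30s = up to ~2 hours of grace before a restart
```

While a startup probe is failing, Kubernetes holds off the liveness and readiness probes, so those can keep their defaults.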