mamiapatrick opened this issue 4 years ago
I had this issue too; the mistake I made was persisting /var/www/html, which would get stuck at "Initializing". Persist only the data directory and then it should work. By that I mean volumeMounts:
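Something along these lines, as a minimal sketch (the volume and claim names here are placeholders, adapt them to your deployment):

# in the container spec:
volumeMounts:
  - name: nextcloud-data
    mountPath: /var/www/html/data
# in the pod spec:
volumes:
  - name: nextcloud-data
    persistentVolumeClaim:
      claimName: nextcloud-data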
If your pod ends up restarting after the initial installation, you will get another error message, which is:
Username is invalid because files already exist for this user
The way to get around this is to always change your nextcloud_admin_user before you restart the pod; the new user can be deleted later directly from the application.
Any suggestion on how to bypass this by editing the entrypoint would be nice, because I am currently trying to figure out how to do that without editing the nextcloud_admin_user every time.
Hello @johnbayo,
I read your response, and thank you, but that approach is not very "automatic", because we would need human intervention any time the pod restarts... it's as if Nextcloud cannot work normally on Kubernetes like other pods.
On the other hand, if you do not persist custom_apps and settings, how do you keep those persistent across pod restarts?
@johnbayo if you persist only data, will the config be persistent when the pod restarts, given that the config lives at mountPath: /var/www/html/config?
@mamiapatrick no, you can't persist the config; it gets generated only on initialization. You have to edit the entrypoint so that another script updates your config on each pod restart.
@johnbayo but why do I get an error that the username already exists every time I delete the pod? The pod gets deleted whenever I change some configuration.
@mamiapatrick you need to change the admin user before deleting your pod each time, or another option would be to edit the entrypoint to ignore this. There might be another solution, but unfortunately I am not aware of one.
At least there is some light in this issue. Indeed, /var/www/html can't be mounted or it will get stuck, but even when the installation completes, the pod never comes up:
i5Js@nanopim4:~/nextcloud$ kubectl logs --follow nextcloud -n nextcloud
Initializing nextcloud 18.0.4.2 ...
Initializing finished
New nextcloud instance
Installing with MySQL database
starting nextcloud installation
Nextcloud was successfully installed
setting trusted domains…
System config value trusted_domains => 1 set to string domain_name
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.42.0.56. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.42.0.56. Set the 'ServerName' directive globally to suppress this message
[Thu May 14 11:06:45.742135 2020] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.38 (Debian) PHP/7.3.17 configured -- resuming normal operations
[Thu May 14 11:06:45.742523 2020] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
i5Js@nanopim4:~/nextcloud$ kubectl get pod -n nextcloud
NAME                        READY   STATUS    RESTARTS   AGE
nextcloud-bcf868c97-q9btj   0/1     Running   0          10m
Any tips?
Glad I'm not the only one that's hitting this.
After some more research while drafting this post, I found an issue that I think is related: https://github.com/nextcloud/helm/issues/590
I am attempting to update to the new 19.0.2 build. I have user data on a persistent volume and a database instance set up in a different pod. When I start with a 'fresh' volume for data, I have no issues setting up and installing, so I know that there is no issue talking to the database.
Every time I kill the pod, a new one comes back... which is exactly what is supposed to happen. However, when I go to the Nextcloud instance, I get the message "Username is invalid because files already exist for this user". I can work around this by relocating the existing folder, creating a new admin user and copying the content from the old/relocated folder into the 'new' admin folder. This same 'workaround' is required for each user, too.
I think that this has something to do with the instanceID... but this is not something that can be adjusted via env-vars, so it can't be kept constant across new pods.
I think I've figured out how to get past this:

- Copy /var/www/html/config/* to the user PVC (make a CFG folder or similar).
- After the pod restarts, the /var/www/html/config/ dir has nothing in it (or, possibly, just a config.php with only the instanceid).
- Copy the php files from the CFG dir on the user PVC back into the config dir and on to the config PVC.

Short version: manually back up and restore the /var/www/html/config path.

It's one hell of a messy workaround but it seems to be working for me so far.
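A rough shell version of those steps (a sketch only: it assumes the user PVC is mounted at /var/www/html/data, and the CFG folder name comes from the list above; adjust paths and ownership to your setup):

# before the pod goes away: back up the generated config onto the user PVC
mkdir -p /var/www/html/data/CFG
cp -a /var/www/html/config/*.php /var/www/html/data/CFG/

# after the replacement pod has started and the config dir is empty again:
cp -a /var/www/html/data/CFG/*.php /var/www/html/config/
# ownership may need adjusting so the web server can read the files
chown www-data:www-data /var/www/html/config/*.php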
Hey @kquinsland, glad to read your message. Yesterday I managed to install NC on my own Kubernetes cluster and I encountered a bunch of errors related to what you are saying. I've been using the official chart and a newer version of NC (19.0.1) than the one in values.yaml.
I deployed NC with persistence (PVC) and an external Postgres database. The first run works as expected, with the liveness and readiness probes set to 5 minutes because it takes time to set up the whole environment, but if the pod restarts I run into all the problems described here and in https://github.com/nextcloud/helm/issues/590. I believe we are all interested in being able to redeploy NC when necessary; that is why we use K8s. If the pods die for some reason, I want the NC instance to deal with that. The problem is that the system then tries to reinstall and create all the tables in the DB again, making it impossible to automate the process.
I will post my values later on during the day (right now I do not have them), but basically I have an NFS disk which is used in my PV/PVC, and then I mount the whole /var/www/html/config exactly as the deployment says, except I deleted the mounting part of /var/www/html; it got stuck otherwise. Among other things, I spent a lot of time yesterday making it work.
The only solution I found was deleting the whole DB and the whole dir mounted in the PVC to make it run from zero, which is not what I want of course. I am going to try to only replace the config dir.
I could not make it work with more than one replica; I guess it is the same problem, though, where all of the replicas try to reinstall NC.
Hi @nilbacardit26, you've described my pain word for word... I'm done; I think Nextcloud is not ready to work with Kubernetes...
@i5Js You are right, we basically use K8s to be able to rely on a system that can recover from errors on its own, and right now that is not the case with the current chart and entrypoint.
Same problem here. It would be great if it worked on Kubernetes. Sad.
Hey guys,
I've tried this too. I've seen that it hangs because of the rsync commands in the entrypoint. I'm using NFS (4.1) as a storage backend and it takes about 20-30 minutes to complete the copy from /usr/src/nextcloud to /var/www/html.
I've added some flags to rsync (basically -v, -r and --append) and I can see the big list of files being (very) slowly copied.
After it finishes the nextcloud installation works correctly but it's pretty evident that I need to switch to a more performant storage backend, I'll try iSCSI.
Anyway, with such a long operation the pod will fail the readiness probe (I set initialDelaySeconds to 120 seconds) and be killed, but if you're using the --append rsync option the next container will continue where the previous one left off until, after sacrificing a few pods, the probe succeeds.
If this doesn't happen you can still run
su www-data -s /bin/sh -c "php occ maintenance:install"
and nextcloud will complete the installation
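If the environment variables are not picked up when running it by hand, maintenance:install also accepts the database and admin settings as flags. A minimal sketch, assuming a MySQL database reachable at host "db"; every value below is a placeholder:

su www-data -s /bin/sh -c 'cd /var/www/html && php occ maintenance:install \
  --database mysql --database-host db --database-name nextcloud \
  --database-user nextcloud --database-pass changeme \
  --admin-user admin --admin-pass changeme \
  --data-dir /var/www/html/data'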
Here's what the interesting part of entrypoint.sh looks like:
if [ "$(id -u)" = 0 ]; then
rsync_options="-vrlDog --chown www-data:root --progress --append"
else
rsync_options="-rlDv --progress --append"
fi
And here's what I added in values.yaml after creating the "docker-entrypoint" configMap (replacing the original lines of code with the above):
extraVolumes:
  - name: nextcloud-entrypoint
    configMap:
      name: nextcloud-entrypoint
      defaultMode: 0700 # Way too generous
extraVolumeMounts:
  - name: nextcloud-entrypoint
    mountPath: "/entrypoint.sh"
    subPath: entrypoint.sh
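For reference, the configMap itself can be created from the modified script with something like the following (the file name and namespace are assumptions, adjust them to your release):

kubectl create configmap nextcloud-entrypoint --from-file=entrypoint.sh=./entrypoint.sh -n nextcloud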
Also, in values.yaml:
livenessProbe:
  enabled: true
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
It takes 5 "restarts" for the rsync --append
to finish copying but I'm ok with that: it just happens once and delaying the check any further means longer time before Kubernetes understands there's a problem of any kind.
Hope this helps
I can confirm having the same issue with NFS v4 as the backing storage for the PVC used for Nextcloud's persistence. I recently bumped the image from 22.1.1 to 22.2.0 and rsync is still chugging away as I write this reply. I had the startup probe enabled on my helm install, but it seems to not even exist in the deployment (for better or worse).
I'm curious whether cp has the same issues as rsync does with NFS, or if it's more about NFS handling small files very badly in the first place. My current setup pretty much relies on sharing data over NFS, as I'm not entirely sure KVM allows you to share the same block device between multiple VMs (or how to actually make use of that with k3s' local-path storage class).
iSCSI might be the way to go for situations like these, but I'd prefer to use NFS as it's infinitely simpler to set up and get going than iSCSI when using Debian.
So I was looking at https://github.com/nextcloud/helm/issues/590#issuecomment-2365068573 and https://github.com/nextcloud/helm/issues/590#issuecomment-2223673034 in https://github.com/nextcloud/helm/issues/590, and I think both @kquinsland and @WladyX are onto something.
I posted some ideas and suggestions in https://github.com/nextcloud/helm/issues/590#issuecomment-2370441443, but the gist seems to be that we check /var/www/html/version.php for the Nextcloud version, and if that file doesn't exist, we initialize a new install.
The issue is that I'm not sure how to persist that file without just using our normal PVC setup, which users don't want to use if they're already using S3, since version.php is not created by nextcloud/helm nor nextcloud/docker. I think it's created by nextcloud/server 🤔
Perhaps we can do some sort of check to see if S3 is already enabled? 🤔 Maybe checking if $OBJECTSTORE_S3_BUCKET is set in docker-entrypoint.sh? Open to ideas and suggestions to make this more approachable in either repo.
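To illustrate the idea, a sketch of what such a check could look like in docker-entrypoint.sh (this is not the actual entrypoint code, just an assumption of how it might be wired up):

if [ -n "${OBJECTSTORE_S3_BUCKET}" ]; then
    # an S3 object store is configured, so don't rely on a persisted
    # /var/www/html to decide whether this is a new install
    echo "S3 object store configured, skipping version.php based install detection"
else
    # fall back to reading the installed version from the persisted version.php
    installed_version="0.0.0.0"
    if [ -f /var/www/html/version.php ]; then
        installed_version="$(php -r 'require "/var/www/html/version.php"; echo implode(".", $OC_Version);')"
    fi
fi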
The core of the matter is that some k8s users seem to be disabling persistence of /var/www/html because "it doesn't work". E.g.:

"I deleted the mounting part of /var/www/html; it got stuck otherwise."
It seems in most cases this is an NFS / rsync interaction. Sometimes it is merely a performance matter (some of the examples above plus others like #1582). Sometimes it's a configuration matter (e.g. #1200)
However it also seems many people have no issues, so perhaps we limit the scope to:
P.S. Redesigning the image (and/or Nextcloud Server itself) to work w/o persistent storage for its installation folder is a bigger conversation (and a longer road probably), and already covered in #340 and #2044.
Hello, I just installed Nextcloud in my private Kubernetes cluster. If I install with no persistence, the software (pod) launches fine, but any time I try to install it on a persistent volume it just gets stuck at "Initializing" and the pod never starts. Because of this I cannot persist data, config and other information. I also noticed that even if I set up an external database, I still have the sqlite_database environment variable set.