mamiapatrick opened this issue 4 years ago
I had this issue too; the mistake I made was persisting /var/www/html, which would get stuck at "Initializing". Persist only the data directory and then it should work. By that I mean volumeMounts:
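Something along these lines, as a minimal sketch (the volume and claim names here are placeholders, adapt them to your deployment):

# in the container spec:
volumeMounts:
  - name: nextcloud-data
    mountPath: /var/www/html/data
# in the pod spec:
volumes:
  - name: nextcloud-data
    persistentVolumeClaim:
      claimName: nextcloud-data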
If your pod ends up restarting after the initial installation, you will get another error message, which is:
Username is invalid because files already exist for this user
The way to get around this is to always change your nextcloud_admin_user before you restart the pod; the new user can be deleted later directly from the application.
Any suggestion on how to bypass this by editing the entrypoint would be nice, because I am currently trying to figure out how to do that without editing the nextcloud_admin_user every time.
Hello @johnbayo,
I read your response, and thank you, but that approach is not very "automatic", because we would need human intervention any time the pod restarts... it's as if Nextcloud cannot work normally on Kubernetes like other pods.
On the other hand, if you do not persist custom_apps and settings, how do you keep those persistent across pod restarts?
@johnbayo if you persist only data, will the config be persistent when the pod restarts, given that the config lives at mountPath: /var/www/html/config?
@mamiapatrick no, you can't persist the config; it gets generated only on initialization. You have to edit the entrypoint so that another script updates your config on each pod restart.
@johnbayo but why do I get an error that the username already exists every time I delete the pod? The pod gets deleted whenever I change some configuration.
@mamiapatrick you need to change the admin user before deleting your pod each time, or another option would be to edit the entrypoint to ignore this. There might be another solution, but unfortunately I am not aware of one.
At least there is some light in this issue. Indeed, /var/www/html can't be mounted or it will get stuck, but even when the installation completes, the pod never comes up:
i5Js@nanopim4:~/nextcloud$ kubectl logs --follow nextcloud -n nextcloud
Initializing nextcloud 18.0.4.2 ...
Initializing finished
New nextcloud instance
Installing with MySQL database
starting nextcloud installation
Nextcloud was successfully installed
setting trusted domains…
System config value trusted_domains => 1 set to string domain_name
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.42.0.56. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.42.0.56. Set the 'ServerName' directive globally to suppress this message
[Thu May 14 11:06:45.742135 2020] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.38 (Debian) PHP/7.3.17 configured -- resuming normal operations
[Thu May 14 11:06:45.742523 2020] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
i5Js@nanopim4:~/nextcloud$ kubectl get pod -n nextcloud
NAME                        READY   STATUS    RESTARTS   AGE
nextcloud-bcf868c97-q9btj   0/1     Running   0          10m
Any tips?
Glad I'm not the only one that's hitting this.
After some more research while drafting this post, I found an issue that I think is related: https://github.com/nextcloud/helm/issues/590
I am attempting to update to the new 19.0.2 build. I have user data on a persistent volume and a database instance set up in a different pod. When I start with a 'fresh' volume for data, I have no issues setting up and installing, so I know that there is no issue talking to the database.
Every time I kill the pod, a new one comes back... which is exactly what is supposed to happen. However, when I go to the Nextcloud instance, I get the message "Username is invalid because files already exist for this user". I can work around this by relocating the existing folder, creating a new admin user and copying the content from the old/relocated folder into the 'new' admin folder. This same 'workaround' is required for each user, too.
I think that this has something to do with the instanceID... but this is not something that can be adjusted via env-vars, so it can't be kept constant across new pods.
I think I've figured out how to get past this:

- Copy /var/www/html/config/* to the user PVC (make a CFG folder or similar).
- After the pod restarts, the /var/www/html/config/ dir has nothing in it (or, possibly, just a config.php with only the instanceid).
- Copy the php files from the CFG dir on the user PVC back into the config dir and on to the config PVC.

Short version: manually back up and restore the /var/www/html/config path.

It's one hell of a messy workaround but it seems to be working for me so far.
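A rough shell version of those steps (a sketch only: it assumes the user PVC is mounted at /var/www/html/data, and the CFG folder name comes from the list above; adjust paths and ownership to your setup):

# before the pod goes away: back up the generated config onto the user PVC
mkdir -p /var/www/html/data/CFG
cp -a /var/www/html/config/*.php /var/www/html/data/CFG/

# after the replacement pod has started and the config dir is empty again:
cp -a /var/www/html/data/CFG/*.php /var/www/html/config/
# ownership may need adjusting so the web server can read the files
chown www-data:www-data /var/www/html/config/*.php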
Hey @kquinsland, glad to read your message. Yesterday I managed to install NC on my own Kubernetes cluster and I encountered a bunch of errors related to what you are saying. I've been using the official chart and a newer version of NC (19.0.1) than the one in values.yaml.
I deployed NC with persistence (PVC) and an external Postgres database. The first run works as expected, with the liveness and readiness probes set to 5 minutes because it takes time to set up the whole environment, but if the pod restarts I run into all the problems described here and in https://github.com/nextcloud/helm/issues/590. I believe we are all interested in being able to redeploy NC when necessary; that is why we use K8s. If the pods die for some reason, I want the NC instance to deal with that. The problem is that the system then tries to reinstall and create all the tables in the DB again, making it impossible to automate the process.
I will post my values later on during the day (right now I do not have them), but basically I have an NFS disk which is used in my PV/PVC, and then I mount the whole /var/www/html/config exactly as the deployment says, except I deleted the mounting part of /var/www/html; it got stuck otherwise. Among other things, I spent a lot of time yesterday making it work.
The only solution I found was deleting the whole DB and the whole dir mounted in the PVC to make it run from zero, which is not what I want of course. I am going to try to only replace the config dir.
I could not make it work with more than one replica; I guess it is the same problem, though, where all of the replicas try to reinstall NC.
Hi @nilbacardit26, you've described my pain word for word... I'm done; I think Nextcloud is not ready to work with Kubernetes...
@i5Js You are right, we basically use K8s to be able to rely on a system that can recover from errors on its own, and right now that is not the case with the current chart and entrypoint.
Same problem here. It would be great if it worked on Kubernetes. Sad.
Hey guys,
I've tried this too. I've seen that it hangs because of the rsync commands in the entrypoint. I'm using NFS (4.1) as a storage backend and it takes about 20-30 minutes to complete the copy from /usr/src/nextcloud to /var/www/html.
I've added some flags to rsync (basically -v, -r and --append) and I can see the big list of files being (very) slowly copied.
After it finishes the nextcloud installation works correctly but it's pretty evident that I need to switch to a more performant storage backend, I'll try iSCSI.
Anyway, with such a long operation the pod will fail the readiness probe (I set initialDelaySeconds to 120 seconds) and be killed, but if you're using the --append rsync option the next container will continue where the previous one left off until, after sacrificing a few pods, the probe succeeds.
If this doesn't happen you can still run
su www-data -s /bin/sh -c "php occ maintenance:install"
and nextcloud will complete the installation
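If the environment variables are not picked up when running it by hand, maintenance:install also accepts the database and admin settings as flags. A minimal sketch, assuming a MySQL database reachable at host "db"; every value below is a placeholder:

su www-data -s /bin/sh -c 'cd /var/www/html && php occ maintenance:install \
  --database mysql --database-host db --database-name nextcloud \
  --database-user nextcloud --database-pass changeme \
  --admin-user admin --admin-pass changeme \
  --data-dir /var/www/html/data'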
Here's what the interesting part of entrypoint.sh looks like:
if [ "$(id -u)" = 0 ]; then
rsync_options="-vrlDog --chown www-data:root --progress --append"
else
rsync_options="-rlDv --progress --append"
fi
And here's what I added in values.yaml after creating the "docker-entrypoint" configMap (replacing the original lines of code with the above):
extraVolumes:
  - name: nextcloud-entrypoint
    configMap:
      name: nextcloud-entrypoint
      defaultMode: 0700 # Way too generous
extraVolumeMounts:
  - name: nextcloud-entrypoint
    mountPath: "/entrypoint.sh"
    subPath: entrypoint.sh
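For reference, the configMap itself can be created from the modified script with something like the following (the file name and namespace are assumptions, adjust them to your release):

kubectl create configmap nextcloud-entrypoint --from-file=entrypoint.sh=./entrypoint.sh -n nextcloud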
Also, in values.yaml:
livenessProbe:
  enabled: true
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
It takes 5 "restarts" for the rsync --append
to finish copying but I'm ok with that: it just happens once and delaying the check any further means longer time before Kubernetes understands there's a problem of any kind.
Hope this helps
I can confirm having the same issue with NFS v4 as the backing storage for the PVC used for Nextcloud's persistence. I recently bumped the image from 22.1.1 to 22.2.0 and rsync is still chugging away as I write this reply. I had the startup probe enabled on my helm install, but it seems to not even exist in the deployment (for better or worse).
I'm curious whether cp has the same issues as rsync does with NFS, or if it's more about NFS handling small files very badly in the first place. My current setup pretty much relies on sharing data over NFS, as I'm not entirely sure KVM allows you to share the same block device between multiple VMs (or how to actually make use of that with k3s' local-path storage class).
iSCSI might be the way to go for situations like these, but I'd prefer to use NFS as it's infinitely simpler to set up and get going than iSCSI when using Debian.
So I was looking at https://github.com/nextcloud/helm/issues/590#issuecomment-2365068573 and https://github.com/nextcloud/helm/issues/590#issuecomment-2223673034 in https://github.com/nextcloud/helm/issues/590, and I think both @kquinsland and @WladyX are onto something.
I posted some ideas and suggestions in https://github.com/nextcloud/helm/issues/590#issuecomment-2370441443, but the gist seems to be that we check /var/www/html/version.php for the Nextcloud version, and if that file doesn't exist, we initialize a new install.
The issue is that I'm not sure how to persist that file without just using our normal PVC setup, which users don't want to use if they're already using S3, since version.php is not created by nextcloud/helm nor nextcloud/docker. I think it's created by nextcloud/server 🤔
Perhaps we can do some sort of check to see if S3 is already enabled? 🤔 Maybe checking if $OBJECTSTORE_S3_BUCKET is set in docker-entrypoint.sh? Open to ideas and suggestions to make this more approachable in either repo.
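To illustrate the idea, a sketch of what such a check could look like in docker-entrypoint.sh (this is not the actual entrypoint code, just an assumption of how it might be wired up):

if [ -n "${OBJECTSTORE_S3_BUCKET}" ]; then
    # an S3 object store is configured, so don't rely on a persisted
    # /var/www/html to decide whether this is a new install
    echo "S3 object store configured, skipping version.php based install detection"
else
    # fall back to reading the installed version from the persisted version.php
    installed_version="0.0.0.0"
    if [ -f /var/www/html/version.php ]; then
        installed_version="$(php -r 'require "/var/www/html/version.php"; echo implode(".", $OC_Version);')"
    fi
fi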
The core of the matter is that some k8s users seem to be disabling persistence of /var/www/html because "it doesn't work". E.g.:

"I deleted the mounting part of /var/www/html; it got stuck otherwise."
It seems in most cases this is an NFS / rsync interaction. Sometimes it is merely a performance matter (some of the examples above plus others like #1582). Sometimes it's a configuration matter (e.g. #1200)
However it also seems many people have no issues, so perhaps we limit the scope to:
P.S. Redesigning the image (and/or Nextcloud Server itself) to work w/o persistent storage for its installation folder is a bigger conversation (and a longer road probably), and already covered in #340 and #2044.
Hello, I just installed Nextcloud in my private Kubernetes cluster. If I install with no persistence, the software (pod) launches fine, but any time I try to install it on a persistent volume it just gets stuck at "Initializing" and the pod never starts. Because of this I cannot persist data, config and other information. I also noticed that even if I set up an external database, I still have the sqlite_database environment variable set.