lion24 opened this issue 2 years ago
Yeah, I know what's going on. I increased the verbosity and added some more debug logs.
apps/files/js/dist/
apps/files/js/dist/main.js
580,579 100% 362.98kB/s 0:00:01 (xfr#9506, ir-chk=1048/12720)
apps/files/js/dist/main.js.map
2,060,484 100% 750.82kB/s 0:00:02 (xfr#9507, ir-chk=1047/12720)
apps/files/js/dist/personal-settings.js
<killed>
It seems that rsync is taking too much time to sync /usr/src/nextcloud into /var/www/html and gets killed (by the orchestrator?) because the pods take too long to reach the ready state. This is probably because of the rsync + chown + NFS combo.
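For anyone who wants to measure this directly, the copy the image's entrypoint performs can be reproduced by hand inside the container. The flags below are reconstructed from how I remember the official nextcloud image's entrypoint.sh, so verify against your image before relying on them:

```shell
# Approximation of the entrypoint's initial sync, run manually to time it.
# Flags and the exclude file are assumptions based on the official image's
# entrypoint.sh; check your container before relying on them.
time rsync -rlDog --chown www-data:root --delete \
    --exclude-from=/upgrade.exclude \
    /usr/src/nextcloud/ /var/www/html/
```

On a healthy local disk this finishes in seconds; over NFS, with a chown on every file, it can easily exceed the readiness window.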
I will look at how I can tune my NFS share on TrueNAS and I'll post results here.
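On the client side, these are the standard Linux NFS mount options that tend to matter most for small-file workloads. The values below are just starting points to experiment with, and the server/export names are placeholders for my setup:

```shell
# Experiment: more parallel TCP connections (nconnect, kernel 5.3+) and
# longer attribute caching (actimeo) to cut per-file round trips.
# Server and export path are placeholders; values are guesses to tune,
# not tested recommendations.
mount -t nfs -o vers=4.1,nconnect=8,actimeo=60 \
    truenas.local:/mnt/tank/nextcloud /mnt/test
```

In Kubernetes these would normally go through the StorageClass/PV `mountOptions` rather than a manual mount.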
I am running into the same error with a much simpler setup, on a VM I am trying to provision with Ansible. I had it running successfully with /var/www/html mounted over NFS from my NAS, and I broke it by migrating my working docker-compose configuration to Ansible managing the Docker containers directly. I wonder if some file is locked in /var/www/html (I've blown away and reprovisioned just about everything except that mount and the MariaDB mount).
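If it is the same stale-lock situation, it should be visible as a leftover file on the shared volume. The file name below is what I believe recent official images use for the init sync lock, so double-check before deleting anything:

```shell
# Look for a leftover init lock on the shared volume; the file name is an
# assumption based on the official image's entrypoint.
ls -la /var/www/html/nextcloud-init-sync.lock

# Only if no container is actually mid-initialization: remove the stale
# lock so the next container can proceed.
rm -f /var/www/html/nextcloud-init-sync.lock
```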
@jcoulter Hello, yes the culprit is clearly the NFS share plus the large number of small files to sync. The first container spawned acquires a lock (it touches a lock file) and the others wait for it to finish its job by monitoring the presence of this file. The issue is that if the rsync does not finish in time, the container gets killed and the lock file is never cleaned up on the shared persistent storage, so the process never completes.
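In shell terms the pattern is roughly the sketch below. This is my reading of the entrypoint's behavior, not the actual script, and the lock file name is an assumption; the key point is that a SIGKILL skips the cleanup line, so the lock outlives the container:

```shell
#!/bin/sh
# Simplified sketch of the init-sync locking pattern (not the real
# entrypoint; lock name and rsync flags are assumptions).
LOCK=/var/www/html/nextcloud-init-sync.lock

if [ ! -f "$LOCK" ]; then
    touch "$LOCK"                                    # first container wins
    rsync -rlDog /usr/src/nextcloud/ /var/www/html/  # slow over NFS
    rm -f "$LOCK"                                    # never runs if killed
else
    while [ -f "$LOCK" ]; do sleep 1; done           # others poll and wait
fi
```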
Sending a big file over the NFS share with rsync is really a breeze, and the bottleneck there is clearly the SSD:
lionel@pve:~$ time rsync -rlDog --progress -v 5000M /mnt/pve/proxmox-fast/
sending incremental file list
5000M
5,242,880,000 100% 618.20MB/s 0:00:08 (xfr#1, to-chk=0/1)
sent 5,244,160,101 bytes received 35 bytes 616,960,016.00 bytes/sec
total size is 5,242,880,000 speedup is 1.00
real 0m8.420s
user 0m3.735s
sys 0m7.803s
The problem here is clearly the NFS + rsync + lots-of-small-files combo. Since there are so many files, maybe rsync makes too many syscalls and does too much context switching, slowing down the copy process?
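One way to test that hypothesis is to compare a many-small-files sync against the 5 GB single-file case above, and let strace count the syscalls. Paths reuse the mount from my benchmark; adjust to yours:

```shell
# Generate ~10,000 small files, then time an rsync of them over the same
# NFS mount used for the 5 GB test, and summarize syscalls with strace -c.
mkdir -p /tmp/smallfiles
for i in $(seq 1 10000); do
    head -c 4096 /dev/urandom > "/tmp/smallfiles/f$i"
done
time rsync -rlDog /tmp/smallfiles/ /mnt/pve/proxmox-fast/smallfiles/
strace -c -f rsync -rlDog /tmp/smallfiles/ /mnt/pve/proxmox-fast/smallfiles2/
```

Each file costs at least an open/write/chown/close round trip to the NFS server, so per-file latency rather than raw throughput dominates.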
I keep digging down the rabbit hole 😀
Hi there,
I've set up a PVC using the "ReadWriteMany" accessMode for the nextcloud data directory over NFS. The NFS share is exposed by TrueNAS, using the democratic-csi driver for k8s: https://jonathangazeley.com/2021/01/05/using-truenas-to-provide-persistent-storage-for-kubernetes/
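For context, the PVC itself is nothing special; a minimal RWX claim along these lines (storage class name and size are placeholders, not the actual values from my cluster):

```shell
# Hypothetical PVC matching the setup described; storageClassName and
# size are placeholders for whatever democratic-csi exposes.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: freenas-nfs-csi
  resources:
    requests:
      storage: 50Gi
EOF
```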
It seems that, when configuring nextcloud with the following values and multiple replicas:
there is a kind of deadlock between the pods, each waiting for the other to release a lock file:
The release never happens, and after a while the pods enter a CrashLoopBackOff state, which prevents nextcloud from starting properly.
My understanding is that the first pod launched acquires the lock, preventing the other pods from syncing the html folder; once this pod has finished syncing the html folder, the initialization is assumed to be complete. Could it be that this rsync task takes too much time to finish and times out?
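If that is what's happening, giving the first pod more headroom before the kubelet kills it might confirm it. Assuming the chart exposes probe settings along these lines (the exact keys are my guess; check the chart's values.yaml):

```shell
# Hypothetical: relax the startup probe so the initial rsync can finish.
# Key names are assumptions about the chart; verify against its values.yaml.
helm upgrade nextcloud nextcloud/nextcloud \
  --set startupProbe.enabled=true \
  --set startupProbe.periodSeconds=10 \
  --set startupProbe.failureThreshold=60 \
  --reuse-values
```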
I don't know if that rings a bell for anyone?
Cheers.