Open volker-raschek opened 3 years ago
Hi, I use the official Helm chart to deploy Nextcloud on our internal Kubernetes instance. The container became unhealthy every time, so I disabled the probes. Now the container is marked as ready, but I cannot connect to Nextcloud, neither via browser nor via curl.
I connect to the container and want to execute
php occ
to get some logs, because PHP does not write its logs to stdout. I get the following error message:
$ php occ
Warning: require_once(/var/www/html/lib/versioncheck.php): failed to open stream: No such file or directory in /var/www/html/console.php on line 35
Fatal error: require_once(): Failed opening required '/var/www/html/lib/versioncheck.php' (include_path='.:/usr/local/lib/php') in /var/www/html/console.php on line 35
The complete lib folder is not available.
$ ls -la
total 108
drwxr-xr-x 1 www-data root   304 Jun 13 10:17 .
drwxr-xr-x 1 root     root    14 Jun 13 10:01 ..
-rw-r--r-- 1 www-data root  3032 Jun 13 10:17 .htaccess
-rw-r--r-- 1 www-data root   101 Jun 13 10:17 .user.ini
drwxr-xr-x 1 www-data root   778 Jun 13 10:17 3rdparty
-rw-r--r-- 1 www-data root 17234 Jun 13 10:17 AUTHORS
-rw-r--r-- 1 www-data root 34520 Jun 13 10:17 COPYING
drwxr-xr-x 1 www-data root  1098 Jun 13 10:15 apps
drwxr-xr-x 1 www-data root     0 Jun 13 10:01 config
-rw-r--r-- 1 www-data root  3893 Jun 13 10:17 console.php
-rw-r--r-- 1 www-data root  5083 Jun 13 10:17 cron.php
drwxr-xr-x 1 www-data root     0 Jun 13 10:01 custom_apps
drwxr-xr-x 1 www-data root     0 Jun 13 10:01 data
-rw-r--r-- 1 www-data root   156 Jun 13 10:17 index.html
-rw-r--r-- 1 www-data root  2960 Jun 13 10:17 index.php
-rwxr-xr-x 1 www-data root   283 Jun 13 10:17 occ
-rw-r--r-- 1 www-data root  3102 Jun 13 10:17 public.php
-rw-r--r-- 1 www-data root  5332 Jun 13 10:17 remote.php
-rw-r--r-- 1 www-data root    26 Jun 13 10:17 robots.txt
-rw-r--r-- 1 www-data root  2379 Jun 13 10:17 status.php
drwxr-xr-x 1 www-data root     0 Jun 13 10:01 themes
I additionally tried the nginx + fpm setup and ran into the same error there. I upgraded the image to 20.0.4-apache. This is the same image that we have deployed on a Docker host. I removed the complete persistent volume so that the container can recreate the complete data directory, but I still get this error. The lib directory is not available.
How is it possible that the directory is missing, and how can I fix it? Here is an old thread with the same issue. It seems not to be fixed.
Next, I found that the permissions of the directory /var/www/html are incomplete. The owner/group of all mounted volumes is root:root. I changed the ownership manually to www-data:root. Otherwise the container has no access to the mounted directories. This should also be fixed.
Hi! I can confirm the same situation with 18, 21 and the latest versions. I may add that the files do arrive later, but I cannot tell when. For me it was enough to add sleep 2m before running the next task.
I also seem to be encountering something similar with an NFS Docker mount.
This has been going on for a while now, e.g. in 2018.
I am seriously considering rolling my own here, especially after my find. I love Nextcloud, but this is vexing.
This has nothing to do with the mounts as far as I can tell.
The image, for whatever reason (it is beyond me), is actually rsyncing the code over to /var/www/html.
Considering that user data is also colocated in those subfolders, and that running images has made us used to mounting custom content over locations in the image with impunity, this risks overwriting things unintentionally.
I also wondered what would be so hard about copying a repository over.
When looking at the entrypoint script or the image itself (/var/www/html is empty), the source code for Nextcloud is in /usr/src/nextcloud. Copying the folder /usr/src/nextcloud/lib to /var/www/html/ causes occ to be executable again.
So the entrypoint script has a bug that could easily be sidestepped by having the source directly at the intended location in the first place, as far as I can see.
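The manual workaround described above (copying lib from /usr/src/nextcloud into /var/www/html) could be sketched roughly as follows. restore_lib is a hypothetical helper name, not something shipped in the image, and the default paths are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: copy the missing lib directory from the image's source tree
# into the web root. restore_lib is a hypothetical helper name; the default
# paths are the ones reported in this thread.
restore_lib() {
    src="${1:-/usr/src/nextcloud}"
    dest="${2:-/var/www/html}"

    # Nothing to do if the file that occ failed to load is already present.
    if [ -f "$dest/lib/versioncheck.php" ]; then
        echo "lib already present in $dest"
        return 0
    fi

    # -a keeps modes and timestamps; ownership is only preserved when run as root.
    cp -a "$src/lib" "$dest/"
    echo "copied $src/lib to $dest"
}
```

Inside the container this would be called as restore_lib with no arguments. Ownership of the copied files may still need adjusting (the listing above shows www-data:root), and this only papers over the symptom, since other directories may be incomplete as well.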
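Here is a snippet to dissect the Docker image in question.
mkdir tmp
cd tmp
# Export the image as a tar stream and unpack it (-f - makes bsdtar read from stdin)
docker save 4a507000dafe | bsdtar -xf -
mkdir abc
# Unpack every layer listed in manifest.json, in order, into abc/
jq -r '.[].Layers[]' manifest.json | xargs -I {} tar -C abc -xf {}
cd abc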
Same issue with my Docker setup.
For me the workaround is to wait a few seconds between docker-compose up and any occ command.
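Instead of a fixed sleep, one option is to poll for the file that the error message complains about. The following is a minimal sketch under that assumption; wait_for_file is a hypothetical helper name, and the 120-second default just mirrors the sleep 2m mentioned earlier in the thread.

```shell
#!/bin/sh
# Sketch only: poll until a file exists or a timeout (in seconds) expires.
# wait_for_file is a hypothetical helper name.
wait_for_file() {
    file="$1"
    timeout="${2:-120}"   # default mirrors the "sleep 2m" workaround
    waited=0
    while [ ! -f "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Inside the container, something like wait_for_file /var/www/html/lib/versioncheck.php 300 && php occ status would then run occ only once the file is there. Note that the presence of this one file only shows that the copy has reached lib, not that the entrypoint has finished, so a short extra delay may still be needed.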
I was able to fix this by increasing the initial delays in the startup probes. More details in #583, but tl;dr: adding this to my values.yaml allowed enough time for the entrypoint.sh to complete.
livenessProbe:
  initialDelaySeconds: 7200
readinessProbe:
  initialDelaySeconds: 7200
startupProbe:
  initialDelaySeconds: 7200
@provokateurin since the fix here is just updating the initialDelaySeconds for each of the probes, shall I close this?
Perhaps I could add a section called "Nextcloud fails to initialize properly" to the https://github.com/nextcloud/helm/tree/main/charts/nextcloud#troubleshooting docs, start linking back to issues such as this one, and explain how to increase initialDelaySeconds for each of the probes?
I wonder if https://github.com/nextcloud/helm/pull/344 could also fix the initialization problem by running the installation within an init container that doesn't have the probes. Otherwise we can document this workaround and close the issue, sure!
@provokateurin Sounds good to me. If you can move #344 forward and it gets merged, we need to add a section about probes to the docs anyway (submitted https://github.com/nextcloud/helm/pull/605), and then I think we can close this. If others still have the problem, they can open a new issue.
Also, missed this, sorry:
"Next, I found that the permissions of the directory /var/www/html are incomplete. The owner/group of all mounted volumes is root:root. I changed the permissions manually to www-data:root. Otherwise has the container no access to the mounted directories. This should also be fixed."
You can adjust your fsGroup and runAsUser here (learn more about pod securityContext here), which can help with the permissions issue:
https://github.com/nextcloud/helm/blob/bf6cc4a9df0b3bffd3915dc940ddbec71976429e/charts/nextcloud/values.yaml#L216-L221