`matrix-postgres` does not start, `/var/lib/postgresql/data` is empty

JoshuaCWebDeveloper commented 4 years ago

After installing with the Ansible playbook using the default Postgres settings (no reference to "postgres" in the vars.yml file) and attempting to start the services for the first time, the matrix-postgres container fails to start with the following error:

Sep 11 01:40:19 matrix-dev-001 matrix-postgres[16854]: chmod: /var/run/postgresql: Operation not permitted
Sep 11 01:40:19 matrix-dev-001 matrix-postgres[16854]: PostgreSQL Database directory appears to contain a database; Skipping initialization
Sep 11 01:40:19 matrix-dev-001 matrix-postgres[16854]: postgres: could not find the database system
Sep 11 01:40:19 matrix-dev-001 matrix-postgres[16854]: Expected to find it in the directory "/var/lib/postgresql/data",
Sep 11 01:40:19 matrix-dev-001 matrix-postgres[16854]: but could not open file "/var/lib/postgresql/data/global/pg_control": No such file or directory

An examination of the container's filesystem confirms that the data directory is in fact empty:

bash-5.0$ ls -la /
total 68
drwxr-xr-x    1 root     root          4096 Sep 11 01:29 .
drwxr-xr-x    1 root     root          4096 Sep 11 01:29 ..
-rwxr-xr-x    1 root     root             0 Sep 11 01:29 .dockerenv
drwxr-xr-x    1 root     root          4096 Aug 14 17:32 bin
drwxr-xr-x    5 root     root           360 Sep 11 01:29 dev
drwxr-xr-x    2 root     root          4096 Jun 11 22:46 docker-entrypoint-initdb.d
drwxr-xr-x    1 root     root          4096 Sep 11 01:29 etc
drwxr-xr-x    2 root     root          4096 May 29 14:20 home
drwxr-xr-x    1 root     root          4096 Aug 14 17:32 lib
drwxr-xr-x    5 root     root          4096 May 29 14:20 media
drwxr-xr-x    2 root     root          4096 May 29 14:20 mnt
drwxr-xr-x    2 root     root          4096 May 29 14:20 opt
dr-xr-xr-x  188 root     root             0 Sep 11 01:29 proc
drwx------    2 root     root          4096 May 29 14:20 root
drwxr-xr-x    1 root     root          4096 Aug 14 17:32 run
drwxr-xr-x    1 root     root          4096 Aug 14 17:32 sbin
drwxr-xr-x    2 root     root          4096 May 29 14:20 srv
dr-xr-xr-x   13 root     root             0 Sep 11 01:09 sys
drwxrwxrwt    1 root     root          4096 Aug 14 17:32 tmp
drwxr-xr-x    1 root     root          4096 Aug 14 17:32 usr
drwxr-xr-x    1 root     root          4096 Aug 14 17:32 var
bash-5.0$ ls -la /var/lib/postgresql/data/
total 8
drwxrwxrwx    2 postgres postgres      4096 Aug 14 17:32 .
drwxr-xr-x    1 postgres postgres      4096 Aug 14 17:32 ..
bash-5.0$
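The same check can be made from the host side. The following is a minimal sketch, assuming the data directory is the `/matrix/postgres/data` path that appears in the playbook output; `classify_pgdata` is a hypothetical helper, not part of the playbook. It treats a non-empty `PG_VERSION` plus `global/pg_control` (the file named in the error above) as the sign of an initialized cluster. Whether the container's entrypoint uses exactly this test is an assumption.

```python
import os


def classify_pgdata(data_dir):
    """Classify a Postgres data directory as missing, empty, partial, or initialized.

    Assumption: an initialized cluster has a non-empty PG_VERSION file and a
    global/pg_control file. This is a heuristic, not the container's own logic.
    """
    if not os.path.isdir(data_dir):
        return "missing"
    if not os.listdir(data_dir):
        return "empty"

    version_file = os.path.join(data_dir, "PG_VERSION")
    control_file = os.path.join(data_dir, "global", "pg_control")

    has_version = os.path.isfile(version_file) and os.path.getsize(version_file) > 0
    has_control = os.path.isfile(control_file)

    if has_version and has_control:
        return "initialized"
    # Some files exist, but not the ones a usable cluster needs.
    return "partial"


if __name__ == "__main__":
    # Host path taken from the "Ensure Postgres paths exist" task output.
    print(classify_pgdata("/matrix/postgres/data"))
```

Run it as a user that can read the data directory. A result of "partial" would be consistent with the log above, where initialization is skipped but `pg_control` cannot be found.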

I don't see any mention of PostgreSQL in the installation instructions and the README.md seems to imply that it is configured by default. The output of the installation showed nothing that looked like an error relating to postgres:

TASK [matrix-postgres : set_fact] ******************************************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : (Deprecation) Warn about matrix_postgres_use_external usage] ***************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Fail if required Postgres settings not defined] ****************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix_postgres_connection_hostname)
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix_postgres_connection_username)
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix_postgres_connection_password)
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix_postgres_db_name)

TASK [matrix-postgres : Check if old Postgres data directory is used] ******************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Warn if old Postgres data directory detected] ******************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure matrix-postgres is stopped] ***********************************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Find files and directories in old Postgres data path] **********************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Ensure new Postgres data path exists] **************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Relocate Postgres data files from old directory to new] ********************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure outdated matrix-postgres.service doesn't exist] ***************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure systemd reloaded after getting rid of outdated matrix-postgres.service] ***************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Initialize Postgres version determination variables (default to empty)] ****************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine existing Postgres version (check PG_VERSION file)] ***************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : set_fact] ******************************************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine existing Postgres version (read PG_VERSION file)] ****************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine existing Postgres version (make sense of PG_VERSION file)] *******************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine corresponding Docker image to detected version (assume default of latest)] ***************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine corresponding Docker image to detected version (use 9.x, if detected)] *******************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine corresponding Docker image to detected version (use 10.x, if detected)] ******************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Determine corresponding Docker image to detected version (use 11.x, if detected)] ******************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : set_fact] ******************************************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Inject warning if on an old version of Postgres] ***************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Ensure postgres Docker image is pulled] ************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Ensure Postgres paths exist] ***********************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk] => (item=/matrix/postgres)
ok: [matrix.umdev.sandbox.joshuacarter.tk] => (item=/matrix/postgres/data)

TASK [matrix-postgres : Ensure Postgres data path ownership is correct] ****************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Ensure Postgres environment variables file created] ************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk] => (item=env-postgres-psql)
ok: [matrix.umdev.sandbox.joshuacarter.tk] => (item=env-postgres-server)

TASK [Ensure matrix-postgres-cli script created] ***************************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Ensure matrix-change-user-admin-status script created] *********************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : (Migration) Ensure old matrix-make-user-admin script deleted] **************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure matrix-postgres-update-user-password-hash script created] *****************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure matrix-postgres.service installed] ****************************************************************************************************************************
ok: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure systemd reloaded after matrix-postgres.service installation] **************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Check existence of matrix-postgres service] **************************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure matrix-postgres is stopped] ***********************************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure matrix-postgres.service doesn't exist] ************************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Ensure systemd reloaded after matrix-postgres.service removal] *******************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Check existence of matrix-postgres local data path] ******************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [Notify if matrix-postgres local data remains] ************************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk]

TASK [matrix-postgres : Remove Postgres scripts] ***************************************************************************************************************************
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix-postgres-cli)
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix-change-user-admin-status)
skipping: [matrix.umdev.sandbox.joshuacarter.tk] => (item=matrix-postgres-update-user-password-hash)
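To confirm across the full log that no task failed, the task headers and per-host status lines can be tallied. This is a minimal sketch, assuming Ansible's default output format as shown in the excerpt above; `summarize_tasks` is a hypothetical helper name.

```python
import re

# "TASK [name] ****" header lines, as printed in the excerpt above.
TASK_RE = re.compile(r"^TASK \[(?P<name>.+?)\] \*+\s*$")
# Per-host result lines such as "ok: [host]" or "skipping: [host] => (...)".
STATUS_RE = re.compile(r"^(?P<status>ok|changed|skipping|failed|fatal): \[")


def summarize_tasks(log_text):
    """Return a list of (task name, set of statuses) in log order."""
    results = []
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.match(line)
        if task:
            current = (task.group("name"), set())
            results.append(current)
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            current[1].add(status.group("status"))
    return results


def problem_tasks(log_text):
    """Return the names of tasks that reported failed or fatal."""
    return [
        name
        for name, statuses in summarize_tasks(log_text)
        if statuses & {"failed", "fatal"}
    ]
```

Applied to the excerpt above, `problem_tasks` would return an empty list, since every task there reports only `ok` or `skipping`.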

Entire installation log is attached here: ansible-install-debug-001.log

This is my first time attempting to install this deployment on a server, so it is entirely possible I did something wrong. Any help or advice would be appreciated.

JoshuaCWebDeveloper commented 4 years ago

Your documentation said in a few places that re-running the setup script should fix any configuration issues. So all I tried yesterday was re-running the setup script successfully a few times, but that did not resolve the problem.

Today, I decided to try following the instructions in https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/uninstalling.md to uninstall completely and install fresh. This resolved my issue; after successfully running the setup script, the postgresql data directory was populated and the start script also ran successfully.

This would seem to be a glitch in the installation process that apparently can't be solved by merely re-running setup but that can be solved by starting over fresh.

spantaleev commented 4 years ago

Any idea if something happened during the first installation that may have caused this? Or something special that you did that may have interfered with Postgres?

JoshuaCWebDeveloper commented 4 years ago

There was one thing about the first install that was abnormal.

The first time I ran the setup-all task, it ran fine all the way through without any errors; however, it was run on a low-memory GCP instance, and subsequently running the start task resulted in the process hanging due to low memory on the system. I ended up killing the start task, cleanly shutting down the instance, and upgrading it to a larger instance type. After booting it up again, running the start task failed because of the failing postgres container. However, other containers with mounted volumes (for example the matrix-appservice-discord container) started up fine without any complaints about missing data.

I did configure a handful of the documented add-ons for the first install, but I didn't attempt anything that was outside of the documentation.

Earlier today, I ended up having to upgrade the instance a second time, and had no problems switching it back on after the upgrade (no missing postgres data directory or anything else missing from the containers or the filesystem). Personally, I wouldn't expect turning the instance off to cause data corruption in a docker container running on it...

Due to an unrelated issue, I had to do another fresh install later today; it went fine and this error did not happen again.

spantaleev commented 4 years ago

I'm guessing the Postgres container never did manage to properly start the first time around and it somehow corrupted its data directory.

(unless something more strange is going on, but we haven't had such complaints before).

There's usually no data loss for containers, even with abnormal server reboot, so it wouldn't be something I would worry about.

jooize commented 4 years ago

I had the same error today because I attempted to use a symlink for /matrix.
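A quick way to rule this case out is to check whether any component of the data path is a symbolic link. This is a minimal sketch for a POSIX host, assuming the `/matrix/postgres/data` path from the playbook output earlier in this thread; `symlinked_components` is a hypothetical helper name.

```python
import os


def symlinked_components(path):
    """Return every prefix of `path` that is itself a symbolic link."""
    # abspath normalizes the path without resolving symlinks.
    path = os.path.abspath(path)
    found = []
    current = os.sep
    for part in path.strip(os.sep).split(os.sep):
        current = os.path.join(current, part)
        if os.path.islink(current):
            found.append(current)
    return found


if __name__ == "__main__":
    links = symlinked_components("/matrix/postgres/data")
    if links:
        print("Symlinked path components:", ", ".join(links))
    else:
        print("No symlinks in the data path.")
```

An empty result does not prove the setup is healthy; it only excludes the symlink situation described in this comment.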

spantaleev / matrix-docker-ansible-deploy

`matrix-postgres` does not start, `/var/lib/postgresql/data` is empty #644