Closed: ethindp closed this issue 3 years ago
It's probably that Postgres is slow to start on your server and the playbook did not wait long enough for it to become available.
If you re-run the exact same command a second time, it succeeds, doesn't it? We should probably increase the wait time.
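To illustrate the "wait long enough" idea, here is a minimal sketch of a retry loop in POSIX shell. It is not the playbook's actual implementation; the `retry` function name and its arguments are hypothetical.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS
# attempts have been made. Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (assumes the `matrix` network and `matrix-postgres` host from this
# thread; pg_isready ships with the official postgres image):
#   retry 30 2 docker run --rm --network matrix \
#       docker.io/postgres:13.4-alpine pg_isready -h matrix-postgres
```

If the readiness check never succeeds no matter how long you wait, the problem is more likely networking than startup time.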
@spantaleev That's what I thought, so I re-ran it. When that failed I posted this issue. I re-ran it once again a couple of days ago but, as I expected, it failed again. I may need to rebuild all the containers from scratch. I assume that if I keep `/matrix` around, all my stuff will be restored? Also, I did use the "stop" tag before I rebooted my server, so maybe that interfered or caused problems. I was attempting to shut everything down cleanly.
Strange.. It usually works the 2nd time around.
Perhaps your container networking is borked and rebooting the server may help. You seem to have done that though.
So I'm not sure what would cause networking issues like that. Do you have SELinux enabled or some other security technology like that, which could be interfering? I see that you're running some kind of hardened kernel.
@spantaleev no, I don't have that enabled though enabling that is a good idea. But no, I haven't yet. I just have the standard hardened Linux kernel. But it worked before so I have no idea what changed.
I've just run into this issue also, Ubuntu 21.04, similar docker versions, regular kernel. In my case I had an overly aggressive match in systemd-networkd's config:
/etc/systemd/network/99-all.network
[Match]
Name=*
...
I had to make the match more specific to the local interface (`Name=ens*`) to keep systemd-networkd from interfering with the docker veth and bridge interfaces.
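A quick way to look for this kind of catch-all match is sketched below. The function name `find_wildcard_matches` is hypothetical, and it only scans the directory you give it (defaulting to `/etc/systemd/network`); systemd-networkd also reads other directories, which this sketch ignores.

```shell
#!/bin/sh
# find_wildcard_matches [DIR]
# Hypothetical helper: prints the path of each *.network file in DIR whose
# [Match] section contains a bare "Name=*" line.
find_wildcard_matches() {
    dir=${1:-/etc/systemd/network}
    for f in "$dir"/*.network; do
        [ -f "$f" ] || continue
        awk '
            # Track whether the current section is [Match].
            /^[ \t]*\[/ { in_match = ($0 ~ /^[ \t]*\[Match\]/) }
            in_match && /^[ \t]*Name[ \t]*=[ \t]*\*[ \t]*$/ { found = 1 }
            END { exit !found }
        ' "$f" && printf '%s\n' "$f"
    done
    return 0
}
```

Any file it prints is a candidate for narrowing the match, for example to `Name=ens*` as described above.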
Functioning docker bridges can be checked by looking for an IP address / link on the `br-<uuid>` interface:
5: br-d900f507f32a: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:55:39:58:a0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::42:55ff:fe39:58a0/64 scope link
       valid_lft forever preferred_lft forever
And by this quick test:
docker network create test
docker run --rm --net test --name nginx -d nginx
docker run --rm --net test -it busybox wget -q -O - nginx
If the bridge network works, nginx responds between the containers. If not, `wget` fails with a "No route to host" error:
> wget: can't connect to remote host (172.18.0.2): No route to host
It doesn't show up in logs very well because docker creates the networks correctly, then systemd-networkd makes changes. Kernel logs / `dmesg` had clues about the link going down and networkd activity.
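The bridge state check can be scripted roughly as follows. This is only a sketch: `down_bridges` is a hypothetical helper that filters `ip addr` (or `ip link`) output read from stdin. A Docker bridge with no running containers attached also reports `NO-CARRIER` / `state DOWN`, so treat a hit as a hint rather than proof.

```shell
#!/bin/sh
# down_bridges
# Hypothetical helper: reads `ip addr` or `ip link` output on stdin and prints
# the name of every br-* interface whose header line reports NO-CARRIER or
# state DOWN.
down_bridges() {
    awk '
        /^[0-9]+: br-/ && (/NO-CARRIER/ || /state DOWN/) {
            name = $2
            sub(/:$/, "", name)   # strip the trailing colon from "br-xxxx:"
            print name
        }
    '
}

# Example:
#   ip addr | down_bridges
```

If a bridge shows up here while its containers are running, the `docker network create test` check above is a good next step.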
This issue just mysteriously vanished after a reboot, so closing this.
Just recently rebooted my server and am trying to start all the services again. However, postgresql is failing to connect to the matrix-postgres server when doing task "Execute Postgres additional database initialization SQL file for synapse". The command is:
/usr/bin/env docker run --rm --user=967:1000 --cap-drop=ALL --env-file=/matrix/postgres/env-postgres-psql --network matrix --mount type=bind,src=/tmp/matrix-postgres-init-additional-db-user-and-role.sql,dst=/matrix-postgres-init-additional-db-user-and-role.sql,ro --entrypoint=/bin/sh docker.io/postgres:13.4-alpine -c psql -h matrix-postgres --file=/matrix-postgres-init-additional-db-user-and-role.sql
The output is as follows:
Environment:
Output of `docker ps`:
Output of `docker network ls`:
Hope I provided enough information. Is there a reason why this is happening?