HarHarLinks closed this issue 5 months ago.
We specify some wanted services in matrix_nginx_proxy_systemd_wanted_services_list (which go to Wants= in the matrix-nginx-proxy.service file) in group_vars/matrix_servers: https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/4e4fb98a65474fd058c61a838db6ac312a09e7df/group_vars/matrix_servers#L1462-L1471
We might define additional wanted services there.
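For illustration only, here is a minimal Python sketch of how such lists could be turned into unit-file lines. It assumes each wanted unit is emitted as a Wants= line plus an After= line (Wants= on its own does not order startup), and each required unit as Requires= plus After=. The helper name render_unit_dependencies is hypothetical; the playbook does this in a Jinja2 template, not in Python.

```python
from typing import Sequence


def render_unit_dependencies(wanted: Sequence[str], required: Sequence[str] = ()) -> str:
    """Render [Unit] dependency lines for a systemd service (hypothetical helper)."""
    lines = []
    for unit in required:
        # Hard dependency: if `unit` is stopped or restarted,
        # systemd stops or restarts this service too.
        lines.append(f"Requires={unit}")
        lines.append(f"After={unit}")
    for unit in wanted:
        # Soft dependency: `unit` is started alongside this service,
        # but its failure or restart does not take this service down.
        lines.append(f"Wants={unit}")
        # Wants= does not imply ordering, so request it explicitly.
        lines.append(f"After={unit}")
    return "\n".join(lines)


print(render_unit_dependencies(["matrix-synapse.service", "matrix-client-element.service"]))
```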
We have matrix_nginx_proxy_systemd_required_services_list as well, but we avoid hardcoding such hard dependencies, because if matrix-nginx-proxy lists some random service (say matrix-client-element) as required for matrix-nginx-proxy to run, then restarting matrix-client-element for whatever reason (it being restarted manually, or dying and getting restarted) has the negative side effect of bringing down matrix-nginx-proxy as well.
If some random service (say matrix-dimension) fails to start for whatever reason (misconfiguration, container image bug, etc.), we also don't want that to prevent matrix-nginx-proxy from starting.
We don't want random services failing to start or being restarted to bring down everything else.
This is why we only use a "wanted services" list.
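The trade-off above can be modelled with a toy dependency graph. This is a simplification of systemd's documented Requires= behaviour, not systemd itself: a unit that Requires= another unit is stopped (or restarted) when that unit is, while Wants= edges never propagate. The unit names and the single edge are hypothetical, chosen to match the example in the comment.

```python
from collections import deque

# Hypothetical Requires= edges: unit -> units it hard-depends on.
# This is the setup being argued against: nginx requiring Element.
# matrix-dimension is only *wanted*, so it does not appear here.
REQUIRES = {
    "matrix-nginx-proxy.service": ["matrix-client-element.service"],
}


def stopped_along_with(unit: str, requires: dict[str, list[str]]) -> set[str]:
    """Return every unit that would also go down when `unit` stops or restarts."""
    # Invert the graph: dependency -> units that require it.
    required_by: dict[str, list[str]] = {}
    for dependent, deps in requires.items():
        for dep in deps:
            required_by.setdefault(dep, []).append(dependent)

    affected: set[str] = set()
    queue = deque([unit])
    while queue:
        current = queue.popleft()
        for dependent in required_by.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)  # propagation is transitive
    return affected


print(stopped_along_with("matrix-client-element.service", REQUIRES))  # {'matrix-nginx-proxy.service'}
print(stopped_along_with("matrix-dimension.service", REQUIRES))       # set()
```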
nginx insisting on resolving the DNS names of all defined upstreams when it starts is kind of bad, but from what I remember there's no way around it, except for using their "nginx plus" offering.
Adding this to the matrix_nginx_proxy_systemd_wanted_services_list list in group_vars/matrix_servers may solve this particular problem:
matrix_nginx_proxy_systemd_wanted_services_list: |
  {{
    ['matrix-' + matrix_homeserver_implementation + '.service']
    +
    (['matrix-corporal.service'] if matrix_corporal_enabled else [])
    +
    (['matrix-ma1sd.service'] if matrix_ma1sd_enabled else [])
    +
    (['matrix-client-element.service'] if matrix_client_element_enabled else [])
    +
    (['matrix-prometheus-postgres-exporter.service'] if matrix_prometheus_postgres_exporter_enabled else [])
  }}
We could extend this list with various other services. PRs are welcome ;)
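To see what the Jinja2 expression above evaluates to, here is the same conditional concatenation written in plain Python. The flag values are made-up inputs and the dictionary keys merely mirror the playbook variable names; this is not how Ansible evaluates the template internally.

```python
def nginx_wanted_services(homeserver: str, enabled: dict[str, bool]) -> list[str]:
    """Mirror of the Jinja2 list expression for matrix_nginx_proxy_systemd_wanted_services_list."""
    optional = [
        ("matrix_corporal_enabled", "matrix-corporal.service"),
        ("matrix_ma1sd_enabled", "matrix-ma1sd.service"),
        ("matrix_client_element_enabled", "matrix-client-element.service"),
        ("matrix_prometheus_postgres_exporter_enabled", "matrix-prometheus-postgres-exporter.service"),
    ]
    # The homeserver unit is always wanted.
    services = [f"matrix-{homeserver}.service"]
    # Each optional unit is appended only when its flag is true,
    # like `(['x.service'] if flag else [])` in the template.
    for flag, unit in optional:
        if enabled.get(flag, False):
            services.append(unit)
    return services


print(nginx_wanted_services("synapse", {
    "matrix_client_element_enabled": True,
    "matrix_prometheus_postgres_exporter_enabled": True,
}))
# ['matrix-synapse.service', 'matrix-client-element.service', 'matrix-prometheus-postgres-exporter.service']
```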
The ok status reported for some services when stopping and starting all services seems to imply existing relations that start/stop other services before the playbook gets to them. This is harmless unless it leads to an ordering violation elsewhere.
TASK [matrix-common-after : Ensure Matrix services are stopped] ******
changed: [matrix.matrix_domain] => (item=matrix-mailer.service)
changed: [matrix.matrix_domain] => (item=matrix-postgres.service)
changed: [matrix.matrix_domain] => (item=matrix-redis)
ok: [matrix.matrix_domain] => (item=matrix-appservice-webhooks.service)
ok: [matrix.matrix_domain] => (item=matrix-mautrix-signal.service)
changed: [matrix.matrix_domain] => (item=matrix-mautrix-signal-daemon.service)
ok: [matrix.matrix_domain] => (item=matrix-mx-puppet-discord.service)
ok: [matrix.matrix_domain] => (item=matrix-mx-puppet-steam.service)
ok: [matrix.matrix_domain] => (item=matrix-mx-puppet-slack.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-worker-generic_worker-18111.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-worker-federation_sender-0.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-worker-pusher-0.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-worker-appservice-0.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-worker-media_repository-18551.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-worker-frontend_proxy-18771.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-admin.service)
changed: [matrix.matrix_domain] => (item=matrix-prometheus-node-exporter.service)
ok: [matrix.matrix_domain] => (item=matrix-registration.service)
changed: [matrix.matrix_domain] => (item=matrix-jitsi-web.service)
changed: [matrix.matrix_domain] => (item=matrix-jitsi-prosody.service)
ok: [matrix.matrix_domain] => (item=matrix-jitsi-jicofo.service)
ok: [matrix.matrix_domain] => (item=matrix-jitsi-jvb.service)
ok: [matrix.matrix_domain] => (item=matrix-dimension.service)
ok: [matrix.matrix_domain] => (item=matrix-etherpad.service)
changed: [matrix.matrix_domain] => (item=matrix-nginx-proxy.service)
changed: [matrix.matrix_domain] => (item=matrix-ssl-lets-encrypt-certificates-renew.timer)
changed: [matrix.matrix_domain] => (item=matrix-ssl-nginx-proxy-reload.timer)
changed: [matrix.matrix_domain] => (item=matrix-coturn.service)
changed: [matrix.matrix_domain] => (item=matrix-coturn-reload.timer)
ok: [matrix.matrix_domain] => (item=matrix-prometheus-postgres-exporter.service)
TASK [matrix-common-after : Ensure Matrix services are started] ******
changed: [matrix.matrix_domain] => (item=matrix-mailer.service)
changed: [matrix.matrix_domain] => (item=matrix-postgres.service)
changed: [matrix.matrix_domain] => (item=matrix-redis)
changed: [matrix.matrix_domain] => (item=matrix-appservice-webhooks.service)
changed: [matrix.matrix_domain] => (item=matrix-mautrix-signal.service)
ok: [matrix.matrix_domain] => (item=matrix-mautrix-signal-daemon.service)
changed: [matrix.matrix_domain] => (item=matrix-mx-puppet-discord.service)
changed: [matrix.matrix_domain] => (item=matrix-mx-puppet-steam.service)
changed: [matrix.matrix_domain] => (item=matrix-mx-puppet-slack.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse-worker-generic_worker-18111.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse-worker-federation_sender-0.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse-worker-pusher-0.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse-worker-appservice-0.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse-worker-media_repository-18551.service)
ok: [matrix.matrix_domain] => (item=matrix-synapse-worker-frontend_proxy-18771.service)
changed: [matrix.matrix_domain] => (item=matrix-synapse-admin.service)
changed: [matrix.matrix_domain] => (item=matrix-prometheus-node-exporter.service)
changed: [matrix.matrix_domain] => (item=matrix-registration.service)
changed: [matrix.matrix_domain] => (item=matrix-jitsi-web.service)
changed: [matrix.matrix_domain] => (item=matrix-jitsi-prosody.service)
changed: [matrix.matrix_domain] => (item=matrix-jitsi-jicofo.service)
changed: [matrix.matrix_domain] => (item=matrix-jitsi-jvb.service)
changed: [matrix.matrix_domain] => (item=matrix-dimension.service)
changed: [matrix.matrix_domain] => (item=matrix-etherpad.service)
ok: [matrix.matrix_domain] => (item=matrix-nginx-proxy.service)
changed: [matrix.matrix_domain] => (item=matrix-ssl-lets-encrypt-certificates-renew.timer)
changed: [matrix.matrix_domain] => (item=matrix-ssl-nginx-proxy-reload.timer)
ok: [matrix.matrix_domain] => (item=matrix-coturn.service)
changed: [matrix.matrix_domain] => (item=matrix-coturn-reload.timer)
changed: [matrix.matrix_domain] => (item=matrix-prometheus-postgres-exporter.service)
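One way to read the ok entries in the stop task above: when the playbook stops a unit, systemd also stops every unit that Requires= it, so by the time the loop reaches such a dependent it is already inactive and Ansible reports ok rather than changed. A toy simulation of that effect; the dependency edges below are assumptions chosen for illustration, not read from the actual unit files.

```python
def simulate_stop(order: list[str], requires: dict[str, list[str]]) -> dict[str, str]:
    """Stop units in playbook order and report 'changed' or 'ok' like Ansible would."""
    active = set(order)
    result: dict[str, str] = {}
    for unit in order:
        if unit not in active:
            # Already brought down as a side effect of an earlier stop.
            result[unit] = "ok"
            continue
        result[unit] = "changed"
        active.discard(unit)
        # Stopping `unit` also stops everything that (transitively) requires it.
        pending = [unit]
        while pending:
            current = pending.pop()
            for dependent, deps in requires.items():
                if current in deps and dependent in active:
                    active.discard(dependent)
                    pending.append(dependent)
    return result


REQUIRES = {  # assumed edges, for illustration only
    "matrix-synapse.service": ["matrix-postgres.service"],
    "matrix-dimension.service": ["matrix-synapse.service"],
}
ORDER = [
    "matrix-postgres.service",
    "matrix-synapse.service",
    "matrix-dimension.service",
    "matrix-nginx-proxy.service",
]
for unit, status in simulate_stop(ORDER, REQUIRES).items():
    print(f"{status}: {unit}")
# changed: matrix-postgres.service
# ok: matrix-synapse.service
# ok: matrix-dimension.service
# changed: matrix-nginx-proxy.service
```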
matrix-appservice-webhooks and matrix-dimension both require matrix-nginx-proxy to be reachable before they can start.
matrix-nginx-proxy is listed as a dependency in matrix_appservice_webhooks_systemd_required_services_list in group_vars/matrix_servers.
Likewise for matrix_dimension_systemd_required_services_list.
The fact that the playbook tries to start matrix-appservice-webhooks and matrix-dimension before matrix-nginx-proxy may be suboptimal, but is ultimately not a problem. The systemd .service files define dependencies correctly, so starting either of these services causes matrix-nginx-proxy to be started as well. Starting matrix-nginx-proxy then becomes a no-op (you can see the ok mark there, instead of changed).
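The start side can be simulated the same way: starting a unit first pulls in the units it depends on (assuming matching After= ordering), so when the loop later reaches matrix-nginx-proxy it is already running and is reported as ok. The edges follow the description above (webhooks and dimension require nginx) and are otherwise illustrative; the sketch assumes the graph has no cycles.

```python
def simulate_start(order: list[str], deps: dict[str, list[str]]) -> dict[str, str]:
    """Start units in playbook order; dependencies are pulled in automatically."""
    running: set[str] = set()
    result: dict[str, str] = {}

    def start(unit: str) -> None:
        if unit in running:
            return
        # Requires=/Wants= targets are started first (assuming After= ordering).
        for dep in deps.get(unit, []):
            start(dep)
        running.add(unit)

    for unit in order:
        if unit in running:
            # Already started as a dependency of an earlier unit: a no-op.
            result[unit] = "ok"
        else:
            start(unit)
            result[unit] = "changed"
    return result


DEPS = {  # illustrative edges only
    "matrix-appservice-webhooks.service": ["matrix-nginx-proxy.service"],
    "matrix-dimension.service": ["matrix-nginx-proxy.service"],
}
ORDER = [
    "matrix-appservice-webhooks.service",
    "matrix-dimension.service",
    "matrix-nginx-proxy.service",
]
for unit, status in simulate_start(ORDER, DEPS).items():
    print(f"{status}: {unit}")
# changed: matrix-appservice-webhooks.service
# changed: matrix-dimension.service
# ok: matrix-nginx-proxy.service
```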
Moreover, these dependencies are important for when services are started/restarted by other means (system reboot, manual systemd service restart, service failure, etc.). --tags=start is not the only way to (re-)start services. Having correct dependencies in the systemd service files is more important than what --tags=start does.
That said, we may reorder roles in setup.yml to improve the --tags=start situation. It probably needs to be done with care though, because certain roles (some bridges, at least) inject configuration into matrix-nginx-proxy during runtime.
Similarly, some services also inject stuff into matrix-synapse variables.
So re-ordering roles is probably not ideal.
Alternatively, each role could inject itself into matrix_systemd_services_list with not just a service name, but also a priority. We could then sort them and stop/start them in a smarter way. This complicates things though, and for little benefit.
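A minimal sketch of that priority idea, assuming each role contributes a name/priority pair instead of a bare service name. The entry shape and the priority values are hypothetical; lower numbers start first and stop last.

```python
from typing import TypedDict


class ServiceEntry(TypedDict):
    name: str
    priority: int  # hypothetical: lower value = more fundamental service


def start_order(services: list[ServiceEntry]) -> list[str]:
    # sorted() is stable, so entries with equal priority keep their injection order.
    return [s["name"] for s in sorted(services, key=lambda s: s["priority"])]


def stop_order(services: list[ServiceEntry]) -> list[str]:
    # Stop in exactly the reverse of the start order.
    return list(reversed(start_order(services)))


services: list[ServiceEntry] = [
    {"name": "matrix-dimension.service", "priority": 3000},
    {"name": "matrix-postgres.service", "priority": 500},
    {"name": "matrix-nginx-proxy.service", "priority": 2000},
    {"name": "matrix-synapse.service", "priority": 1000},
]
print(start_order(services))
# ['matrix-postgres.service', 'matrix-synapse.service', 'matrix-nginx-proxy.service', 'matrix-dimension.service']
print(stop_order(services))
# ['matrix-dimension.service', 'matrix-nginx-proxy.service', 'matrix-synapse.service', 'matrix-postgres.service']
```

The catch, as noted above, is that every role would need to pick sensible priorities relative to all others.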
Still, if you're up for redoing all roles in such a way, PRs are welcome ;)
> because certain roles (some bridges, at least) inject configuration into matrix-nginx-proxy during runtime.
Example:
I'm sorry, either I don't understand what you're saying, or you're mixing up Ansible runtime and service runtime. That injection happens while Ansible runs and templates the config and service files. Starting one service, however, does not seem to modify the configuration of other services, or at least I don't see how.
> Still, if you're up for redoing all roles in such a way, PRs are welcome ;)
I know, I know... I don't have that kind of time on my hands currently, but it seems like the correct thing to do, although, as we discussed, a low-priority one.
To outline a concrete issue: currently nginx depends on prometheus-postgres-exporter, as we have seen, and other services depend on nginx. The exporter seems to be the last thing the playbook starts, while nginx starts much earlier via the playbook (and probably even earlier as a dependency). Since nginx does not list the exporter in Wants=, it keeps failing. As a result, all containers that in turn depend on nginx may end up in a restart loop, such as matrix-appservice-webhooks, matrix-dimension, and probably more appservices. While it works out in my case after waiting a couple of minutes, this could be avoided if done cleanly.
That's what I wanted to note down above in case someone is going to tackle this issue at some point.
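Rather than hand-tuning the order, a start sequence could in principle be derived from the dependency edges themselves with a topological sort. A sketch using Python's standard-library graphlib (3.9+); the edges encode the scenario just described (nginx proxies to the exporter; webhooks and dimension need nginx) and are illustrative only. Stopping would use the reversed order.

```python
from graphlib import TopologicalSorter

# unit -> units that must be up before it (illustrative edges only)
DEPENDS_ON = {
    "matrix-nginx-proxy.service": {"matrix-prometheus-postgres-exporter.service", "matrix-synapse.service"},
    "matrix-appservice-webhooks.service": {"matrix-nginx-proxy.service"},
    "matrix-dimension.service": {"matrix-nginx-proxy.service"},
    "matrix-prometheus-postgres-exporter.service": {"matrix-postgres.service"},
    "matrix-synapse.service": {"matrix-postgres.service"},
}

# static_order() raises graphlib.CycleError if the edges contain a cycle.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)

# Every unit appears after all of its dependencies.
position = {unit: i for i, unit in enumerate(order)}
assert all(position[dep] < position[unit] for unit, deps in DEPENDS_ON.items() for dep in deps)
```

The exact order of independent units (for example webhooks vs. dimension) is not fixed; only the dependency constraints are guaranteed.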
I've updated the wanted services list for matrix-nginx-proxy and matrix-grafana in 0fb881deb578, which hopefully improves the situation.
I'm still not sure why your error says:
2022/01/08 17:11:01 [emerg] 1#1: host not found in upstream "matrix-prometheus-postgres-exporter" in /etc/nginx/conf.d/matrix-grafana.conf:63
I don't see why matrix-nginx-proxy's matrix-grafana.conf file would point to matrix-prometheus-postgres-exporter. Looking at the template (roles/matrix-nginx-proxy/templates/nginx/conf.d/matrix-grafana.conf.j2), it should only be pointing to matrix-grafana.
Indeed, you're right! I'm using external metrics, and thus have added:
matrix_nginx_proxy_proxy_grafana_additional_server_configuration_blocks:
  - 'location /node-exporter/ {
      resolver 127.0.0.11 valid=5s;
      proxy_pass http://matrix-prometheus-node-exporter:9100/;
      auth_basic "protected";
      auth_basic_user_file /nginx-data/matrix-synapse-metrics-htpasswd;
    }'
  - 'location /postgres-exporter/ {
      resolver 127.0.0.11 valid=5s;
      proxy_pass http://matrix-prometheus-postgres-exporter:9187/;
      auth_basic "protected";
      auth_basic_user_file /nginx-data/matrix-synapse-metrics-htpasswd;
    }'
I realize now that my particular error is indeed due to nonstandard/custom config, but at the same time I suppose it could make sense to integrate this into the playbook.
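If this were integrated, one hypothetical approach is to derive extra wanted units from the hosts that custom blocks proxy to. The sketch assumes a container's hostname equals its systemd unit name minus the .service suffix (which matches the names visible in this thread, but is an assumption in general) and only recognises literal proxy_pass http(s)://host[:port] URLs, not variables.

```python
import re

# Captures the host part of a literal `proxy_pass http://host:port/...` URL.
PROXY_PASS = re.compile(r"proxy_pass\s+https?://([A-Za-z0-9_.-]+)(?::\d+)?")


def wanted_units_from_blocks(blocks: list[str]) -> list[str]:
    """Return unique unit names for every upstream host found, in first-seen order."""
    units: list[str] = []
    for block in blocks:
        for host in PROXY_PASS.findall(block):
            unit = f"{host}.service"  # assumption: hostname == unit name
            if unit not in units:
                units.append(unit)
    return units


blocks = [
    "location /node-exporter/ { proxy_pass http://matrix-prometheus-node-exporter:9100/; }",
    "location /postgres-exporter/ { proxy_pass http://matrix-prometheus-postgres-exporter:9187/; }",
]
print(wanted_units_from_blocks(blocks))
# ['matrix-prometheus-node-exporter.service', 'matrix-prometheus-postgres-exporter.service']
```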
You can improve your situation by redefining matrix_nginx_proxy_systemd_wanted_services_list.
Unfortunately, you can't easily add stuff to the list. We should probably introduce an additional wanted services variable (e.g. matrix_nginx_proxy_systemd_additional_wanted_services_list), which gets merged with the other one.
You can similarly use matrix_nginx_proxy_systemd_required_services_list, if necessary, but it suffers from the same problem: you'd need to completely redefine the variable.
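A sketch of the merge such a variable could perform, written in Python rather than Jinja2. matrix_nginx_proxy_systemd_additional_wanted_services_list is only the name proposed above, not an existing variable; the sketch keeps default entries first and drops duplicates.

```python
def merge_wanted(default: list[str], additional: list[str]) -> list[str]:
    """Concatenate the two lists, keeping only the first occurrence of each unit."""
    # dict.fromkeys preserves insertion order (Python 3.7+) and drops duplicates.
    return list(dict.fromkeys(default + additional))


print(merge_wanted(
    ["matrix-synapse.service", "matrix-client-element.service"],
    ["matrix-prometheus-postgres-exporter.service", "matrix-synapse.service"],
))
# ['matrix-synapse.service', 'matrix-client-element.service', 'matrix-prometheus-postgres-exporter.service']
```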
Another one with ordering issues: https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/1253
Another service start order issue: etherpad has permission issues with the current ordering, which requires its service to be restarted before any pads will load.
> Another service start order issue: etherpad has permission issues with the current ordering, which requires its service to be restarted before any pads will load.
I can't confirm this; in fact, my last restarts when upgrading synapse to 1.51 and 1.52 have been 100% without timeouts. I use etherpad. Can you elaborate?
> Can you elaborate?
On a --tags=setup-all,start run, everything starts up without any obvious errors on the systemd side. However, attempting to access an embedded etherpad within a channel gets an etherpad permission error. Restarting the matrix-etherpad service (and then refreshing Element to clear the failed etherpad load) makes the embedded etherpad work.
I have similar issues with a worker setup: matrix-common-after stops all services, then starts them again. nginx tries to start, but cannot find one of the workers, which has not managed to start yet ([emerg] 1#1: host not found in upstream "matrix-synapse-worker-generic_worker-18111:18111" in /etc/nginx/conf.d/matrix-synapse.conf:9). It then takes 30 seconds for systemd to restart nginx, so the playbook does not detect nginx as running and fails.
I fixed the issue by adding an override.conf for the nginx unit that sets RestartSec=5.
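An override.conf like that is a systemd drop-in. A minimal Python sketch that writes one, assuming the standard drop-in location /etc/systemd/system/<unit>.d/override.conf (the file `systemctl edit` creates); the root argument exists only so the sketch can be tried in a scratch directory. After writing the real file, `systemctl daemon-reload` is needed for systemd to pick it up.

```python
import tempfile
from pathlib import Path


def write_restart_override(unit: str, restart_sec: int, root: Path = Path("/")) -> Path:
    """Write a drop-in that overrides RestartSec= for `unit` and return its path."""
    dropin_dir = root / "etc/systemd/system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "override.conf"
    # Only the keys listed here are overridden; everything else
    # still comes from the original unit file.
    path.write_text(f"[Service]\nRestartSec={restart_sec}\n")
    return path


# Demo against a scratch directory instead of the real /etc.
with tempfile.TemporaryDirectory() as scratch:
    demo = write_restart_override("matrix-nginx-proxy.service", 5, Path(scratch))
    print(demo.read_text())
```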
I'm still getting etherpad permission denied issues on --tags=setup-all,start execution; I need to run systemctl restart matrix-etherpad.service to get it working.
This is kind of pedantic, as it generally works out alright, but fixing it might speed up upgrade restarts.
When starting the playbook, some services fail to start a couple of times before eventually working. The most obvious one is nginx: it requires every other container mentioned in its config to exist, e.g.:
Other services depend on synapse and might start it earlier than the playbook order implies.
Services should get more After=, Requires=, etc. config conditionally, based on what is enabled and what isn't.
Is there a tool that can graphically render service relations from a bunch of .service files? Would be helpful.
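systemd itself ships `systemd-analyze dot`, which prints dependencies of loaded units in Graphviz DOT format (e.g. `systemd-analyze dot 'matrix-*' | dot -Tsvg > matrix.svg`). For a bunch of .service files that are not loaded, here is a rough Python sketch that reads Requires=, Wants= and After= from the [Unit] section and emits DOT. It is a simplification: it ignores drop-ins, specifiers, backslash line continuations and the many other dependency directives.

```python
import tempfile
from pathlib import Path

EDGE_KEYS = ("Requires", "Wants", "After")


def unit_edges(path: Path) -> list[tuple[str, str, str]]:
    """Return (unit, directive, target) triples from the [Unit] section of one file."""
    edges = []
    section = None
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in EDGE_KEYS:
            # One directive may list several space-separated units.
            for target in value.split():
                edges.append((path.name, key, target))
    return edges


def to_dot(edges: list[tuple[str, str, str]]) -> str:
    """Render edges as a Graphviz digraph, styling each directive differently."""
    styles = {"Requires": "solid", "Wants": "dashed", "After": "dotted"}
    lines = ["digraph units {"]
    for unit, key, target in edges:
        lines.append(f'  "{unit}" -> "{target}" [label="{key}", style={styles[key]}];')
    lines.append("}")
    return "\n".join(lines)


# Demo on a throwaway unit file.
with tempfile.TemporaryDirectory() as scratch:
    sample = Path(scratch) / "matrix-dimension.service"
    sample.write_text("[Unit]\nRequires=matrix-nginx-proxy.service\nAfter=matrix-nginx-proxy.service\n")
    print(to_dot(unit_edges(sample)))
```

On a host, the edges of every matrix-*.service file in the unit directory (assumed here to be /etc/systemd/system) could be concatenated and the output piped through `dot -Tsvg`.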