Do you get these errors when you do `--tags=start`?

From what I remember, we are dynamically populating the list of services that need to be started as the playbook executes. If workers are disabled, there should never be a `matrix-synapse-worker-*` systemd service in the "services that should be started" list, regardless of whether such a systemd `.service` exists on the host or not.

Or is this some error during worker cleanup, not during `--tags=start`?
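For illustration only (the variable and service names below are assumptions, not the playbook's actual code), conditional registration along these lines is what would keep disabled workers out of the "services to start" list:

```yaml
# Hypothetical sketch: a role only registers its services for starting when the
# corresponding feature is enabled, so disabled workers never enter the start list.
- name: Add a Synapse worker service to the list of services to start
  ansible.builtin.set_fact:
    matrix_systemd_services_list: "{{ matrix_systemd_services_list + ['matrix-synapse-worker-appservice-0.service'] }}"
  when: matrix_synapse_workers_enabled | bool
```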
The error is part of the workers cleanup process (the "Ensure any worker services are stopped" task), so it happens during `--tags=setup-all`, not during `start`.
Thanks for reporting this! While working on the Dendrite support branch (https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/818), I've encountered this same problem (`matrix_synapse_enabled: false`, and it tries to uninstall Synapse along with all old workers, etc.)
Seems like running a bare `systemctl` doesn't output these failed units for me on CentOS 7.9.
The Ansible `service_facts` built-in module, which collects the unit files, actually runs `systemctl list-units --no-pager --type service --all`: https://github.com/ansible/ansible/blob/bc753c0518fd87c38fd3304f860fe55e00276303/lib/ansible/modules/service_facts.py#L247
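As a quick way to see what the module reports, a minimal sketch (not taken from the playbook) that gathers service facts and prints any Synapse worker entries, including the `not-found` ghosts, might look like this:

```yaml
# Gather systemd unit information into ansible_facts.services
- name: Populate service facts
  ansible.builtin.service_facts:

# Print every known Synapse worker unit together with its status
- name: Show Synapse worker units known to systemd
  ansible.builtin.debug:
    msg: "{{ item.key }}: {{ item.value.status }}"
  loop: "{{ ansible_facts.services | dict2items | selectattr('key', 'search', 'matrix-synapse-worker') | list }}"
```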
I see a bunch of (not-found, inactive, dead) services when I do `systemctl list-units --no-pager --type service --all | grep synapse`.
Interestingly, neither `systemctl reset-failed` (to reset all) nor `systemctl reset-failed SERVICE_NAME` changes anything with regard to what I see in `systemctl list-units --no-pager --type service --all | grep synapse`.
Thankfully, `ansible_facts.services` contains key/value entries like this:
```yaml
matrix-synapse-worker-appservice-0.service:
  name: matrix-synapse-worker-appservice-0.service
  source: systemd
  state: stopped
  status: not-found
```
By filtering out services whose status is `not-found` (keeping only those where `status != 'not-found'`), we can work around it, which is what I've done in 4625b34acca1.
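The filtering might look roughly like this (a sketch under the assumption that the stop task iterates over `ansible_facts.services`; the actual task in the commit may differ):

```yaml
# Stop only worker units that systemd actually knows as real units,
# skipping "ghost" entries whose status is not-found.
- name: Ensure any worker services are stopped
  ansible.builtin.service:
    name: "{{ item.key }}"
    state: stopped
  loop: "{{ ansible_facts.services | dict2items }}"
  when:
    - item.key.startswith('matrix-synapse-worker')
    - item.value.status != 'not-found'
```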
Let's see how it goes with this fix. If anyone has a better idea, we can revisit this.
I just want to mention I ran into the same problem, even with the fix applied a few months ago. What helped me is running `systemctl reset-failed SERVICE_NAME`, which completely removed the service entry (I'm running Debian, not CentOS).
Maybe other people discovering this thread will find it useful.
Hello,

When you enable Synapse workers and then disable them, the workers' unit info still exists in systemd even after the playbook removes them, and you can see a `not-found failed` state if you list services (just run `systemctl` without parameters to get the full list). That's not a problem by itself (if you google this behavior, you'll find answers like "that's ok"), but when you run the playbook again, it will fail with the following errors (keep in mind that the units were already removed and those services are just "ghosts" without any actual service behind them):

To fix that issue manually, you can run `systemctl reset-failed`, but I wonder how it can be automated. My first idea was to add the following task right under the "Ensure any worker services are stopped" task in `roles/matrix-synapse/tasks/synapse/workers/setup_uninstall.yml`:

But it will not work on the first run (because the units will not be marked as `not-found failed` at that moment), so it would actually have to go before "Ensure any worker services are stopped" to fix the issue, but that would look weird. Sorry, I don't have a better idea how to implement it, so here is the solution (the code above) - I hope you will find a correct place to add it.
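For reference, a hypothetical reconstruction of the kind of task described above (the reporter's original snippet is not shown here, and the task name is made up) could be:

```yaml
# Clear failed/"ghost" unit state so the later stop task doesn't trip over
# units that no longer have a .service file behind them.
- name: Reset failed systemd units left over from removed Synapse workers
  ansible.builtin.command: systemctl reset-failed
  changed_when: false
```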