Disabled (but running) systemd service is not restarted with mod_watch

edusperoni commented 6 years ago

Description of Issue/Question

When a systemd service is created, started, but not enabled, salt does not restart the service on mod_watch:

----------
          ID: /etc/zabbix/zabbix_agentd.conf
    Function: file.managed
      Result: True
     Comment: File /etc/zabbix/zabbix_agentd.conf updated
     Started: 10:14:52.621831
    Duration: 4298.499 ms
     Changes:
              ----------
              diff:
                  ---
                  <snip>
----------
          ID: zabbix-agent-service
    Function: service.running
        Name: zabbix-agent
      Result: True
     Comment: Service zabbix-agent has been enabled, and is in the desired state
     Started: 10:14:56.922787
    Duration: 1427.353 ms
     Changes:
              ----------
              zabbix-agent:
                  True

Some packages (zabbix, especifically), don't enable their services when installed, so Salt only enables it, instead of enabling+restarting.

This issue is discussed more thoroughly at this PR: https://github.com/saltstack-formulas/zabbix-formula/pull/77

Setup

Formula: https://github.com/edusperoni/zabbix-formula states:

zabbix-agent.repo
zabbix-agent.conf

Steps to Reproduce Issue

On a machine that has never installed zabbix-agent, install it using the formula above.

On a machine that has zabbix-agent installed, run systemctl disable zabbix-agent && apt-get purge zabbix-agent -y and then apply the state.

Versions Report

Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 0.8.6
       cherrypy: 3.5.0
       dateutil: 2.2
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.10
         locale: UTF-8
        machine: x86_64
        release: 3.16.0-4-amd64
         system: Linux
        version: debian 8.10

Ch3LL commented 6 years ago

I'm having a hard time replicating this with a small use case as below:

root@localhost:~# cat /srv/salt/test.sls 
test:
  file.managed:
    - name: /tmp/test
    - watch_in:
      - service: apache2

apache2:
  service.running:
    - enable: True

When running the state it restarts the service:

----------
          ID: apache2
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 14:31:23.317472
    Duration: 2242.205 ms
     Changes:   
              ----------
              apache2:
                  True

Summary for local

I tried capturing what I found in the states you pointed out and applyling it to this service. Is the use case the same?

edusperoni commented 6 years ago

Try running the following commands:

systemctl disable apache2 service apache2 start

I'm not near my computer at this moment, but it should do the trick.

The idea is that systemctl status should report the service as disabled AND running

Ch3LL commented 6 years ago

I just tripled check that i ran those commands before running the state and I am still seeing the service restart. I even just changed it to zabbix-agent and cannot replicate.

Heres a detailed use case:

root@localhost:~# systemctl disable zabbix-agent
Synchronizing state for zabbix-agent.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d zabbix-agent defaults
insserv: warning: current start runlevel(s) (empty) of script `zabbix-agent' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `zabbix-agent' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d zabbix-agent disable
insserv: warning: current start runlevel(s) (empty) of script `zabbix-agent' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `zabbix-agent' overrides LSB defaults (0 1 6).
root@localhost:~# service zabbix-agent start
root@localhost:~# sudo salt-call --local state.sls test
[WARNING ] State for file: /tmp/test - Neither 'source' nor 'contents' nor 'contents_pillar' nor 'contents_grains' was defined, yet 'replace' was set to 'True'. As there is no source to replace the file with, 'replace' has been set to 'False' to avoid reading the file unnecessarily.
local:
----------
          ID: test
    Function: file.managed
        Name: /tmp/test
      Result: True
     Comment: Empty file
     Started: 14:01:50.789805
    Duration: 36.405 ms
     Changes:   
              ----------
              new:
                  file /tmp/test created
----------
          ID: zabbix-agent
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 14:01:51.760442
    Duration: 109.101 ms
     Changes:   
              ----------
              zabbix-agent:
                  True

Summary for local
------------
Succeeded: 2 (changed=2)
Failed:    0
------------
Total states run:     2
Total run time: 145.506 ms

i'm wondering if my state is just not setup in the same way as your use case

edusperoni commented 6 years ago

Oh, I see now, you're using sysvinit, not systemd. Try it using either Ubuntu 16.04 or Debian 8 (or newer)

Ch3LL commented 6 years ago

my understanding is debian ensures they synchronize the services through both sysvinit and systemd. I even confirmed on the minion that it was running the salt/modules/systemd.py file.

Ch3LL commented 6 years ago

also to note i tested on both debian 8 and ubuntu 16

edusperoni commented 6 years ago

That's weird, it happened to both me and another commenter on the issue I linked, and we were able to replicate it by running:

systemctl disable zabbix-agent
apt-get purge zabbix-agent -y

and then applying the formula. (or applying the formula on a clean system)

Can you try with the formula I linked (my fork is on the correct commit, before the workaround)? Just use zabbix.agent.repo and zabbix.agent.conf and this pillar (just in case):

zabbix:
  lookup:
     version_repo: 3.4

Although it should work the same on debian 8, you could try it on debian 9, which is the exact version I'm using.

hatifnatt commented 6 years ago

I can confirm, problem exist. I just checked one more time. Disable and remove agent: systemctl disable zabbix-agent.service; systemctl stop zabbix-agent.service; apt-get -y purge zabbix-agent

Install it again: salt 'testvm*' state.sls zabbix.agent.repo,zabbix.agent.conf

  Name: deb http://repo.zabbix.com/zabbix/3.0/debian jessie main - Function: pkgrepo.managed - Result: Clean Started: - 22:00:40.098412 Duration: 30.905 ms
----------
          ID: zabbix-agent
    Function: pkg.installed
      Result: True
     Comment: The following packages were installed/updated: zabbix-agent
              The following packages were already installed: zabbix-sender
     Started: 22:00:40.129466
    Duration: 7967.602 ms
     Changes:
              ----------
              zabbix-agent:
                  ----------
                  new:
                      1:3.0.16-1+jessie
                  old:
  Name: zabbix - Function: group.present - Result: Clean Started: - 22:00:48.098539 Duration: 1.68 ms
  Name: zabbix - Function: user.present - Result: Clean Started: - 22:00:48.100595 Duration: 14.298 ms
  Name: /var/log/zabbix - Function: file.directory - Result: Clean Started: - 22:00:48.469282 Duration: 0.6 ms
  Name: /var/run/zabbix - Function: file.directory - Result: Clean Started: - 22:00:48.470001 Duration: 0.519 ms
----------
          ID: /etc/zabbix/zabbix_agentd.conf
    Function: file.managed
      Result: True
     Comment: File /etc/zabbix/zabbix_agentd.conf updated
     Started: 22:00:48.470621
    Duration: 99.676 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -1,3 +1,4 @@
                  +# Managed by saltstack. Do not edit this file.
                  ... snip ...
                   # TLSPSKFile=
                  +
----------
          ID: zabbix-agent
    Function: service.running
      Result: True
     Comment: Service zabbix-agent has been enabled, and is in the desired state
     Started: 22:00:48.570671
    Duration: 173.452 ms
     Changes:
              ----------
              zabbix-agent:
                  True

zabbix-agent log file

 12498:20180502:220047.523 agent #2 started [listener #1]
 12501:20180502:220047.525 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)

It's still using 127.0.0.1 as server, restart was not performed.

I will try to replicate this issue with simple test state later. May be there is some race condition.

zerthimon commented 6 years ago

Something weird is going on with this state. From what I observed, when the service is dead and needs to be started AND enabled it's only gets started. The next run enables the service.

Ch3LL commented 6 years ago

are any of you seeing this with any other services or only zabbix? Also are you certain this was working on previous salt versions?

zerthimon commented 6 years ago

I'm seeing this with docker.service, not sure if it's salt, docker or systemd problem :)

Ch3LL commented 6 years ago

and do you know if it was working on previous versions?

bandikoot commented 5 years ago

Hello, @Ch3LL!

I'd try to bring this up again. The issue is reproducable and can be simplified to the description below.

Situation 1

Assume, we have an .sls with:

config file's state:

/etc/unbound/unbound.conf:
file.managed:
- require:
  - pkg: unbound
- user: unbound
- contents: |
    include: "/etc/unbound/unbound.conf.d/*.conf"
    #test file changes

service state with a watch requisite:

unbound.service:
service.running:
- enable: True
- require:
  - cmd: unbound-systemctl-daemon-reload
  - pkg: unbound
- watch:
  - file: /etc/systemd/system/unbound.service
  - file: /etc/unbound/unbound.conf
  - file: /etc/unbound/unbound.conf.d

config file is going to have changes
systemd-service being currently disabled, but running

In this case on applying state we get:

----------
          ID: /etc/unbound/unbound.conf
    Function: file.managed
      Result: True
     Comment: File /etc/unbound/unbound.conf updated
     Started: 20:18:22.972941
    Duration: 14.977 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -1 +1,2 @@
                   include: "/etc/unbound/unbound.conf.d/*.conf"
                  +#test file changes
...
----------
          ID: unbound.service
    Function: service.running
        Name: unbound
      Result: True
     Comment: Service unbound has been enabled, and is in the desired state
     Started: 20:18:23.218767
    Duration: 531.541 ms
     Changes:   
              ----------
              unbound:
                  True

--------------------------------

$ systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/unbound.service.d
           └─50-insserv.conf-$named.conf, 50-unbound-$named.conf
   Active: active (running) since Thu 2019-08-08 16:56:43 MSK; 3h 21min ago
 Main PID: 1133 (unbound)
   CGroup: /system.slice/unbound.service
           └─1133 /usr/sbin/unbound -c /etc/unbound/unbound.conf

and the service hasn't been restarted (take a look on active since) despite the watch for the config.

Situation 2

When the service is already enabled, but we still have changes in the watched config file, applying the state results in the service's restart correctly:

----------
          ID: /etc/unbound/unbound.conf
    Function: file.managed
      Result: True
     Comment: File /etc/unbound/unbound.conf updated
     Started: 20:29:21.331920
    Duration: 6.818 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -1,2 +1 @@
                   include: "/etc/unbound/unbound.conf.d/*.conf"
                  -#test file changes
...
----------
          ID: unbound.service
    Function: service.running
        Name: unbound
      Result: True
     Comment: Service restarted
     Started: 20:29:21.638848
    Duration: 377.89 ms
     Changes:   
              ----------
              unbound:
                  True

--------------------------------

$ systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/unbound.service.d
           └─50-insserv.conf-$named.conf, 50-unbound-$named.conf
   Active: active (running) since Thu 2019-08-08 20:31:09 MSK; 37s ago
  Process: 13367 ExecStart=/usr/sbin/unbound -c /etc/unbound/unbound.conf (code=exited, status=0/SUCCESS)
  Process: 13360 ExecStartPre=/usr/sbin/unbound-checkconf (code=exited, status=0/SUCCESS)
  Process: 13356 ExecStartPre=/usr/sbin/unbound-anchor -a /var/lib/unbound/root.key -v (code=exited, status=0/SUCCESS)
 Main PID: 13369 (unbound)
    Tasks: 1
   Memory: 5.6M
      CPU: 36ms
   CGroup: /system.slice/unbound.service
           └─13369 /usr/sbin/unbound -c /etc/unbound/unbound.conf

I believe, service state should trigger systemd-service's enabling and restarting (or starting if it was inactive) in Situation 1. It also seems to me, that the issue is not somehow service-related. We're currently using 2018.3.2 version.

Could you take another look on it, please?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

Ch3LL commented 4 years ago

ping @bandikoot thanks for the example and sorry for the late reply. thanks to stalebot I saw this issue today.

I was finally able to replicate this and I'm just gonna paste my exact steps so i don't have an issue replicating this in the future:

Using this vagrant file:

Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
end

install salt
install unbound - apt-get install unbound
disable the service - systemctl disable unbound
run this state:

/etc/unbound/unbound.conf:
  file.managed:
    - user: unbound
    - contents: |
        include: "/etc/unbound/unbound.conf.d/*.conf"
        #test file changes

unbound.service:
  service.running:
    - enable: True
    - watch:
      - file: /etc/unbound/unbound.conf

salt-call --local state.sls test -ldebug

and now i see the issue. will label as a bug for us to get fixed thanks

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.

nicholasmhughes commented 3 years ago

Confirmed on:

Versions Report

``` Salt Version: Salt: 3003.3 Dependency Versions: cffi: Not Installed cherrypy: unknown dateutil: 2.7.3 docker-py: Not Installed gitdb: 2.0.5 gitpython: 2.1.11 Jinja2: 2.10 libgit2: Not Installed M2Crypto: Not Installed Mako: Not Installed msgpack: 0.5.6 msgpack-pure: Not Installed mysql-python: Not Installed pycparser: Not Installed pycrypto: Not Installed pycryptodome: 3.6.1 pygit2: Not Installed Python: 3.7.3 (default, Jan 22 2021, 20:04:44) python-gnupg: Not Installed PyYAML: 3.13 PyZMQ: 17.1.2 smmap: 2.0.5 timelib: Not Installed Tornado: 4.5.3 ZMQ: 4.3.1 System Versions: dist: debian 10 buster locale: UTF-8 machine: x86_64 release: 4.19.0-17-amd64 system: Linux version: Debian GNU/Linux 10 buster ```

Given the text in the watch requisite documentation, it seems this is the intended (or at least expected) behavior.

If the watching state changes key contains values, then mod_watch will not be called. If you're using watch or watch_in then it's a good idea to have a state that only enforces one attribute - such as splitting out service.running into its own state and have service.enabled in another.

I haven't dug into the code for this too deep yet, but does anyone know if this is something that's "fixable" in mod_watch for the service state without fundamentally changing global watch behavior?

saltstack / salt