Commenting on my phone and can test more later... but I would like to clarify: the problem is that the service is not reloaded after new files are added - where there is a watch between the new file and the already running service.
I don't think the last Apache test mentioned above demonstrates the problem I'm seeing, because Apache will always serve the contents of an HTML file. A service restart is not needed for that.
@cmclaughlin Yes, but out of a random directory?
where there is a watch between the new file and the already running service.
That's the crux of the matter. But the service isn't already running in your test. The program gets installed (1), the file gets added (2), and the service gets started (3).
There's nothing for salt to restart because the file already existed, in the correct state, before the service was started. Restarting the service won't actually do anything, because there's nothing to do.
If you want to see what I mean - comment this block out:
/tmp/testwatch:
  file.managed:
    - contents: hello
    - watch_in:
      - service: apache-service
Then run state.apply/highstate. Then add it back in and run your state (ensuring, of course, that /tmp/testwatch doesn't already exist). You should see your service restart - that's what I've seen in every test I've tried.
I'll take another look today... sorry, I didn't take a close look at the CentOS example over the weekend / was reading on my phone. I see your point about serving out of a random directory... that makes sense now.
Installing Vagrant now... but I think this is where our expectations are different:
That's the crux of the matter. But the service isn't already running in your test. The program gets installed (1), the file gets added (2), and the service gets started (3).
Based on my example above, here's what I expect:
1) Package gets installed
2) Service gets started
3) File with watch gets added
4) Service restarts because of the watch on the service
Thinking it through a bit more, it makes sense that I don't see the problem on Debian-based systems, since apt packages typically start the service automatically.
I suppose I'm proposing Salt's requisite system should handle this special case.
@cmclaughlin and that's what I've failed to communicate well :trollface:
Salt doesn't need to handle the special case because nothing wrong is happening. It's just weird, because it's not what we're used to seeing.
We're used to seeing your expectations - package installed, service gets started, file gets added, and service gets restarted. Why is the service restarted? Because it needs to load the changed file. And for Debian systems, apparently that's the norm - installing the package starts the service.
What we're seeing here is actually OK - because the package is installed, the files get updated, and then the service gets started. It would be silly to start the service just to restart it after adding files, or to add a file, start a service, and then restart the service just because we've already added the file.
I think we find it so weird and unexpected because when you or I manually install a server, we're going to fire it up first to make sure it's working, then we'll futz with the config files, and then restart the server. When we do our setup in Salt... we've pre-futzed our files :trollface:
Glad we're on the same page now...
I'd like to come up with a way for Salt to handle this situation... at the very least, we should have some documentation noting that this is a common problem which can lead to very undesirable consequences.
I could easily follow the onchanges example/work-around posted above... but I'm thinking service.running should start the service ASAP (by injecting another entry into the low state?) in this situation. Perhaps service.running should have an additional option like early_start?
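For reference, here's a minimal sketch of the onchanges-style work-around being referred to. The state ID is a hypothetical illustration (not taken from this thread), and it reuses the /tmp/testwatch file from the earlier example:

restart-apache-on-file-change:  # hypothetical state ID
  cmd.run:
    - name: service httpd restart
    - onchanges:
      - file: /tmp/testwatch

Unlike a watch on service.running, this runs the restart command directly whenever the file state reports changes, so it doesn't depend on the service state's own change detection.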
I started poking around at the state and service sources... but I don't have a very good grasp on the code flow here. If anyone has ideas how to improve this I'd love to know.
I'm curious - what undesirable consequences are you imagining would happen here?
Thanks for your patience... I'm all over the place on this one. Maybe my internal code doesn't match up with the tests we've come up with here. I'll take a closer look at my internal Salt code and see if I can track down the problem.
Thanks again for your patience. I took a look at my internal salt repo and I think I can finally provide a decent example.
In at least one place where we have the problem, we started the service manually before service.running was called, but didn't enable the service.
I've concluded:
Example:
apache-pkg:
  pkg.installed:
    - name: httpd24

# imagine module installation here
restart-apache-after-module-installation:
  cmd.run:
    - name: service httpd restart

apache-service:
  service.running:
    - name: httpd
    - enable: True

/etc/httpd/foo.crt:
  file.managed:
    - contents: hi
    - watch_in:
      - service: apache-service
State output:
----------
ID: apache-pkg
Function: pkg.installed
Name: httpd24
Result: True
Comment: The following packages were installed/updated: httpd24
Started: 06:23:03.421665
Duration: 8289.552 ms
Changes:
----------
httpd24:
----------
new:
2.4.37-1.83.amzn1
old:
----------
ID: restart-apache-after-module-installation
Function: cmd.run
Name: service httpd restart
Result: True
Comment: Command "service httpd restart" run
Started: 06:23:11.714577
Duration: 134.233 ms
Changes:
----------
pid:
29860
retcode:
0
stderr:
stdout:
Stopping httpd: [FAILED]
Starting httpd: [ OK ]
----------
ID: /etc/httpd/foo.crt
Function: file.managed
Result: True
Comment: File /etc/httpd/foo.crt updated
Started: 06:23:11.864664
Duration: 13.596 ms
Changes:
----------
diff:
New file
----------
ID: apache-service
Function: service.running
Name: httpd
Result: True
Comment: Service httpd has been enabled, and is in the desired state
Started: 06:23:11.878634
Duration: 259.397 ms
Changes:
----------
httpd:
True
Log output:
2019-01-16 06:25:57,323 [salt.state ][INFO ][30096] Made the following changes:
'httpd24' changed from 'absent' to '2.4.37-1.83.amzn1'
2019-01-16 06:25:57,350 [salt.state ][INFO ][30096] Loading fresh modules for state activity
2019-01-16 06:25:57,402 [salt.state ][INFO ][30096] Completed state [httpd24] at time 06:25:57.402692 duration_in_ms=8236.931
2019-01-16 06:25:57,406 [salt.state ][INFO ][30096] Running state [service httpd restart] at time 06:25:57.406222
2019-01-16 06:25:57,406 [salt.state ][INFO ][30096] Executing state cmd.run for [service httpd restart]
2019-01-16 06:25:57,410 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command 'service httpd restart' in directory '/root'
2019-01-16 06:25:57,530 [salt.state ][INFO ][30096] {'pid': 30143, 'retcode': 0, 'stderr': '', 'stdout': 'Stopping httpd: [FAILED]\r\nStarting httpd: [ OK ]'}
2019-01-16 06:25:57,531 [salt.state ][INFO ][30096] Completed state [service httpd restart] at time 06:25:57.531133 duration_in_ms=124.91
2019-01-16 06:25:57,553 [salt.state ][INFO ][30096] Running state [/etc/httpd/foo.crt] at time 06:25:57.553259
2019-01-16 06:25:57,553 [salt.state ][INFO ][30096] Executing state file.managed for [/etc/httpd/foo.crt]
2019-01-16 06:25:57,569 [salt.state ][INFO ][30096] File changed:
New file
2019-01-16 06:25:57,569 [salt.state ][INFO ][30096] Completed state [/etc/httpd/foo.crt] at time 06:25:57.569583 duration_in_ms=16.325
2019-01-16 06:25:57,570 [salt.state ][INFO ][30096] Running state [httpd] at time 06:25:57.569983
2019-01-16 06:25:57,570 [salt.state ][INFO ][30096] Executing state service.running for [httpd]
2019-01-16 06:25:57,571 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/service httpd status' in directory '/root'
2019-01-16 06:25:57,675 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/chkconfig --list httpd' in directory '/root'
2019-01-16 06:25:57,683 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/runlevel' in directory '/root'
2019-01-16 06:25:57,704 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/runlevel' in directory '/root'
2019-01-16 06:25:57,721 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/chkconfig --list httpd' in directory '/root'
2019-01-16 06:25:57,742 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/runlevel' in directory '/root'
2019-01-16 06:25:57,756 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/runlevel' in directory '/root'
2019-01-16 06:25:57,778 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/chkconfig --list httpd' in directory '/root'
2019-01-16 06:25:57,786 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/chkconfig httpd on' in directory '/root'
2019-01-16 06:25:57,809 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/chkconfig --list httpd' in directory '/root'
2019-01-16 06:25:57,815 [salt.loaded.int.module.cmdmod ][INFO ][30096] Executing command '/sbin/runlevel' in directory '/root'
2019-01-16 06:25:57,821 [salt.state ][INFO ][30096] {'httpd': True}
2019-01-16 06:25:57,821 [salt.state ][INFO ][30096] Completed state [httpd] at time 06:25:57.821821 duration_in_ms=251.837
After changing cmd.run to service httpd restart && /sbin/chkconfig httpd on... the service is running and enabled when the watch runs... and the watch/restart works:
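In SLS form, that change is just the command in the cmd.run state (reconstructed from the Name shown in the output below; nothing else was modified):

restart-apache-after-module-installation:
  cmd.run:
    - name: service httpd restart && /sbin/chkconfig httpd on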
State output:
ID: apache-pkg
Function: pkg.installed
Name: httpd24
Result: True
Comment: The following packages were installed/updated: httpd24
Started: 06:27:36.354670
Duration: 8040.531 ms
Changes:
----------
httpd24:
----------
new:
2.4.37-1.83.amzn1
old:
----------
ID: restart-apache-after-module-installation
Function: cmd.run
Name: service httpd restart && /sbin/chkconfig httpd on
Result: True
Comment: Command "service httpd restart && /sbin/chkconfig httpd on" run
Started: 06:27:44.406192
Duration: 151.919 ms
Changes:
----------
pid:
30390
retcode:
0
stderr:
stdout:
Stopping httpd: [FAILED]
Starting httpd: [ OK ]
----------
ID: /etc/httpd/foo.crt
Function: file.managed
Result: True
Comment: File /etc/httpd/foo.crt updated
Started: 06:27:44.569892
Duration: 16.433 ms
Changes:
----------
diff:
New file
----------
ID: apache-service
Function: service.running
Name: httpd
Result: True
Comment: Service restarted
Started: 06:27:44.726029
Duration: 290.689 ms
Changes:
----------
httpd:
True
Summary for dev-charlestest-1b053.shopstyleops.com
------------
Succeeded: 4 (changed=4)
Failed: 0
------------
Total states run: 4
Total run time: 8.500 s
Log output:
2019-01-16 06:27:44,308 [salt.state ][INFO ][30342] Made the following changes:
'httpd24' changed from 'absent' to '2.4.37-1.83.amzn1'
2019-01-16 06:27:44,334 [salt.state ][INFO ][30342] Loading fresh modules for state activity
2019-01-16 06:27:44,395 [salt.state ][INFO ][30342] Completed state [httpd24] at time 06:27:44.395201 duration_in_ms=8040.531
2019-01-16 06:27:44,406 [salt.state ][INFO ][30342] Running state [service httpd restart && /sbin/chkconfig httpd on] at time 06:27:44.406191
2019-01-16 06:27:44,406 [salt.state ][INFO ][30342] Executing state cmd.run for [service httpd restart && /sbin/chkconfig httpd on]
2019-01-16 06:27:44,410 [salt.loaded.int.module.cmdmod ][INFO ][30342] Executing command 'service httpd restart && /sbin/chkconfig httpd on' in directory '/root'
2019-01-16 06:27:44,557 [salt.state ][INFO ][30342] {'pid': 30390, 'retcode': 0, 'stderr': '', 'stdout': 'Stopping httpd: [FAILED]\r\nStarting httpd: [ OK ]'}
2019-01-16 06:27:44,558 [salt.state ][INFO ][30342] Completed state [service httpd restart && /sbin/chkconfig httpd on] at time 06:27:44.558111 duration_in_ms=151.919
2019-01-16 06:27:44,569 [salt.state ][INFO ][30342] Running state [/etc/httpd/foo.crt] at time 06:27:44.569892
2019-01-16 06:27:44,571 [salt.state ][INFO ][30342] Executing state file.managed for [/etc/httpd/foo.crt]
2019-01-16 06:27:44,585 [salt.state ][INFO ][30342] File changed:
New file
2019-01-16 06:27:44,586 [salt.state ][INFO ][30342] Completed state [/etc/httpd/foo.crt] at time 06:27:44.586325 duration_in_ms=16.433
2019-01-16 06:27:44,586 [salt.state ][INFO ][30342] Running state [httpd] at time 06:27:44.586681
2019-01-16 06:27:44,587 [salt.state ][INFO ][30342] Executing state service.running for [httpd]
2019-01-16 06:27:44,587 [salt.loaded.int.module.cmdmod ][INFO ][30342] Executing command '/sbin/service httpd status' in directory '/root'
2019-01-16 06:27:44,705 [salt.loaded.int.module.cmdmod ][INFO ][30342] Executing command '/sbin/chkconfig --list httpd' in directory '/root'
2019-01-16 06:27:44,713 [salt.loaded.int.module.cmdmod ][INFO ][30342] Executing command '/sbin/runlevel' in directory '/root'
2019-01-16 06:27:44,725 [salt.state ][INFO ][30342] The service httpd is already running
2019-01-16 06:27:44,725 [salt.state ][INFO ][30342] Completed state [httpd] at time 06:27:44.725779 duration_in_ms=139.097
2019-01-16 06:27:44,726 [salt.state ][INFO ][30342] Running state [httpd] at time 06:27:44.726029
2019-01-16 06:27:44,726 [salt.state ][INFO ][30342] Executing state service.mod_watch for [httpd]
2019-01-16 06:27:44,727 [salt.loaded.int.module.cmdmod ][INFO ][30342] Executing command '/sbin/service httpd status' in directory '/root'
2019-01-16 06:27:44,816 [salt.loaded.int.module.cmdmod ][INFO ][30342] Executing command '/sbin/service httpd restart' in directory '/root'
2019-01-16 06:27:45,016 [salt.state ][INFO ][30342] {'httpd': True}
2019-01-16 06:27:45,016 [salt.state ][INFO ][30342] Completed state [httpd] at time 06:27:45.016718 duration_in_ms=290.689
2019-01-16 06:27:45,018 [salt.minion
@waynew I finally had time to come up with a streamlined setup of what we are doing here when the problem occurs. I'm attaching a tarball that makes it 100% reproducible with the help of Terraform and AWS (only free-tier resources are used). Running the exhibit.sh script will do the trick, and it should be fairly easy to follow.
In short, it spins up an Ubuntu server, installs the latest Salt in masterless mode, copies the state files and pillar, and applies the highstate. Then it changes a pillar value and re-applies the highstate, at which point the bug is triggered and the service is not restarted as expected.
In this case, it boils down to a bad interaction between onchanges and watch. Here is the sls for reference:
/tmp/test/index.html:
  file.managed:
    - makedirs: True
    - contents:
      - Hello World
    {# this will break the watch on `test_running` #}
    - onchanges_in:
      - test_running

test_systemd_unit:
  file.managed:
    - name: /etc/systemd/system/test.service
    - source: salt://test.service.jinja
    - template: jinja
    - mode: 600
  module.run:
    - name: service.systemctl_reload
    - onchanges:
      - file: test_systemd_unit

test_running:
  service.running:
    - name: test
    - enable: True
    - watch:
      - module: test_systemd_unit
After changing the pillar, the salt-call output is:
local:
----------
ID: /tmp/test/index.html
Function: file.managed
Result: True
Comment: File /tmp/test/index.html is in the correct state
Started: 15:48:28.159757
Duration: 26.06 ms
Changes:
----------
ID: test_systemd_unit
Function: file.managed
Name: /etc/systemd/system/test.service
Result: True
Comment: File /etc/systemd/system/test.service updated
Started: 15:48:28.186076
Duration: 24.858 ms
Changes:
----------
diff:
---
+++
@@ -9,7 +9,7 @@
Type=simple
Restart=on-failure
WorkingDirectory=/tmp/test
-ExecStart=/usr/bin/python3 -m http.server 8080
+ExecStart=/usr/bin/python3 -m http.server 8081
[Install]
WantedBy=multi-user.target
----------
ID: test_systemd_unit
Function: module.run
Name: service.systemctl_reload
Result: True
Comment: Module function service.systemctl_reload executed
Started: 15:48:28.212099
Duration: 474.369 ms
Changes:
----------
ret:
True
----------
ID: test_running
Function: service.running
Name: test
Result: True
Comment: State was not run because none of the onchanges reqs changed
Started: 15:48:28.687950
Duration: 0.015 ms
Changes:
Summary for local
------------
Succeeded: 4 (changed=2)
Failed: 0
------------
Total states run: 4
Total run time: 525.302 ms
As you can see, test_systemd_unit is triggered and produces changes, which are picked up by the onchanges on module.run, which does reload the systemd unit file. But then the test_running state is not triggered, claiming that "none of the onchanges reqs changed". This is technically true, as its onchanges points to the file.managed of /tmp/test/index.html, which didn't change - but it happily ignores the watch, which does trigger it correctly if you remove the onchanges_in of the index.html.
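A minimal sketch of that work-around - the same index.html state with the onchanges_in block simply removed, leaving the watch on test_systemd_unit to drive test_running:

/tmp/test/index.html:
  file.managed:
    - makedirs: True
    - contents:
      - Hello World
    {# onchanges_in removed so the watch on `test_running` fires #}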
Cheers,
Olivier
@cmclaughlin Okay, so I discovered that you are definitely finding some unexpected behavior! It took a while for me to nail down exactly what's happening - I'm not sure if it's just a bug in our documentation, or if it's an actual bug.
What's happening here is kind of crazy.
If you remove the enable: True, or split it into:

enable-apache:
  service.enabled:
    - name: httpd24
Everything works!
The question you might have then is, "But why??"
Which is a darn good question.
Turns out that the answer is pretty simple though - the only thing that's going wrong here is that by starting (but not enabling) a service, when Salt looks at the service it says, "Ah, yeah, running is good. Oh, but let me enable this service. Cool, I've made some changes to it. What, you have a watch? No, I don't need to do anything, I've already changed this service!"
Oops :trollface:
I definitely agree that it seems strange, or perhaps even unintuitive, but it is at least consistent, and now I understand exactly why.
Just had a quick conversation, and this is definitely intended behavior - I'm going to see about adding some information to our docs that calls it out, and perhaps gives some better explanation.
@waynew can you give an overview why it's intended behavior here?
@cmclaughlin Sure!
Watch looks for changes in one state, and if the watching state hasn't changed, triggers its watching behavior.
If A is the watched state, and B is the watcher:
Does A have changes? Does B need changes? Tell B to change. Does A have changes? Has B already changed? Cool cool, nothing to do here.
That's it. That's what watching does.
In the confusing case we have the following:
Does the conf file have changes? Yeah, cool, does apache have changes? Oh, we enabled the service? Yeah, cool, nothing more to do here.
It's following the pattern, because enabling a service is a change. And this change happened after the config file was changed, so as far as watch is concerned, everything is A-OK and in the correct state.
It's confusing because we all expected service.running to be what was watched, but it's actually the state apache-running, and since the state had changes... It's a bit like a magician using misdirection, but instead of surprise and delight, we were all just annoyed.
The takeaway here is that the watcher is only required to change if it has not already.
apache-running had changes, and from the watch perspective, there's no way to distinguish whether that change was service started or service enabled. I don't even think there should be. I was thinking about the idea of creating an RFC to allow states to produce an "ignore this change for watches", but that would create terribly inconsistent behavior. Consider these examples:
apache enabled:
  service.enabled:
    - name: httpd24

apache running:
  service.running:
    - name: httpd24
And
apache running and enabled:
  service.running:
    - name: httpd24
    - enable: True
If we want this to match our misunderstanding then service.running should be triggered if we just enable the service, right? Okay, say we do that.
What if someone wants to add watch_in to their apache enabled state? Are we going to say that because it's standalone it gets executed, but when it's part of another state it doesn't? Or are we just blacklisting service enabling from watches/requisites?
There's really no consistent way we could change the behavior.
I do think that it's worth adding this kind of example to our docs, to explicitly call out the fact that watching is only triggered when a state has no changes, and that those changes may not match expectations.
Does that all make sense, or is there still some confusion?
Thanks, but it still doesn't really make sense to me.
I can see how there's a difference in behavior between watches on service.running and service.enabled... but if enabled is going to be a supported option on service.running, I would expect it to behave exactly like a separate call to service.enabled.
The part about "enabling a service is a change" so the restart doesn't happen is totally counterintuitive to me. I would expect that when file.managed is run, that's when the service status would be checked... and Salt would see the service is running, therefore a restart is required. From my end-user perspective, I don't see any reason why Salt can't enable the service and restart it. Why can only one change happen?
Thanks, but it still doesn't really make sense to me.
I think you actually just explained where our disconnect is. Let's see if it helps:
The part about "enabling a service is a change" so the restart doesn't happen is totally counterintuitive to me. I would expect that when file.managed is run, that's when the service status would be checked...
That's a totally natural assumption... but how would you architect that? Would you couple file.managed to service.running? Where would you stop that coupling?
and Salt would see the service is running, therefore a restart is required. From my end-user perspective, I don't see any reason why Salt can't enable the service and restart it. Why can only one change happen?
Sure! I'd love to make that possible... but how would I implement that? I started to think about how to make that work but then I ran into this problem. How would you ensure that:
apache enabled:
  service.enabled:
    - name: httpd24

apache running:
  service.running:
    - name: httpd24
and
apache running and enabled:
  service.running:
    - name: httpd24
    - enable: True
behave identically when they're watching another state?
I couldn't come up with any good plans - either I ended up with tight coupling, or adding a bunch of extra logic for each individual special case. Or they started making Salt behave inconsistently, which I think is probably the worst thing of all.
Do you have any ideas how we could make that happen?
os: opensuse
service_provider: rh_service

salt-call --versions-report:
           Salt: 2014.1.5
         Python: 2.7.2 (default, Aug 19 2011, 20:41:43) [GCC]
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.2.4
   msgpack-pure: Not Installed
       pycrypto: 2.3
         PyYAML: 3.10
          PyZMQ: 13.0.0
            ZMQ: 3.2.2
scenario:
If we edit pg_hba so file.replace runs again, then the service restarts OK - but not on the first run. I think it's somehow related to postgres_first_run starting the service, but I can't find anything relevant in the logs.
example states from debug output: