Closed kovukono closed 4 years ago
Can you define the use case a bit more?
The logic doesn't seem to follow the purpose
The action :install
will start the agent after creating the agent.yml
file and then the notifies
will have the service restart at the end of the run.
That was one thing I was hesitant about. It actually isn't waiting until the end of the run, just the end of the resource (after the enable and start). I didn't want to restart the service every run, just if the file had changed, but I did want to leave the original logic that the service was always ensured to be started. I couldn't find a cleaner way of it other than this. Any ideas if it could be made smoother?
If your use case is updating or modifying the agent.yml
a new action such as :configure
may be the best way to approach this.
From there you can use a file
or template
resource to handle the config file and have logic to restart the service if a change is detected.
So you're suggesting to move the template rendering out of the install action, and use a joint series of actions such as [:install, :configure]
for managing agents? Install would probably still handle the enabling and start of the service, correct?
The :install
action will require a agent.yml
file on disk to complete successfully, but it could just as easily call the :configure
action to complete that if that is how you would like to approach this feature request.
I really like this approach. I'll take a look at it tomorrow.
I tried to determine the action to take based on whether or not the configuration was updated, but both the resource's updated?
and updated_by_last_action?
aren't accurately reflecting whether or not the configuration file was updated, even if Chef marks it as such. Without being able to determine that, I'm not sure how to check whether or not the service needs to be restarted aside from the notifies
.
What about using a feature flag to control if we notify or not?
With the PR as it currently is, it's only restarting if the agent.yml changes, which is what we want from the notify. Or is there something different you meant?
Sorry for the break. I'm still unclear as to what a feature flag would accomplish, as we're trying to restart the service just if the agent.yml changes.
Feature flags would employ using tags to control the behavior of the chef run to skip or execute code.
agent_restart = tagged?('restart_agent')
file ‘/tmp/foo.txt’ do
content ‘foo’
notifies :restart, ‘service[bar]’
action :create if agent_restart
end
service ‘bar’ do
action :nothing
end
Alternatively you can use this example in NTP cookbook to restart a service after a file change: https://github.com/chef-cookbooks/ntp/blob/master/recipes/default.rb#L104#L149
I'm a bit confused. Why would we add this extra logic to say whether or not we would restart the agent's service? At the moment, it never restarts it, which leads to agents never picking up modified configurations. Is there a use case that I'm missing that we would want an agent not to have the latest configuration we pushed?
I spoke with @derekgroh on this and this what I recommend:
:delayed
action[1]I originally suggested a feature flag because it seemed like there were some scenarios where that might not be desired. I think its important to look at the context context. As a resource cookbook we have to generally provide a lot of flexibility which includes enabling use cases where you want to provide guardrails. Given that this is specific to the agent file I think it would not be too bad to update that to an immediate if there was a reason to (order of operations, checking valid config, etc). If this was for other resource types such as checks
, filters
, handlers
, etc I would be more guarded because it can lead to "thundering restarts".
The simplest solution that I am :+1: on is:
file ::File.join(new_resource.config_home, 'agent.yml') do
content(JSON.parse(new_resource.config.to_json).to_yaml.to_s)
notifies :restart, 'service[sensu-agent]', :delayed
end
If we need more complexity and guard rails we really need to outline the use cases so we can provide meaningful guardrails.
I added a commit yesterday to choose when to restart it, and just now pushed another one to control whether or not it restarts at all.
haven't had a chance to dig deeper but I smell a possible issue with an unless ... || condition
It's better to use positive assertions or negations when dealing with multiple boolean conditions to avoid subtle logic flaws. See http://www.railstips.org/blog/archives/2008/12/01/unless-the-abused-ruby-conditional/ for more info on that.
I've tested it, and like your link says, it does indeed apply the negative to all conditions. Do you want me to rework it to be only if statements?
Do I need to rework the unless statements, or is there anything else I need to work on for the PR?
Hey, I will probably take a look at this over the weekend, I am generally not a fan of using unless
with multiple conditions. In this case I would actually break out the logic into a helper function that could be called. Also I see in the non immediate one we dont specify :delayed
I assume thats the default then but I think its better to be explicit.
@majormoses I was wondering if you had had a chance to look at this PR yet?
@kovukono Sorry for the delay, we discussed the PR in Slack (https://sensucommunity.slack.com/archives/C67D53SC8/p1589221631017000) and wondered if you could further detail
Otherwise, the common Chef pattern is to restart the service at the end of the chef run with the default behavior :delayed
.
@derekgroh That specific suggestion was made by @majormoses. The original PR was just to set it to run as delayed in 2adee42, but was changed to include the immediate in 8564608 after you and majormoses asked for additional control flags. If we want to go back to the first commit, where it only restarts delayed, I'm perfectly fine with that. I'd just really like to finally come to a decision on this.
@kovukono I apologize for the confusion and frustration here. Reading back I can see how I was not as clear I could have been and being unresponsive/busy did not help. The consensus from the conversation in slack is that we need a very simple way to restart on change in a delayed fashion. I am all for whatever gets us there without breaking people.
I took another peak at the code and there are some slight challenges here that I had not accounted for when discussing this in slack. As I was not looking at anything in the PR other than the comments and suggested changes without further inspecting the code around it.
This might be a bit oversimplified (especially on the windows side, I see some powershell magic I didn't look into)
On the next converge we will not get a restart on the agent, agreed this is not desirable.
Unfortunately it is (probably) not safe after inspecting the code to just restart as the agent config as this happens before a service is defined in chefs resource collection. Thus a change to the order of operations may help simplify things for us. I don't currently have much time to do much testing and debugging this further but I think this might help get us back on the right track, in case I rebase/amend that commit to my branch later here is a diff:
$ git diff origin/master
diff --git a/resources/agent.rb b/resources/agent.rb
index 972057d..74dbf82 100644
--- a/resources/agent.rb
+++ b/resources/agent.rb
@@ -54,9 +54,15 @@ action :install do
end
end
+ # create a service resource so it can be safely called to restart on the first convergene
+ service 'sensu-agent' do
+ action :nothing
+ end
+
# render template at /etc/sensu/agent.yml for linux
file ::File.join(new_resource.config_home, 'agent.yml') do
content(JSON.parse(new_resource.config.to_json).to_yaml.to_s)
+ notifies :restart, 'service[sensu-agent]', :delayed
end
# Enable and start the sensu-agent service
@@ -81,9 +87,15 @@ action :install do
# Adds install directory to path
windows_path node['sensu-go']['sensu_bindir']
+ # create a service resource so it can be safely called to restart on the first convergene
+ service 'SensuAgent' do
+ action :nothing
+ end
+
# render template at c:\Programdata\Sensu\config\agent.yml for windows
file ::File.join('c:/ProgramData/Sensu/config/', 'agent.yml') do
content(JSON.parse(new_resource.config.to_json).to_yaml.to_s)
+ notifies :restart, 'service[SensuAgent]', :delayed
end
# Installs SensuAgent Service
Any chance someone can test this out and report back if there are issues with this approach?
Working on setting up a test now.
Going to test the direct code then run it through my own wrapper changing the values of config
to ensure the agent is restarting after a change to agent.yml
file.
I was able to check the Slack discussion this morning. checked with your updates, and we get the following convergence with extra service declaration:
* file[/etc/sensu/agent.yml] action create
- create new file /etc/sensu/agent.yml
- update content in file /etc/sensu/agent.yml from none to e3463d
--- /etc/sensu/agent.yml 2020-05-12 09:49:55.593740491 -0400
+++ /etc/sensu/.chef-agent20200512-8767-12u5by4.yml 2020-05-12 09:49:55.593740491 -0400
@@ -1 +1,23 @@
{snip}
- restore selinux security context
* service[sensu-agent] action enable
- enable service service[sensu-agent]
* service[sensu-agent] action start
- start service service[sensu-agent]
* service[sensu-agent] action restart
- restart service service[sensu-agent]
Without the second service, we get this:
* file[/etc/sensu/agent.yml] action create
- create new file /etc/sensu/agent.yml
- update content in file /etc/sensu/agent.yml from none to e3463d
--- /etc/sensu/agent.yml 2020-05-12 10:19:25.794695981 -0400
+++ /etc/sensu/.chef-agent20200512-8770-1hi19ey.yml 2020-05-12 10:19:25.793695970 -0400
@@ -1 +1,23 @@
{snip}
- restore selinux security context
* service[sensu-agent] action enable
- enable service service[sensu-agent]
* service[sensu-agent] action start
- start service service[sensu-agent]
* service[sensu-agent] action restart
- restart service service[sensu-agent]
With or without the empty service declaration, we're getting the same functionality. I haven't been able to test for Windows, however. I'm not sure we should be double-declaring the service. The Chef documentation is also declaring the resource afterward as well (see "Notify in a specific order"). Are we sure we want to add an additional declaration?
We should be using a new action
, we're still trying to do this under install
. Correct me if I'm mistaken, but this behavior would be to reload the config after the install during the lifecycle of the agent.
If we don't need to use the empty service declaration that's fine with me, I am not in the weeds of chef these days like I used to. I know that something like this used to fail without doing that but its very likely they have made it easier / more forgiving. I will take a look later but if the only reason to have a configure
attribute is just to restart config then I'd say we add it to install, if there are other things we realistically want to do then I think breaking that into its own action might make more sense. If this was the backend I would 100% agree as you may want to control the restart logic outside of chef (to facilitate HA restarts/reloads in a cluster).
I took a look at moving it to a configure resource:
action :configure do
if node['platform'] != 'windows'
# render template at /etc/sensu/agent.yml for linux
file ::File.join(new_resource.config_home, 'agent.yml') do
content(JSON.parse(new_resource.config.to_json).to_yaml.to_s)
notifies :restart, 'service[sensu-agent]', :delayed
end
else
# render template at c:\Programdata\Sensu\config\agent.yml for windows
file ::File.join('c:/ProgramData/Sensu/config/', 'agent.yml') do
content(JSON.parse(new_resource.config.to_json).to_yaml.to_s)
notifies :restart, 'service[SensuAgent]', :delayed
end
end
end
However, because the service is not declared within this resource, the run fails. Adding an empty service is also failing, due to systemctl not being able to find the sensu-agent service--even though I'm calling the configure action after the package installation within the install resource, it appears to be trying to run the configure before it actually installs the yum package, or anything else within the install action.
* sensu_agent[default] action install
* service[sensu-agent] action nothing (skipped due to action :nothing)
* file[/etc/sensu/agent.yml] action create
- create new file /etc/sensu/agent.yml
- update content in file /etc/sensu/agent.yml from none to e3463d
--- /etc/sensu/agent.yml 2020-05-12 11:56:51.966813538 -0400
+++ /etc/sensu/.chef-agent20200512-8774-rao1h7.yml 2020-05-12 11:56:51.966813538 -0400
@@ -1 +1,23 @@
{snip}
* service[sensu-agent] action restart
================================================================================
Error executing action `restart` on resource 'service[sensu-agent]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '5'
---- Begin output of /bin/systemctl --system restart sensu-agent ----
STDOUT:
STDERR: Failed to restart sensu-agent.service: Unit not found.
We can't delay the configure because the service needs to enable and start, but it needs a valid agent.yml in order to be able to start successfully. It looks as if the only way a configure action would work would be to move all service management to that action. Even then, we'd end up with the same enable -> start -> restart path, given the issue mentioned here with checking with updated?
and updated_by_last_action?
I don't think resources within actions can notify other resources in a different action. This is what I'm currently testing, but your point is valid, should we add a guard type action to ensure the service is installed?
action :configure do
if node['platform'] != 'windows'
# render template at /etc/sensu/agent.yml for linux
file ::File.join(new_resource.config_home, 'agent.yml') do
content(JSON.parse(new_resource.config.to_json).to_yaml.to_s)
end
# Restart the sensu-agent service to reload config file
service 'sensu-agent' do
action :restart
end
end
...
end
Actions definitely cannot notify actions within other resources, which was why I had to include the empty service. For yours, we wouldn't want to set the sensu-agent service to restart automatically. We instead need the agent.yml to notify the service to restart, which would have action :nothing
, or else this would restart the service every Chef run.
We would definitely need a guard if we included the service there, but I really think that we'd be fine with this still within the install action, with a notification for the restart on the agent.yml. Creating the configure action means:
We'd be doing all that extra engineering for the same result of adding an extra line to the install action.
That's true, we're still looking for behavior on only restarting if the content is changed or anytime the agent.yml is touched regardless of content change?
We'd just need it restarted if content changed. I don't think we'd need it restarted if, for example, we changed owner or readability, but I don't think that's an edge case we can address with a notifies. But yes, if the content is changed at all, we would want it restarted.
Ok, going back to https://github.com/sensu/sensu-go-chef/pull/87#issuecomment-627391494 (no :configure
action), if you remove the :start
from the last service 'sensu-agent'
does that give you the behavior you seek?
It should, but it would also not ensure that the agent was running if, for some reason, it was stopped and there was no change made to the agent.yml. This wouldn't be a problem on the initial run, but it could potentially cause problems afterward if manual actions were taken outside of Chef.
Pull Request Checklist
No, this does not address an open issue.
General
[X] Update Changelog following the conventions laid out here
[X] Update README with any necessary configuration snippets
[X] Cookstyle (rubocop) passes
[X] Foodcritic passes
[x] Rspec (unit tests) passes
[X] Inspec (integration tests) passes
New Features
[ ] Added a Testing Artifact as either an automated test or a manual artifact on the PR.
[ ] Added documentation for it to the
README.md
Purpose
Sensu-Agent does not restart after a configuration change, meaning those changes don't get applied. This will restart Sensu-Agent after a change to the configuration file.
Known Compatibility Issues