Closed JonoRicci closed 4 years ago
I can run the chef-splunk kitchen test with dokken successfully without reproducing the error.
If I swap the dokken driver for the ec2 driver and add a very simple Inspec test, I can reproduce our error in the chef-splunk cookbook. (I also ran the Inspec test with the dokken driver, which produced the same outcome.)
The Inspec test:
describe service('splunk') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end
The result:
Service splunk
✔ is expected to be installed
✔ is expected to be enabled
× is expected to be running
expected that `Service splunk` is running
Investigating on the instance:
@JonoRicci can you show what your systemd unit file looks like? If you're calling the client recipe directly, you may not be setting up the splunk auth attributes. There is logic in the default recipe that reads the splunk admin user/pass from a data bag or from chef-vault.
You need to have this in an encrypted data bag or chef-vault item:
vault_item = chef_vault_item(node['splunk']['data_bag'], "splunk_#{node.chef_environment}")
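As a small illustration of the lookup above, the item name is built by interpolating the node's environment into a string. The sketch below only reproduces that interpolation in plain Ruby; `splunk_vault_item_name` is a hypothetical helper, not part of the cookbook, and it assumes the node is in Chef's default environment (`_default`) when no environment has been set.

```ruby
# Hypothetical helper mirroring the string interpolation in the lookup above.
# It does not talk to chef-vault; it only shows which item name gets requested.
def splunk_vault_item_name(chef_environment)
  "splunk_#{chef_environment}"
end

# Chef's default environment is named "_default", so the resulting item name
# contains a double underscore.
puts splunk_vault_item_name('_default')   # prints "splunk__default"
puts splunk_vault_item_name('production') # prints "splunk_production"
```

If the item in your data bag is named differently (for example with a single underscore), the lookup would presumably not find it.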
Hi @haidangwa, I've created PR #186, which adds some inspec tests to chef-splunk for the client suite. They show the issue we are seeing without our wrapper cookbook.
The output of verify on ubuntu-2004 is as follows:
System Package splunkforwarder
✔ should be installed
Service splunk
✔ should be installed
✔ should be enabled
× should be running
expected that `Service splunk` is running
Port 8089
✔ should be listening
✔ protocols should include "tcp"
Processes splunkd
✔ should exist
Test Summary: 6 successful, 1 failure, 0 skipped
To me the cause seems to be the way splunk is started to accept the license. If I log in to the docker container, stop splunk with `/opt/splunkforwarder/bin/splunk stop`, and then run `service splunk start`, all the tests pass:
System Package splunkforwarder
✔ should be installed
Service splunk
✔ should be installed
✔ should be enabled
✔ should be running
Port 8089
✔ should be listening
✔ protocols should include "tcp"
Processes splunkd
✔ should exist
Test Summary: 7 successful, 0 failures, 0 skipped
Edit: Made it clearer we see these issues directly with chef-splunk and added summary to the verify output.
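The manual sequence described above (stop splunk through its own CLI, then start it through the service manager) could be scripted roughly as follows. This is only a sketch: `handover_commands` and `run_all` are hypothetical helper names, and the only things taken from this comment are the two commands themselves.

```ruby
# Hypothetical helpers; only the two commands come from the comment above.
SPLUNK_BIN = '/opt/splunkforwarder/bin/splunk'.freeze

# Commands to run, in order, as argument arrays.
def handover_commands(splunk_bin = SPLUNK_BIN, service_name = 'splunk')
  [
    [splunk_bin, 'stop'],               # stop the running splunkd via its own CLI
    ['service', service_name, 'start']  # start it again through the service manager
  ]
end

# Run each command in order and raise on the first failure.
# `runner` defaults to Kernel#system and can be swapped for a dry run.
def run_all(commands, runner = ->(cmd) { system(*cmd) })
  commands.each do |cmd|
    raise "command failed: #{cmd.join(' ')}" unless runner.call(cmd)
  end
end

# Dry run: print the commands instead of executing them.
run_all(handover_commands, ->(cmd) { puts cmd.join(' '); true })
```

Passing the runner in as a lambda keeps the sketch safe to try on a machine without splunk installed; dropping the second argument would execute the commands for real.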
@jjm Have you accepted the license? There is one way and only one way to accept the license: https://github.com/chef-cookbooks/chef-splunk#license-acceptance
@haidangwa Yes, it's done by this line of the kitchen.yml file:
EDIT: Linked to chef license acceptance, not splunk.
The failure mode can be seen at https://github.com/chef-cookbooks/chef-splunk/pull/186/checks?check_run_id=1260401551 too.
The same is happening here.
In my case, kitchen converge completes without an error:
Recipe: chef-splunk::service
* service[splunk] action restart
- restart service service[splunk]
Running handlers:
Running handlers complete
Chef Infra Client finished, 22/44 resources updated in 36 seconds
but after that if I run kitchen verify:
System Package splunkforwarder
✔ is expected to be installed
Service splunk
✔ is expected to be installed
✔ is expected to be enabled
× is expected to be running
expected that `Service splunk` is running
Test Summary: 3 successful, 1 failure, 0 skipped
The interesting thing is what ps -aux shows me:
root@default-ubuntu-1804:/# ps -aux | grep splunk
root 839 1.6 1.0 294276 80752 ? Sl 20:28 0:00 splunkd -p 8089 restart
root 840 0.0 0.1 87852 13584 ? Ss 20:28 0:00 [splunkd pid=839] splunkd -p 8089 restart [process-runner]
root 965 0.0 0.0 11460 1028 pts/0 S+ 20:29 0:00 grep --color=auto splunk
It seems to me that the problem is in the restart of the service. If I kill those processes and converge again, everything is fine:
System Package splunkforwarder
✔ is expected to be installed
Service splunk
✔ is expected to be installed
✔ is expected to be enabled
✔ is expected to be running
Test Summary: 4 successful, 0 failures, 0 skipped
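For anyone wanting to automate the "find those processes" step, here is a minimal Ruby sketch that picks the splunkd PIDs out of `ps -aux` style output. The sample text is the output pasted above; `splunkd_pids` is a hypothetical helper name, and the column layout (ten fixed columns followed by the command) is an assumption based on that sample.

```ruby
# Sample `ps -aux` lines copied from the output above.
PS_OUTPUT = <<~PS
  root 839 1.6 1.0 294276 80752 ? Sl 20:28 0:00 splunkd -p 8089 restart
  root 840 0.0 0.1 87852 13584 ? Ss 20:28 0:00 [splunkd pid=839] splunkd -p 8089 restart [process-runner]
  root 965 0.0 0.0 11460 1028 pts/0 S+ 20:29 0:00 grep --color=auto splunk
PS

# Return the PIDs of splunkd processes found in `ps aux`-style output.
# Assumes 10 whitespace-separated columns followed by the full command line.
def splunkd_pids(ps_output)
  ps_output.each_line
           .map { |line| line.split(nil, 11) }
           .select { |fields| fields.length == 11 && fields[10].match?(/\bsplunkd -p \d+/) }
           .map { |fields| Integer(fields[1]) }
end

puts splunkd_pids(PS_OUTPUT).inspect # prints [839, 840]
```

Matching on `splunkd -p <port>` rather than on the bare word `splunk` keeps the `grep` process itself out of the result.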
@jjm and I have encountered the following problem and would be very grateful for any assistance.
Expected Behaviour
I want to install the splunk universal forwarder in my AWS EC2 environment.
I am using a wrapper cookbook which only determines the host OS and passes the appropriate private installation URL through to the chef-splunk cookbook. In my wrapper cookbook I am calling the chef-splunk::client recipe directly.
Actual Behaviour
On Ubuntu 16.04, 18.04 and 20.04 (using the latest images via the ec2-driver) my Kitchen Test in my wrapper cookbook fails to converge.
Below is the error output from Ubuntu 20.04.
Error output
```none
Recipe: chef-splunk::client
  * execute[/opt/splunkforwarder/bin/splunk stop] action run (skipped due to not_if)
Recipe: chef-splunk::service
  * service[splunk] action start

    ================================================================================
    Error executing action `start` on resource 'service[splunk]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '1'
    ---- Begin output of /usr/bin/systemctl --system start splunk ----
    STDOUT:
    STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
    See "systemctl status splunk.service" and "journalctl -xe" for details.
    ---- End output of /usr/bin/systemctl --system start splunk ----
    Ran /usr/bin/systemctl --system start splunk returned 1

    Resource Declaration:
    ---------------------
    # In /tmp/kitchen/cache/cookbooks/chef-splunk/recipes/service.rb

    116: service 'splunk' do
    117:   action node['init_package'] == 'systemd' ? %i(start enable) : :start
    118:   supports status: true, restart: true
    119:   notifies :run, "execute[#{splunk_cmd} stop]", :before unless correct_runas_user?
    120:   provider splunk_service_provider
    121: end
    122:

    Compiled Resource:
    ------------------
    # Declared in /tmp/kitchen/cache/cookbooks/chef-splunk/recipes/service.rb:116:in `from_file'

    service("splunk") do
      provider Chef::Provider::Service::Systemd
      action [:start, :enable]
      updated true
      default_guard_interpreter :default
      service_name "splunk"
      enabled true
      running true
      masked false
      pattern "splunk"
      declared_type :service
      cookbook_name "chef-splunk"
      recipe_name "service"
      supports {:status=>true, :restart=>true}
    end

    System Info:
    ------------
    chef_version=14.15.6
    platform=ubuntu
    platform_version=20.04
    ruby=ruby 2.5.8p224 (2020-03-31 revision 67882) [x86_64-linux]
    program_name=/opt/chef/bin/chef-client
    executable=/opt/chef/bin/chef-client

Running handlers:
[2020-10-15T09:42:36+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-15T09:42:36+00:00] ERROR: Exception handlers complete
Chef Client failed. 3 resources updated in 01 seconds
[2020-10-15T09:42:36+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
[2020-10-15T09:42:36+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-15T09:42:36+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: service[splunk] (chef-splunk::service line 116) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>> Converge failed on instance
```
Further investigation reveals:
systemctl status splunk.service
```none
ubuntu@ip-10-0-0-47:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Thu 2020-10-15 09:42:36 UTC; 1h 15min ago
    Process: 3603 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)

Oct 15 09:42:35 ip-10-0-0-47 systemd[1]: Starting Splunk...
Oct 15 09:42:36 ip-10-0-0-47 splunk[3603]: The splunk daemon (splunkd) is already running.
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Failed with result 'protocol'.
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: Failed to start Splunk.
```
Details
chef-splunk cookbook version: 6.3.0
Workaround
I have a manual workaround:
1. kitchen login
2. kill the splunk process.
3. service splunk start

This successfully launches the splunk service:
Shell output
```none
ubuntu@ip-10-0-0-47:~$ ps -ef | grep splunk
root 2974 1 0 09:31 ? 00:00:08 splunkd -p 8089 restart
root 2975 2974 0 09:31 ? 00:00:00 [splunkd pid=2974] splunkd -p 8089 restart [process-runner]
ubuntu 3859 3787 0 11:14 pts/0 00:00:00 grep --color=auto splunk
ubuntu@ip-10-0-0-47:~$ sudo kill -9 2974
ubuntu@ip-10-0-0-47:~$ ps -ef | grep splunk
ubuntu 3885 3787 0 11:14 pts/0 00:00:00 grep --color=auto splunk
ubuntu@ip-10-0-0-47:~$ sudo service splunk start
ubuntu@ip-10-0-0-47:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-10-15 11:14:37 UTC; 10s ago
    Process: 3895 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)
   Main PID: 3966 (splunkd)
      Tasks: 40 (limit: 4710)
     Memory: 52.2M
     CGroup: /system.slice/splunk.service
             ├─3966 splunkd -p 8089 start
             └─3967 [splunkd pid=3966] splunkd -p 8089 start [process-runner]

Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Checking default conf files for edits...
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-8.0.6-152fb4b2bb96-linux-2.6-x86_64-manifest'
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: All installed files intact.
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: All preliminary checks passed.
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Starting splunk server daemon (splunkd)...
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 systemd[1]: splunk.service: Failed to parse PID from file /opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument
Oct 15 11:14:37 ip-10-0-0-47 systemd[1]: Started Splunk.
```
You will notice that the message `splunk.service: Failed to parse PID from file /opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument` is still present even on a successful start. This leads me to be unsure whether the PID is the root error or a red herring in this case.
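One way to compare the failed and successful starts side by side is to tag each run with the messages it contains. The Ruby sketch below does that for a few journal lines copied from the two `systemctl status` outputs in this issue (timestamps and hostname trimmed); the `SYMPTOMS` table and the `symptoms_in` helper are hypothetical names introduced only for this illustration.

```ruby
# Journal lines copied from the two `systemctl status` outputs in this issue.
FAILED_START = [
  'splunk[3603]: The splunk daemon (splunkd) is already running.',
  'systemd[1]: splunk.service: Refusing to accept PID outside of service control group, ' \
    'acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid',
  "systemd[1]: splunk.service: Failed with result 'protocol'."
].freeze

SUCCESSFUL_START = [
  'splunk[3895]: Starting splunk server daemon (splunkd)...',
  'systemd[1]: splunk.service: Failed to parse PID from file ' \
    '/opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument',
  'systemd[1]: Started Splunk.'
].freeze

# Hypothetical labels for the messages of interest.
SYMPTOMS = {
  already_running: /splunkd\) is already running/,
  pid_outside_cgroup: /Refusing to accept PID outside of service control group/,
  pid_parse_failure: /Failed to parse PID from file/
}.freeze

# Return the labels whose pattern matches at least one journal line.
def symptoms_in(journal_lines)
  SYMPTOMS.select { |_name, pattern| journal_lines.any? { |line| line.match?(pattern) } }.keys
end

puts "failed start:     #{symptoms_in(FAILED_START).inspect}"
puts "successful start: #{symptoms_in(SUCCESSFUL_START).inspect}"
```

On this data the two runs log different PID-file messages, and only the failed run reports that splunkd was already running, which may help narrow down which message matters.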
Stack trace
```none
Generated at 2020-10-15 13:48:48 +0000
Mixlib::ShellOut::ShellCommandFailed: service[splunk] (chef-splunk::service line 116) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service/systemd.rb:106:in `start_service'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:135:in `block in action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:227:in `converge_by'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:134:in `action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:182:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource.rb:578:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:74:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `block in run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `each'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:132:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:130:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:720:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `catch'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:754:in `converge_and_save'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:286:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:261:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application/client.rb:449:in `run_application'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/bin/chef-client:25:in `