sous-chefs / chef-splunk

Development repository for the chef-splunk cookbook
https://supermarket.chef.io/cookbooks/chef-splunk
Apache License 2.0

splunk.service: Refusing to accept PID outside of service control group #185

Closed JonoRicci closed 4 years ago

JonoRicci commented 4 years ago

@jjm and I have encountered the following problem and would be very grateful for any assistance.

Expected Behaviour

I want to install the splunk universal forwarder in my AWS EC2 environment.

I am using a wrapper cookbook that only determines the host OS and passes an appropriate private installation URL through to the chef-splunk cookbook. In my wrapper cookbook I call the chef-splunk::client recipe directly.

Actual Behaviour

On Ubuntu 16.04, 18.04 and 20.04 (using the latest images via the ec2-driver) my Kitchen Test in my wrapper cookbook fails to converge.

Below is the error output from Ubuntu 20.04.

Error output

```none
Recipe: chef-splunk::client
  * execute[/opt/splunkforwarder/bin/splunk stop] action run (skipped due to not_if)
Recipe: chef-splunk::service
  * service[splunk] action start

    ================================================================================
    Error executing action `start` on resource 'service[splunk]'
    ================================================================================

    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '1'
    ---- Begin output of /usr/bin/systemctl --system start splunk ----
    STDOUT:
    STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
    See "systemctl status splunk.service" and "journalctl -xe" for details.
    ---- End output of /usr/bin/systemctl --system start splunk ----
    Ran /usr/bin/systemctl --system start splunk returned 1

    Resource Declaration:
    ---------------------
    # In /tmp/kitchen/cache/cookbooks/chef-splunk/recipes/service.rb

    116: service 'splunk' do
    117:   action node['init_package'] == 'systemd' ? %i(start enable) : :start
    118:   supports status: true, restart: true
    119:   notifies :run, "execute[#{splunk_cmd} stop]", :before unless correct_runas_user?
    120:   provider splunk_service_provider
    121: end
    122:

    Compiled Resource:
    ------------------
    # Declared in /tmp/kitchen/cache/cookbooks/chef-splunk/recipes/service.rb:116:in `from_file'

    service("splunk") do
      provider Chef::Provider::Service::Systemd
      action [:start, :enable]
      updated true
      default_guard_interpreter :default
      service_name "splunk"
      enabled true
      running true
      masked false
      pattern "splunk"
      declared_type :service
      cookbook_name "chef-splunk"
      recipe_name "service"
      supports {:status=>true, :restart=>true}
    end

    System Info:
    ------------
    chef_version=14.15.6
    platform=ubuntu
    platform_version=20.04
    ruby=ruby 2.5.8p224 (2020-03-31 revision 67882) [x86_64-linux]
    program_name=/opt/chef/bin/chef-client
    executable=/opt/chef/bin/chef-client

Running handlers:
[2020-10-15T09:42:36+00:00] ERROR: Running exception handlers
Running handlers complete
[2020-10-15T09:42:36+00:00] ERROR: Exception handlers complete
Chef Client failed. 3 resources updated in 01 seconds
[2020-10-15T09:42:36+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
[2020-10-15T09:42:36+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-10-15T09:42:36+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: service[splunk] (chef-splunk::service line 116) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>> Converge failed on instance . Please see .kitchen/logs/client-ubuntu-2004.log for more details
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration
```

Further investigation reveals:

systemctl status splunk.service

```none
ubuntu@ip-10-0-0-47:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Thu 2020-10-15 09:42:36 UTC; 1h 15min ago
    Process: 3603 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)

Oct 15 09:42:35 ip-10-0-0-47 systemd[1]: Starting Splunk...
Oct 15 09:42:36 ip-10-0-0-47 splunk[3603]: The splunk daemon (splunkd) is already running.
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Failed with result 'protocol'.
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: Failed to start Splunk.
```

Details

Workaround

I have a manual workaround:

  1. kitchen login
  2. kill the splunk process.
  3. service splunk start

This successfully launches the splunk service:

Shell output

```none
ubuntu@ip-10-0-0-47:~$ ps -ef | grep splunk
root 2974 1 0 09:31 ? 00:00:08 splunkd -p 8089 restart
root 2975 2974 0 09:31 ? 00:00:00 [splunkd pid=2974] splunkd -p 8089 restart [process-runner]
ubuntu 3859 3787 0 11:14 pts/0 00:00:00 grep --color=auto splunk
ubuntu@ip-10-0-0-47:~$ sudo kill -9 2974
ubuntu@ip-10-0-0-47:~$ ps -ef | grep splunk
ubuntu 3885 3787 0 11:14 pts/0 00:00:00 grep --color=auto splunk
ubuntu@ip-10-0-0-47:~$ sudo service splunk start
ubuntu@ip-10-0-0-47:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-10-15 11:14:37 UTC; 10s ago
    Process: 3895 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)
   Main PID: 3966 (splunkd)
      Tasks: 40 (limit: 4710)
     Memory: 52.2M
     CGroup: /system.slice/splunk.service
             ├─3966 splunkd -p 8089 start
             └─3967 [splunkd pid=3966] splunkd -p 8089 start [process-runner]

Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Checking default conf files for edits...
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-8.0.6-152fb4b2bb96-linux-2.6-x86_64-manifest'
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: All installed files intact.
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: All preliminary checks passed.
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Starting splunk server daemon (splunkd)...
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 systemd[1]: splunk.service: Failed to parse PID from file /opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument
Oct 15 11:14:37 ip-10-0-0-47 systemd[1]: Started Splunk.
```

Note that a PID-related message, "splunk.service: Failed to parse PID from file /opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument", is still logged even on a successful start.

Because of this, I am unsure whether the PID handling is the root cause or a red herring.
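
One way to test whether the PID message is the real cause is to check which control group the already-running splunkd belongs to. The following is only a sketch under stated assumptions and is not part of the cookbook: the helper name `in_unit_cgroup?` and the sample cgroup lines are hypothetical, and it assumes the `/proc/<pid>/cgroup` line format `hierarchy-ID:controller-list:cgroup-path`, with the unit name appearing as a path component as in the `CGroup: /system.slice/splunk.service` line of the successful start above.

```ruby
# Returns true when any cgroup path listed for a process contains the
# given unit name as a path component.
# cgroup_text is the content of /proc/<pid>/cgroup (assumed format:
# "hierarchy-ID:controller-list:cgroup-path", one entry per line).
def in_unit_cgroup?(cgroup_text, unit = 'splunk.service')
  cgroup_text.each_line.any? do |line|
    path = line.chomp.split(':', 3)[2].to_s
    path.split('/').include?(unit)
  end
end

# Hypothetical sample inputs, for illustration only.
managed   = "0::/system.slice/splunk.service\n"
unmanaged = "0::/user.slice/user-1000.slice/session-1.scope\n"

puts in_unit_cgroup?(managed)   # true
puts in_unit_cgroup?(unmanaged) # false
```

On an affected instance, one could pass `File.read("/proc/#{pid}/cgroup")` for the running splunkd PID. A `false` result would be consistent with systemd's "outside of service control group" message, which suggests splunkd was started by something other than the unit.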

Stack trace

[2020-10-15T09:42:36+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
[2020-10-15T09:42:36+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
```none
Generated at 2020-10-15 13:48:48 +0000
Mixlib::ShellOut::ShellCommandFailed: service[splunk] (chef-splunk::service line 116) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service/systemd.rb:106:in `start_service'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:135:in `block in action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:227:in `converge_by'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:134:in `action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:182:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource.rb:578:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:74:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `block in run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `each'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:132:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:130:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:720:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `catch'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:754:in `converge_and_save'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:286:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:261:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application/client.rb:449:in `run_application'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/bin/chef-client:25:in `'
/opt/chef/bin/chef-client:81:in `load'
/opt/chef/bin/chef-client:81:in `'

>>>> Caused by Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service/systemd.rb:106:in `start_service'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:135:in `block in action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:227:in `converge_by'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:134:in `action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:182:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource.rb:578:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:74:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `block in run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `each'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:132:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:130:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:720:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `catch'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:754:in `converge_and_save'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:286:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:261:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application/client.rb:449:in `run_application'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/bin/chef-client:25:in `'
/opt/chef/bin/chef-client:81:in `load'
```

JonoRicci commented 4 years ago

Reproducing the error with Chef-Splunk Kitchen test with EC2-driver

I can run the chef-splunk kitchen test with dokken successfully without reproducing the error.

If I swap the dokken driver for the ec2 driver and add a very simple InSpec test, I can reproduce our error in the chef-splunk cookbook itself. (I also ran the InSpec test with the dokken driver, which gave the same outcome.)

The InSpec test:

describe service('splunk') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end

The result:

  Service splunk
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ×  is expected to be running
     expected that `Service splunk` is running

Investigating on the instance:

Shell output

```none
ubuntu@ip-10-0-0-150:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
ubuntu@ip-10-0-0-150:~$ sudo systemctl start splunk.service
Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
ubuntu@ip-10-0-0-150:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Thu 2020-10-15 13:33:28 UTC; 2s ago
    Process: 3266 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)

Oct 15 13:33:27 ip-10-0-0-150 systemd[1]: Starting Splunk...
Oct 15 13:33:28 ip-10-0-0-150 splunk[3266]: The splunk daemon (splunkd) is already running.
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder>
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder>
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: splunk.service: Failed with result 'protocol'.
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: Failed to start Splunk.
```

kitchen.yml

```none
---
driver:
  name: ec2
  region: eu-west-1
  interface: public
  instance_type: t2.medium
  require_chef_omnibus: true
  subnet_filter: ...
  security_group_filter: ...
  tags: ...

transport:
  max_threads: 5
  connection_timeout: 10
  connection_retries: 36
  connection_retry_sleep: 10
  max_wait_until_ready: 1200

provisioner:
  name: chef_zero
  log_level: auto
  product_name: chef
  product_version: 14
  max_retries: 3
  wait_for_retry: 90
  retry_on_exit_code:
    - 35 # chef-client's reboot scheduled exit status
  chef_license: accept
  attributes:
    dev_mode: true
    splunk:
      accept_license: true
      enable_ssl: false
      ssl_options:
        enableSplunkWebSSL: 0
        httpport: 8000
        startwebserver: 1
      web_port: 8000

verifier:
  name: inspec
  sudo: true
  root_path: '/opt/verifier'

platforms:
  - name: ubuntu-2004
    driver:
      image_search:
        owner-id: "099720109477"
        name: ubuntu/images/*/ubuntu-*-20.04*
  - name: ubuntu-1804
    driver:
      image_search:
        owner-id: "099720109477"
        name: ubuntu/images/*/ubuntu-*-18.04*
  - name: ubuntu-1604
    driver:
      image_search:
        owner-id: "099720109477"
        name: ubuntu/images/*/ubuntu-*-16.04*

suites:
  - name: client
    run_list:
      - recipe[chef-splunk::default]
    attributes:
      dev_mode: true
      splunk:
        accept_license: true
    verifier:
      inspec_tests:
        - path: test/integration/default
```

haidangwa commented 4 years ago

@JonoRicci can you show what your systemd unit file looks like? If you're calling the client recipe directly, you may not be setting up the splunk auth attributes. There is logic in the default recipe that reads the splunk admin user/pass from a data bag or from chef-vault.

haidangwa commented 4 years ago

You need to have this in an encrypted data bag or chef-vault item:

vault_item = chef_vault_item(node['splunk']['data_bag'], "splunk_#{node.chef_environment}")
jjm commented 4 years ago

Hi @haidangwa, I've created PR #186, which adds some InSpec tests to the client suite of chef-splunk. They show the issue we are seeing without involving our wrapper cookbook.

The output of verify on ubuntu-2004 is as follows:

  System Package splunkforwarder
     ✔  should be installed
  Service splunk
     ✔  should be installed
     ✔  should be enabled
     ×  should be running
     expected that `Service splunk` is running
  Port 8089
     ✔  should be listening
     ✔  protocols should include "tcp"
  Processes splunkd
     ✔  should exist

Test Summary: 6 successful, 1 failure, 0 skipped

To me, the cause seems to be the initial start of Splunk to accept the license. If I log in to the Docker container, stop Splunk with /opt/splunkforwarder/bin/splunk stop, and then run service splunk start, all the tests pass:

  System Package splunkforwarder
     ✔  should be installed
  Service splunk
     ✔  should be installed
     ✔  should be enabled
     ✔  should be running
  Port 8089
     ✔  should be listening
     ✔  protocols should include "tcp"
  Processes splunkd
     ✔  should exist

Test Summary: 7 successful, 0 failures, 0 skipped
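
The stop-then-start sequence described above can be written down as a small decision helper. This is a hedged sketch rather than anything the cookbook does: the method name `recovery_commands` and its keyword arguments are hypothetical, and the two command strings are simply the ones quoted in this comment.

```ruby
SPLUNK_STOP   = '/opt/splunkforwarder/bin/splunk stop'
SERVICE_START = 'service splunk start'

# Decide which commands to run, given whether a splunkd process exists
# and whether systemd reports the splunk service as running.
def recovery_commands(process_running:, service_running:)
  # Nothing to do when systemd already manages a running service.
  return [] if service_running

  # A splunkd started outside the unit is stopped first, so that the
  # service can then start its own copy.
  process_running ? [SPLUNK_STOP, SERVICE_START] : [SERVICE_START]
end

p recovery_commands(process_running: true, service_running: false)
```

The failing verify run corresponds to the first case (process exists, service not running), for which the helper returns the stop command followed by the start command.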

Edit: Made it clearer we see these issues directly with chef-splunk and added summary to the verify output.

haidangwa commented 4 years ago

@jjm Have you accepted the license? There is one way and only one way to accept the license: https://github.com/chef-cookbooks/chef-splunk#license-acceptance

jjm commented 4 years ago

@haidangwa Yes, it's done by this line of the kitchen.yml file:

https://github.com/chef-cookbooks/chef-splunk/blob/98a95a26472f8e04cfef207bc50276154f068d71/kitchen.yml#L18

EDIT: Linked to chef license acceptance, not splunk.

jjm commented 4 years ago

The failure mode can also be seen at https://github.com/chef-cookbooks/chef-splunk/pull/186/checks?check_run_id=1260401551.

ehvidal commented 4 years ago

The same is happening here.

In my case, kitchen converge completes without an error:

Recipe: chef-splunk::service
  * service[splunk] action restart
    - restart service service[splunk]

Running handlers:
Running handlers complete
Chef Infra Client finished, 22/44 resources updated in 36 seconds

However, if I then run kitchen verify:

  System Package splunkforwarder
     ✔  is expected to be installed
  Service splunk
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ×  is expected to be running
     expected that `Service splunk` is running

Test Summary: 3 successful, 1 failure, 0 skipped

The interesting thing is what ps -aux shows me:

root@default-ubuntu-1804:/# ps -aux | grep splunk
root         839  1.6  1.0 294276 80752 ?        Sl   20:28   0:00 splunkd -p 8089 restart
root         840  0.0  0.1  87852 13584 ?        Ss   20:28   0:00 [splunkd pid=839] splunkd -p 8089 restart [process-runner]
root         965  0.0  0.0  11460  1028 pts/0    S+   20:29   0:00 grep --color=auto splunk

It seems to me that the problem is in the restart of the service. If I kill those processes and converge again, everything is fine:

  System Package splunkforwarder
     ✔  is expected to be installed
  Service splunk
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ✔  is expected to be running

Test Summary: 4 successful, 0 failures, 0 skipped
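
For anyone scripting this check, the main splunkd PID can be picked out of the `ps -aux` output shown above. The sketch below is an illustration only: the method name `splunkd_pids` is hypothetical, and it assumes the usual eleven-column `ps aux` layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).

```ruby
# Extract PIDs of top-level splunkd processes from `ps -aux` style output.
# Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
def splunkd_pids(ps_aux_output)
  ps_aux_output.each_line.map do |line|
    fields = line.split(' ', 11)
    next if fields.length < 11

    command = fields[10]
    # Skips the "[splunkd pid=...]" process-runner helper and the grep itself.
    next unless command.start_with?('splunkd ')

    fields[1].to_i
  end.compact
end

sample = <<~PS
  root         839  1.6  1.0 294276 80752 ?        Sl   20:28   0:00 splunkd -p 8089 restart
  root         840  0.0  0.1  87852 13584 ?        Ss   20:28   0:00 [splunkd pid=839] splunkd -p 8089 restart [process-runner]
  root         965  0.0  0.0  11460  1028 pts/0    S+   20:29   0:00 grep --color=auto splunk
PS

p splunkd_pids(sample) # => [839]
```

Only the parent process is returned. In the original report, killing the parent was enough for the process-runner child to disappear as well.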