splunk / splunk-ansible

Ansible playbooks for configuring and managing Splunk Enterprise and Universal Forwarder deployments

revert wait_for_splunk_instance.yml #833

Closed jmeixensperger closed 6 months ago

jmeixensperger commented 6 months ago

Allows wait_for_splunk_instance.yml to target remote hosts instead of only localhost.

Reverting to the previously used uri module also guarantees that a response.status value is always present.

jmeixensperger commented 6 months ago

Verified that this change works on an S3 SVA (3 search heads with SHC, 1 indexer, 1 deployer).

jmeixensperger commented 6 months ago

@jonathan-vega-splunk I am still seeing an occasional error on deployers when the SHC captain is not ready yet (see https://github.com/splunk/splunk-ansible/issues/824). It seems to occur while the SHC captain is in the middle of restarting; however, it is NOT caused by the wait_for_splunk_instance.yml error that this PR aims to resolve. We may want to look at widening the retry window for deployers (the retry count and/or the retry delay).

hendolim commented 6 months ago

Can we keep using splunk_api to ensure UDS support, and just fix the url param to accommodate remote instances?

jmeixensperger commented 6 months ago

There are 2 major reasons I didn't use splunk_api:

  1. This task always targets a remote host, and it is impossible to target remote UDS endpoints. This task is also never called against forwarders.
  2. When targeting remote hosts, splunk_api can break the retry functionality of tasks that use statements like changed_when: response.status_code == 200 or until: response.status_code == 200. If the target host is unreachable, the first response has no status_code, so evaluating the condition triggers an Ansible failure and prevents all further retries.
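
To illustrate the second point, here is a minimal sketch of what a uri-based wait task could look like. It is not the exact task from this PR: the variable names (target_splunk_host, target_splunk_mgmt_port, splunk_admin_user, splunk_admin_password) and the retry numbers are hypothetical placeholders. It relies on the behavior described above, that uri always registers a status value, so the until condition can be evaluated on every attempt.

```yaml
# Sketch only: all variable names and retry values are illustrative assumptions.
- name: Wait for remote Splunk instance to respond
  uri:
    url: "https://{{ target_splunk_host }}:{{ target_splunk_mgmt_port }}/services/server/info?output_mode=json"
    method: GET
    user: "{{ splunk_admin_user }}"
    password: "{{ splunk_admin_password }}"
    force_basic_auth: true
    validate_certs: false
    status_code: 200
  register: response
  # The condition reads response.status (the uri module's field), not status_code.
  until: response.status == 200
  retries: 60   # illustrative retry count
  delay: 5      # illustrative delay in seconds between attempts
  no_log: true
```

With these illustrative numbers the task would keep polling for roughly 60 x 5 = 300 seconds before giving up.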

hendolim commented 6 months ago

Good callout on 1. I totally forgot about that. When I was debugging this, the task appeared to be waiting for itself to come up rather than for a remote instance, so I was under the impression that it applied generically to both local and remote hosts. We might need to clarify the task name or add a comment.