splunk / splunk-ansible

Ansible playbooks for configuring and managing Splunk Enterprise and Universal Forwarder deployments

Upgrade fails from 9.1.1 to 9.2.1 #847

Closed: yaroslav-nakonechnikov closed this 2 months ago

yaroslav-nakonechnikov commented 3 months ago

Hello,

We are using splunk-operator and noticed that the upgrade from 9.1.x to 9.2.x fails.

The first issue occurs in https://github.com/splunk/splunk-ansible/blob/develop/roles/splunk_common/tasks/set_as_hec_receiver.yml#L21:

TASK [splunk_indexer : Remove existing HEC token] ******************************
fatal: [localhost]: FAILED! => {
    "changed": false,
    "elapsed": 0,
    "redirected": false,
    "status": -1,
    "url": "https://127.0.0.1:8089/services/data/inputs/http/splunk_hec_token",
    "warnings": [
        "Module did not set no_log for password"
    ]
}

MSG:

Status code was -1 and not [200, 404]: Request failed: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1091)>

I decided to wipe the old etc/system, since this looked like a generic problem. Now it fails with:


TASK [splunk_common : Start Splunk via CLI] ************************************
fatal: [localhost]: FAILED! => {
    "attempts": 5,
    "changed": false,
    "cmd": [
        "/opt/splunk/bin/splunk",
        "start",
        "--accept-license",
        "--answer-yes",
        "--no-prompt"
    ],
    "delta": "0:00:00.369413",
    "end": "2024-06-20 10:47:29.519891",
    "rc": 2,
    "start": "2024-06-20 10:47:29.150478"
}

STDOUT:

This appears to be an upgrade of Splunk.
--------------------------------------------------------------------------------)

Splunk has detected an older version of Splunk installed on this machine. To
finish upgrading to the new version, Splunk's installer will automatically
update and alter your current configuration files. Deprecated configuration
files will be renamed with a .deprecated extension.

You can choose to preview the changes that will be made to your configuration
files before proceeding with the migration and upgrade:

If you want to migrate and upgrade without previewing the changes that will be
made to your existing configuration files, choose 'y'.
If you want to see what changes will be made before you proceed with the
upgrade, choose 'n'.

Perform migration and upgrade without previewing configuration changes? [y/n] y

Migrating to:
VERSION=9.2.1
BUILD=78803f08aabb
PRODUCT=splunk
PLATFORM=Linux-x86_64

STDERR:

-- Migration information is being logged to '/opt/splunk/var/log/splunk/migration.log.2024-06-20.10-47-29' --

An error occurred: Unable to generate distributed search keys.

MSG:

non-zero return code

PLAY RECAP *********************************************************************
localhost                  : ok=78   changed=3    unreachable=0    failed=1    skipped=54   rescued=0    ignored=0

Thursday 20 June 2024  10:47:29 +0000 (0:00:53.829)       0:01:20.378 *********
===============================================================================
splunk_common : Start Splunk via CLI ----------------------------------- 53.83s
splunk_common : Set options in roleMap_SAML ----------------------------- 5.01s
splunk_common : Set options in saml ------------------------------------- 4.48s
splunk_common : Get Splunk status --------------------------------------- 1.23s
Gathering Facts --------------------------------------------------------- 0.99s
splunk_common : Get Splunk status --------------------------------------- 0.66s
splunk_common : Get Splunk status --------------------------------------- 0.64s
splunk_common : Set options in authentication --------------------------- 0.56s
splunk_common : Cleanup Splunk runtime files ---------------------------- 0.56s
splunk_common : Update /opt/splunk/etc ---------------------------------- 0.39s
splunk_common : Find manifests ------------------------------------------ 0.37s
splunk_common : Set general pass4SymmKey -------------------------------- 0.36s
splunk_common : Create .ui_login ---------------------------------------- 0.36s
splunk_common : Check for scloud ---------------------------------------- 0.35s
splunk_common : Apply admin password ------------------------------------ 0.34s
splunk_common : Trigger restart ----------------------------------------- 0.33s
splunk_common : Enable Splunkd SSL -------------------------------------- 0.32s
splunk_common : Remove splunktcp-ssl input ------------------------------ 0.30s
splunk_common : Restrict permissions on /opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key --- 0.30s
splunk_common : Set mgmt port ------------------------------------------- 0.30s

The issue currently occurs only on indexers.

In the migration log:

[yn@ip/]$ kubectl exec -it pod/splunk-site1-41678-indexer-0 -n splunk-operator -- cat /opt/splunk/var/log/splunk/migration.log.2024-06-20.10-47-29

Migrating to:
VERSION=9.2.1
BUILD=78803f08aabb
PRODUCT=splunk
PLATFORM=Linux-x86_64

yaroslav-nakonechnikov commented 3 months ago

It looks like wiping /opt/splunk/etc/* solves that, but having to do so is not expected at all.

Update: it can't recreate /opt/splunk/etc/peer-apps, so I had to create it manually: mkdir -p /opt/splunk/etc/peer-apps
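
For anyone repeating this workaround, the two steps above can be sketched as a small shell function. This is only an illustration of what the comment describes, not an official upgrade procedure. The function name `reset_splunk_etc` is hypothetical, and the backup step is an added precaution that was not part of the original workaround.

```shell
#!/bin/sh
# Sketch of the workaround from this thread; assumptions are marked below.
# reset_splunk_etc is a hypothetical helper name, not part of splunk-ansible.
reset_splunk_etc() {
    # Defaults to /opt/splunk, the path used in this thread.
    splunk_home="${1:-/opt/splunk}"
    etc_dir="$splunk_home/etc"
    backup="$splunk_home/etc-backup-$(date +%Y%m%d%H%M%S).tar.gz"

    [ -d "$etc_dir" ] || { echo "no such directory: $etc_dir" >&2; return 1; }

    # Assumption: keep an archive of the old configuration before deleting it.
    tar -czf "$backup" -C "$splunk_home" etc || return 1

    # Same glob as in the comment: removes etc/* but leaves dotfiles in place.
    rm -rf -- "$etc_dir"/* || return 1

    # peer-apps was not recreated on startup, so create it by hand.
    mkdir -p "$etc_dir/peer-apps" || return 1

    # Print the backup path so the caller can restore from it if needed.
    echo "$backup"
}
```

Inside the pod it would be called as `reset_splunk_etc /opt/splunk` before the next start attempt. Wiping etc/ discards all local configuration, so this only makes sense where the configuration is regenerated on startup.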

yaroslav-nakonechnikov commented 2 months ago

It looks like patch2 fixed that.

But why is there no summary anymore?

yaroslav-nakonechnikov commented 2 months ago

Yes, with patch2 everything worked as expected.