splunk / docker-splunk

Splunk Docker GitHub Repository

UF Crashes on Container Restart (9.2 and 9.1) #666

Open JoePJisc opened 2 months ago

JoePJisc commented 2 months ago

When running containers on 9.2.1 (78803f08aabb) or 9.1.4 (a414fc70250e), if the container is restarted it fails to start with the following error:

TASK [splunk_universal_forwarder : Setup global HEC] ***************************
fatal: [localhost]: FAILED! => {
    "changed": false
}

MSG:

POST/services/data/inputs/http/httpadmin********8089{'disabled': '0', 'enableSSL': '1', 'port': '8088', 'serverCert': '', 'sslPassword': ''}NoneNoneNone;;; AND excep_str: No Exception, failed with status code 404: {"text":"The requested URL was not found on this server.","code":404}

PLAY RECAP *********************************************************************
localhost                  : ok=67   changed=3    unreachable=0    failed=1    skipped=69   rescued=0    ignored=0

Thursday 18 April 2024  14:49:02 +0000 (0:00:00.588)       0:00:17.478 ********
===============================================================================
splunk_common : Start Splunk via CLI ------------------------------------ 1.59s
Gathering Facts --------------------------------------------------------- 0.95s
splunk_universal_forwarder : Setup global HEC --------------------------- 0.59s
splunk_common : Cleanup Splunk runtime files ---------------------------- 0.51s
splunk_common : Update Splunk directory owner --------------------------- 0.48s
splunk_common : Update /opt/splunk/etc ---------------------------------- 0.43s
splunk_common : Check for scloud ---------------------------------------- 0.41s
splunk_common : Set mgmt port ------------------------------------------- 0.40s
splunk_common : Find manifests ------------------------------------------ 0.38s
splunk_common : Check if UDS file exists -------------------------------- 0.32s
splunk_common : Configure to set Mgmt Mode as auto (Allows UDS) --------- 0.30s
splunk_common : Remove user-seed.conf ----------------------------------- 0.30s
splunk_common : Reset root CA ------------------------------------------- 0.29s
splunk_common : Get Splunk status --------------------------------------- 0.29s
splunk_common : Disable indexing on the current node -------------------- 0.29s
splunk_common : Ensure license path ------------------------------------- 0.29s
splunk_common : Get Splunk status --------------------------------------- 0.29s
splunk_common : Create .ui_login ---------------------------------------- 0.29s
splunk_common : Check if /sbin/updateetc.sh exists ---------------------- 0.29s
splunk_common : Enable splunktcp input ---------------------------------- 0.29s

9.0.9 (6315942c563f) appears unaffected.

Skypex commented 1 month ago

Hi @JoePJisc,

I assume this happens on freshly installed UFs, not on upgrades?

I had the same error and it turned out that this was caused by SPLUNK_HOME_OWNERSHIP_ENFORCEMENT - see SECURITY.md.

When you run a newer UF as the container user splunk, it prints a lot of warnings saying that it is not working properly. These are only warnings, though, so nothing actually fails.

However, in this play the warning turns into a problem: https://github.com/splunk/splunk-ansible/blob/develop/roles/splunk_common/tasks/enable_admin_auth.yml#L6

The initial Splunk admin user setup processes the command's stdout, and here the warning ends up in that output, which results in a broken passwd file:

[splunk@splunk-uf-0 splunkforwarder]$ pwd
/opt/splunkforwarder
[splunk@splunk-uf-0 splunkforwarder]$ cat etc/passwd
:admin:Warning: Attempting to revert the SPLUNK_HOME ownership::administrator:admin:::19853
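A quick way to spot this condition is to check whether the third colon-separated field of the passwd line looks like a password hash. The following is only a minimal sketch: the helper name `hash_field_looks_valid` is hypothetical, and it assumes that a healthy entry stores a modular-crypt string such as `$6$<salt>$<digest>` in that field (the sample "healthy" line is made up for illustration).

```python
import re

# Assumption: a healthy hash field is a modular-crypt string, e.g. "$6$<salt>$<digest>".
CRYPT_HASH = re.compile(r"^\$[0-9a-z]+\$[^:\s]+$")


def hash_field_looks_valid(passwd_line: str) -> bool:
    """Return True if the third colon-separated field looks like a crypt hash."""
    fields = passwd_line.strip().split(":")
    if len(fields) < 3:
        return False
    return bool(CRYPT_HASH.match(fields[2]))


# Line taken from the broken etc/passwd shown above.
broken = ":admin:Warning: Attempting to revert the SPLUNK_HOME ownership::administrator:admin:::19853"
# Hypothetical healthy line with a placeholder salt and digest.
healthy = ":admin:$6$examplesalt$exampledigest::administrator:admin:::19853"

print(hash_field_looks_valid(broken))
print(hash_field_looks_valid(healthy))
```

In the broken line the third field is the word "Warning", so the check fails, which matches the symptom described here.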

I fixed this by overriding the play as follows:

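---
- name: Set admin access via seed
  when: first_run | bool
  block:
  # Hash the password with Python's crypt module so that the registered
  # stdout contains only the hash.
  # Note: the crypt module was removed from the standard library in Python 3.13.
  - name: "Hash the password"
    command: "python -c 'import sys, crypt; print(crypt.crypt(sys.argv[1], crypt.mksalt(crypt.METHOD_SHA512)))' '{{ splunk.password }}'"
    register: hashed_pwd
    changed_when: hashed_pwd.rc == 0
    become: yes
    become_user: "{{ splunk.user }}"
    no_log: "{{ hide_password }}"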

That solved it for me - maybe it helps you as well!
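A different way to guard against the same failure, shown here only as a sketch and not as what the play above does, is to filter the captured output and keep only the line that looks like a hash. The function name `extract_hash` and the sample salt and digest are hypothetical, and the pattern again assumes a modular-crypt string such as `$6$<salt>$<digest>`.

```python
import re

# Assumption: the hash is a modular-crypt string such as "$6$<salt>$<digest>".
CRYPT_LINE = re.compile(r"^\$[0-9a-z]+\$\S+$")


def extract_hash(stdout: str) -> str:
    """Return the first stdout line that looks like a crypt hash.

    Raises ValueError if no such line exists, so a warning can never be
    mistaken for a hash and written into the passwd file.
    """
    for line in stdout.splitlines():
        candidate = line.strip()
        if CRYPT_LINE.match(candidate):
            return candidate
    raise ValueError("no crypt-style hash found in command output")


# Hypothetical command output: the warning from this thread followed by a placeholder hash.
noisy = "Warning: Attempting to revert the SPLUNK_HOME ownership\n$6$examplesalt$exampledigest\n"
print(extract_hash(noisy))
```

Failing loudly when no hash is found would surface the problem on first run instead of on the next container restart.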

Anyway, the root cause is ultimately the issues with SPLUNK_HOME_OWNERSHIP_ENFORCEMENT, and I will create an issue to address those.