splunk / docker-splunk

Splunk Docker GitHub Repository

Running with only one mounted volume, `ERROR: Couldn't read "/data/splunk/etc/splunk-launch.conf"` #408

Closed by henricook 4 years ago

henricook commented 4 years ago

I'm in a company/infrastructure setup where I can use only one persistent volume for Splunk. My Dockerfile and entrypoint script are below.

I'm trying to bring up the container, use an entrypoint script to `cp` everything from /opt/splunk to /data/splunk (my mount point), and then start the normal Ansible process with an amended SPLUNK_HOME. The plan seemed sound, but I'm hitting some barriers. Can anyone help?

Dockerfile:

FROM splunk/splunk:latest AS splunk

ENV SPLUNK_HOME=/data/splunk
ENV SPLUNK_START_ARGS="--accept-license"

# Setup JSON parsing for indices
COPY files/props.conf /opt/splunk/etc/system/local/props.conf

# Default config override. For all options run `docker run --rm -it splunk/splunk:latest create-defaults > default.yml` on any machine
COPY files/default.yml /tmp/defaults/default.yml

USER root
#
## Use this entrypoint script to copy the standard files onto the data volume at startup (if they don't already exist or have been updated)
COPY files/docker-entrypoint.sh /docker-entrypoint.sh

RUN mkdir -p /data/splunk && chown splunk:splunk /docker-entrypoint.sh && chown splunk:splunk /data/splunk && chmod 771 /data/splunk && chmod +x /docker-entrypoint.sh

# Solves the odd access problem or two (maybe temporary)
RUN usermod -a -G splunk ansible

USER ansible

EXPOSE 8001

ENTRYPOINT ["/docker-entrypoint.sh"]

docker-entrypoint.sh:

#!/bin/bash

echo "Copying splunk template files to /data/splunk/..."
# Make sure our PV is populated
cp -rup /opt/splunk/* /data/splunk/
echo "Copied splunk.template files to /data/splunk/."

# Standard splunk entrypoint
/sbin/entrypoint.sh start-service

Currently, with `docker build . -f build/Dockerfile -t foo && docker run -e SPLUNK_PASSWORD=h3ll0th3Ar3you foo`, Ansible starts and runs about ten steps before failing with:

changed: [localhost]
Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.142)       0:00:06.857 ****** 
Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.042)       0:00:06.900 ****** 
Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.042)       0:00:06.943 ****** 
included: /opt/ansible/roles/splunk_common/tasks/set_splunk_secret.yml for localhost
Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.098)       0:00:07.041 ****** 
Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.046)       0:00:07.088 ****** 
included: /opt/ansible/roles/splunk_common/tasks/set_user_seed.yml for localhost
Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.106)       0:00:07.194 ****** 

TASK [splunk_common : Hash the password] ***************************************
fatal: [localhost]: FAILED! => {
    "changed": false, 
    "cmd": [
        "/data/splunk/bin/splunk", 
        "hash-passwd", 
        "h3ll0th3Ar3you"
    ], 
    "delta": "0:00:00.361617", 
    "end": "2020-08-12 15:48:47.836496", 
    "rc": 8, 
    "start": "2020-08-12 15:48:47.474879"
}

STDOUT:

ERROR: Couldn't determine $SPLUNK_HOME or $SPLUNK_ETC; perhaps one should be set in environment

MSG:

non-zero return code

PLAY RECAP *********************************************************************
localhost                  : ok=25   changed=2    unreachable=0    failed=1    skipped=26   rescued=0    ignored=0   

Wednesday 12 August 2020  15:48:47 +0000 (0:00:00.475)       0:00:07.669 ****** 
=============================================================================== 
Gathering Facts --------------------------------------------------------- 1.58s
splunk_common : Update Splunk directory owner --------------------------- 1.19s
splunk_common : Update /opt/splunk/etc ---------------------------------- 0.65s
splunk_common : Hash the password --------------------------------------- 0.48s
splunk_common : Find manifests ------------------------------------------ 0.31s
splunk_common : Check for existing installation ------------------------- 0.28s
splunk_common : include_tasks ------------------------------------------- 0.17s
Provision role ---------------------------------------------------------- 0.16s
splunk_common : Create .ui_login ---------------------------------------- 0.14s
splunk_common : Check if /sbin/updateetc.sh exists ---------------------- 0.14s
splunk_common : Check for existing splunk secret ------------------------ 0.13s
splunk_common : include_tasks ------------------------------------------- 0.11s
splunk_common : include_tasks ------------------------------------------- 0.10s
splunk_common : Detect service name ------------------------------------- 0.10s
splunk_common : include_tasks ------------------------------------------- 0.10s
splunk_common : include_tasks ------------------------------------------- 0.10s
splunk_common : include_tasks ------------------------------------------- 0.10s
splunk_common : Set target version fact --------------------------------- 0.09s
splunk_common : Set splunk_build_type fact ------------------------------ 0.09s
Execute pre-setup playbooks --------------------------------------------- 0.05s
ERROR: Couldn't read "/data/splunk/etc/splunk-launch.conf" -- maybe $SPLUNK_HOME or $SPLUNK_ETC is set wrong?

nwang92 commented 4 years ago

Interesting use case! Before building your own image, maybe it's possible to get away with dropping the persistent storage on one of the mounts? If it's of any use:

If I had to choose, I would elect to persist only /opt/splunk/var. You can most likely get away with dropping persistent storage for /opt/splunk/etc, but it depends on the topology: whether you're using an indexer cluster, a search head cluster, just forwarders or standalones, etc.

Most of the env vars that are available in the splunk/splunk image should allow you to tweak Splunk so at boot, it's set up in a repeatable manner. However, if you have users use Splunk and change things via the UI (creating new dashboards, for instance) those get stored somewhere in /opt/splunk/etc and would be lost upon container restart.
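As a rough illustration of the option above (persist only /opt/splunk/var and let env vars rebuild everything else at boot), the sketch below wraps a `docker run` call in a shell function. The volume name `splunk-var`, the function name, and the `DOCKER` override variable are hypothetical; `SPLUNK_START_ARGS` and `SPLUNK_PASSWORD` are the variables already used earlier in this thread. It has not been tested against a real deployment.

```shell
#!/bin/bash
# Sketch only: start Splunk with a single named volume mounted at /opt/splunk/var.
# "splunk-var" is a hypothetical volume name; DOCKER is a hypothetical override
# (for example, DOCKER=echo prints the command instead of running it).
run_splunk_var_only() {
    local volume="${1:-splunk-var}"
    # "-e SPLUNK_PASSWORD" with no value forwards the variable from the caller's
    # environment, so the password does not appear on the command line.
    "${DOCKER:-docker}" run -d \
        -e SPLUNK_START_ARGS=--accept-license \
        -e SPLUNK_PASSWORD \
        -v "${volume}:/opt/splunk/var" \
        splunk/splunk:latest
}
```

Calling `DOCKER=echo run_splunk_var_only` prints the command without running it, which is handy for checking the mount before touching real data. Anything users change under /opt/splunk/etc would still be lost on restart, as noted above.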

henricook commented 4 years ago

Thanks @nwang92 - it's a trade-off I'd considered, we're going to be setting up many custom dashboards though so it's something we'd like to keep!

In the end I spent a day upskilling on what makes our infrastructure tick and then made a few minor tweaks that seem to have allowed me to attach two volumes to my pod. So I've managed to remove the one-volume requirement at our infrastructure level, with some sweat and tears.

It'd be so much more flexible if Splunk could support this though, do you think? Even just moving etc and var under /opt/splunk/data or something would let people mount just the data directory. I don't know how that migration path might work though, what do you think of it in terms of a feature suggestion? :-D

nwang92 commented 4 years ago

I think I see what's preventing this from working - there's an /opt/splunk-etc which stores the etc dir for the specific Splunk version used in the image itself. When $SPLUNK_HOME changes here, it messes this bit up - I'll see if I can fix it.
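To see whether a relocated $SPLUNK_HOME is in the state the CLI expects, a small pre-flight check in a custom entrypoint might help. The sketch below only tests for the two paths that show up in the log earlier in the thread (`bin/splunk` and `etc/splunk-launch.conf`). The function name `check_splunk_home` is made up for illustration, and this is a diagnostic aid under those assumptions, not the fix described above.

```shell
#!/bin/bash
# Hypothetical helper: report whether a candidate SPLUNK_HOME contains the
# two files referenced in the failing log (the CLI binary and splunk-launch.conf).
check_splunk_home() {
    local home="$1"
    local status=0
    if [ ! -x "$home/bin/splunk" ]; then
        echo "missing or not executable: $home/bin/splunk"
        status=1
    fi
    if [ ! -r "$home/etc/splunk-launch.conf" ]; then
        echo "missing or unreadable: $home/etc/splunk-launch.conf"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "ok: $home"
    fi
    # Non-zero exit lets the caller stop before Ansible starts.
    return "$status"
}
```

In the entrypoint from the original post, this could be called as `check_splunk_home /data/splunk || exit 1` right after the `cp` step, so a half-populated volume fails early with a clearer message.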

Thinking out loud, but I wonder if you could just add a PVC to /opt/splunk? I recall there being some Docker issues where volumes at specific paths might get wiped when doing this, but I'm not sure if that was ever fixed or if you can even work around it in k8s with things like initContainers.

Additionally, this might be more effort, but maybe even dynamically fetching a Splunk build would allow you to add the PVC to /opt/splunk and circumvent the existing filesystem contents within /opt/splunk. Something like:

env:
- name: SPLUNK_BUILD_URL
  value: https://download.splunk.com/products/splunk/releases/8.0.5/linux/splunk-8.0.5-a1a6394cc5ae-Linux-x86_64.tgz

I haven't had time to personally test these, but just throwing some ideas out :) I think moving etc and var is a bit of a larger, product change though. That would require some changes to the Splunk distribution, whereas this container is more-or-less just a packaging mechanism for those binaries.

henricook commented 4 years ago

Yeah I think that's the problem, when I add a PVC to /opt/splunk it wipes the whole folder (because a new blank volume is now mounted there)
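One way the initContainer idea might be approached: mount the volume at some other path first, copy the image's /opt/splunk into it only if the volume is still empty, and then mount it at /opt/splunk in the main container. The sketch below shows just the copy step. The function name `seed_if_empty` and any paths passed to it are assumptions, it has not been tested with Splunk, and a filesystem that pre-creates entries such as `lost+found` would not count as empty here.

```shell
#!/bin/bash
# Hypothetical helper: copy a template tree into a mount point only when the
# mount point is empty (for example, a freshly provisioned volume).
seed_if_empty() {
    local src="$1" dst="$2"
    mkdir -p "$dst"
    # `ls -A` lists every entry except . and .. ; empty output means an empty dir.
    if [ -z "$(ls -A "$dst")" ]; then
        # The trailing "/." copies the directory's contents, dotfiles included,
        # and -a preserves ownership, modes and timestamps where permitted.
        cp -a "$src/." "$dst/"
        echo "seeded $dst from $src"
    else
        echo "left $dst untouched"
    fi
}
```

For example, `seed_if_empty /opt/splunk /mnt/seed` (with `/mnt/seed` as a made-up mount path) would populate a new volume once and leave it alone on later restarts, so changes made through the UI are not overwritten.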

Dynamically fetching a splunk build is :exploding_head: - could work. Those /opt/splunk-etc changes sound :muscle:

nwang92 commented 4 years ago

The fix should be available in the nightly splunk/splunk:edge image; otherwise, it should land in the 8.0.6 image tag release.

henricook commented 4 years ago

Amazing, thanks @nwang92 !