splunk / docker-splunk

Splunk Docker GitHub Repository

Splunk workload management (WLM) in a container #375

Open dadux opened 4 years ago

dadux commented 4 years ago

We'd like to migrate our Splunk cluster to containers using the docker-splunk image. However, the last remaining blocker we've identified is that we cannot enable the workload management config in the container in recent versions of Splunk (>= 7.3).

It fails the pre-flight checks:

Workload Management Preflight Checks failed. Fix the following issues:
    CPU Splunk base directory Splunkd.service requires read and write permissions.
    CPU Splunk base directory Splunkd.service is missing.
    The 'Delegate' property in the unit file must be set to 'true'. Restart Splunk then rerun preflight checks.
    In the unit file, the 'Restart' property must be set to 'always'. The 'ExecStart' property must include '_internal_launch_under_systemd'. Make sure the up-to-date unit file is loaded.
    Memory Splunk base directory Splunkd.service requires read and write permissions.
    Memory Splunk base directory Splunkd.service is missing.
    Unit file Splunkd.service is missing. Restart Splunk then rerun preflight checks.
bin/splunk version
Splunk 7.3.5 (build 86fd62efc3d7)

It appears to be looking for a systemd unit and the associated cgroups, which obviously don't exist in the container.

I understand this is not an issue with docker-splunk per se, but it would be nice to find a workaround, as running without a systemd unit is common container behaviour.
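To make the unit-file part of those messages concrete, here is a minimal sketch that checks a unit file's text for the three properties named in the preflight output (`Delegate`, `Restart`, `ExecStart`). It is only an illustration of what the messages ask for, not Splunk's actual preflight logic. The function names `parse_service_section` and `preflight_problems` are hypothetical.

```python
def parse_service_section(unit_text):
    """Collect key=value pairs from the [Service] section of a unit file."""
    props = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def preflight_problems(unit_text):
    """Return the unit-file conditions from the preflight messages that are not met.

    Assumption: this only mirrors the wording of the error messages above;
    the real checks performed by Splunk may differ.
    """
    props = parse_service_section(unit_text)
    problems = []
    if props.get("Delegate", "").lower() != "true":
        problems.append("Delegate must be 'true'")
    if props.get("Restart", "") != "always":
        problems.append("Restart must be 'always'")
    if "_internal_launch_under_systemd" not in props.get("ExecStart", ""):
        problems.append("ExecStart must include '_internal_launch_under_systemd'")
    return problems
```

In a container without systemd there is no `Splunkd.service` file to read at all, so every one of these conditions fails before the cgroup checks are even considered.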

dadux commented 4 years ago

It's worth mentioning that I was able to successfully run WLM in Splunk 7.2.10 in a container (and in Kubernetes), where there are no pre-flight checks.

So it appears to be a regression, and/or the documentation is not up to date for more recent versions.

The following works for 7.2.10:

There are a couple of things required to get it to work, and they can be run as an Ansible pre-task:

And also

Once that's done, WLM is enabled successfully and the cgroups are created:

# $SPLUNK_HOME/bin/splunk show workload-management-status
    Workload Management Status:
        Enabled: 1
        Supported: 1
        Ingest Pool: pool_2
        Default Pool: pool_1
        Error:

    Workload Pools:
        pool_1:
            CPU Group: /sys/fs/cgroup/cpu/splunk/pool_1
            Memory Group: /sys/fs/cgroup/memory/splunk/pool_1
            CPU Weight: 20
            Memory Weight: 40

        pool_2:
            CPU Group: /sys/fs/cgroup/cpu/splunk/pool_2
            Memory Group: /sys/fs/cgroup/memory/splunk/pool_2
            CPU Weight: 80
            Memory Weight: 80
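For anyone reproducing this, a quick way to confirm the base cgroup directories are in place is sketched below. It assumes the cgroup v1 layout visible in the output above (`/sys/fs/cgroup/cpu/splunk` and `/sys/fs/cgroup/memory/splunk`). The function name `check_cgroup_dirs` and its parameters are hypothetical, and it only tests that each directory exists and is readable and writable by the current user.

```python
import os


def check_cgroup_dirs(cgroup_root="/sys/fs/cgroup", base="splunk",
                      controllers=("cpu", "memory")):
    """Report whether each controller's base directory exists and is read/write.

    Assumption: cgroup v1 layout, <cgroup_root>/<controller>/<base>,
    as seen in the workload-management-status output.
    """
    results = {}
    for controller in controllers:
        path = os.path.join(cgroup_root, controller, base)
        if not os.path.isdir(path):
            results[path] = "missing"
        elif not os.access(path, os.R_OK | os.W_OK):
            results[path] = "not readable/writable"
        else:
            results[path] = "ok"
    return results


if __name__ == "__main__":
    # Run as the user that Splunk runs as, inside the container.
    for path, status in check_cgroup_dirs().items():
        print(f"{path}: {status}")
```

Run it as the same user Splunk runs as, since the permission test depends on the calling user.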
dadux commented 4 years ago

The documentation still mentions configuring WLM on non-systemd Linux, even for Splunk 8:

https://docs.splunk.com/Documentation/Splunk/8.0.4/Workloads/Configurenonsystemd

But I cannot get the pre-flight checks to pass.

bb03 commented 4 years ago

Hi @dadux, just wanted to say that we're looking into this and prioritizing the work. For the solution that you posted (the cgroups pre-task), was that for 7.2.10 or for 8+?

dadux commented 4 years ago

Hi @bb03 - the solution I posted is for 7.2.10.

I've also managed to build my own container for 8.0.4 and run Splunk under systemd (inside the container!). Once Splunk runs under systemd, WLM can be enabled without any problem. 🎉

I'm creating a PR so you can have a look at the changes I made to make it work.