splunk / splunk-operator

Splunk Operator for Kubernetes

Splunk Operator: There is no way to provide custom default settings to search heads and deployer. #1048

Open yaroslav-nakonechnikov opened 1 year ago

yaroslav-nakonechnikov commented 1 year ago

Please select the type of request

Bug

Tell us more

Describe the request: At the moment, the definition that creates search heads is written in a single CRD, which creates at least 4 pods: the search head members and the deployer.

We are using the PingID integration, which requires the fqdn setting to define where the browser is redirected after a successful login. By default the hostname is used, which is not reachable from client PCs.

If we provide a defaults.yml with a correct PingID config that includes fqdn, a problem arises: only one domain name can be defined.

Expected behavior: There should be a way to define a default config for the search head deployer and a custom config for the search head nodes.

Splunk setup on K8S: EKS on AWS

Reproduction/Testing steps: With the following defaults.yml submitted to the search heads CRD, only one fqdn can be used.

splunk:
  app_paths_install:
    default:
      - https://proxy/artifactory/prj-raw-host/splunk/splunk-apps/config-explorer_1715.tgz
      - https://proxy/artifactory/prj-raw-host/splunk/splunk-apps/splunk-datasets-add-on_10.tgz
      - https://proxy/artifactory/prj-raw-host/splunk/splunk-apps/splunk-app-for-lookup-file-editing_360.tgz
  conf:
    - key: deploymentclient
      value:
        directory: /opt/splunk/etc/system/local
        content:
          deployment-client :
            disabled : false
          target-broker:deploymentServer :
            targetUri : deployment-server.prj-eks-dev.cmp-prj-dev.internal.cmpgroup.cloud:8089
    - key: authentication
      value:
        directory: /opt/splunk/etc/system/local
        content:
          authentication:
            authSettings : saml
            authType : SAML
          saml:
            entityId : splunkACSEntityId
            fqdn : https://shc-deployer.26981.cmp-prj-dev.internal.cmpgroup.cloud
            idpSSOUrl : https://idp.host.com/idp/SSO.saml2
            inboundDigestMethod : SHA1;SHA256;SHA384;SHA512
            inboundSignatureAlgorithm : RSA-SHA1;RSA-SHA256;RSA-SHA384;RSA-SHA512
            issuerId : idp:host.com:saml2
            lockRoleToFullDN : true
            redirectAfterLogoutToUrl : https://www.splunk.com
            redirectPort : 443
            replicateCertificates : true
            signAuthnRequest : true
            signatureAlgorithm : RSA-SHA1
            signedAssertion : true
            sloBinding : HTTP-POST
            ssoBinding : HTTP-POST
            clientCert : /mnt/certs/saml_sig.pem
            idpCertPath: /mnt/certs/
          roleMap_SAML:
            admin : cmp-aws-s-eng-admin;aws-s-eng-admin

So the problem is that we need a way to provide two different values:

fqdn : https://shc-deployer.26981.cmp-prj-dev.internal.cmpgroup.cloud - for the deployer
fqdn : https://shc.26981.cmp-prj-dev.internal.cmpgroup.cloud - for the search heads

yaroslav-nakonechnikov commented 1 year ago

According to the documentation: https://splunk.github.io/docker-splunk/ADVANCED.html#:~:text=The%20purpose%20of%20the%20default.yml%20is%20to%20define,members%20of%20the%20cluster%20%28ex.%20keys%2C%20passwords%2C%20secrets%29. there is a nice example: password: "{{ splunk_password | default(<password>) }}" so I thought some Jinja functions should work...

I tried something like: fqdn : "{% if getenv("SPLUNK_ROLE") == "splunk_search_head" %}https://shc.${splunk_domain}{% else %}https://shc-deployer.${splunk_domain}{% endif %}" and it does not work, because of:

yaml.scanner.ScannerError: while scanning for the next token
found character '%' that cannot start any token
  in "<unicode string>", line 27, column 21:
    fqdn : {% if getenv("SPLUNK_ROLE") == "s ...
            ^
[WARNING]:  * Failed to parse /opt/ansible/inventory/environ.py with ini plugin: /opt/ansible/inventory/environ.py:16: Expected key=value host variable assignment, got: __future__
[WARNING]: Unable to parse /opt/ansible/inventory/environ.py as an inventory source

yaroslav-nakonechnikov commented 1 year ago

I found a workaround:

fqdn : >
              {% if lookup('ansible.builtin.env', 'SPLUNK_ROLE') == "splunk_search_head" %}
                https://shc.${splunk_domain}
              {% else %}
                https://shc-deployer.${splunk_domain}
              {% endif %}

But I would like to know if there is a better, official way to do that.
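For comparison, the same role check can be written as one inline Jinja expression. This is only a sketch and not an official method: it assumes, as the workaround suggests, that splunk-ansible renders conf values through Jinja, and that ${splunk_domain} is substituted by external tooling before the file reaches the container. A plain YAML scalar that begins with { is read as the start of a flow mapping, which is what triggers the ScannerError above; quoting the whole value avoids that.

```yaml
saml:
  # Single-quoted YAML scalar: the leading "{{" is treated as text,
  # not as the start of a YAML flow mapping.
  # Double quotes are used inside so they do not end the YAML string.
  fqdn : '{{ "https://shc.${splunk_domain}" if lookup("ansible.builtin.env", "SPLUNK_ROLE") == "splunk_search_head" else "https://shc-deployer.${splunk_domain}" }}'
```

Unlike the folded block scalar (>), this form produces a value without surrounding line breaks or indentation.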

akondur commented 1 year ago

CSPL-2152

yaroslav-nakonechnikov commented 4 months ago

This one is becoming critical.

Real case: when there is a long list of apps to install, the deployer needs a lot of time to bring its pod to the Running state. Defining a StartupProbe with a big timeout also affects the search head nodes, so the SH nodes can't get an IP assigned until the startup probe starts to work.

Increasing the threshold instead leads to another issue: a real problem won't be detected fast enough.
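To illustrate the trade-off, here is a sketch of a SearchHeadCluster resource with a single startup probe. The field names under startupProbe are an assumption based on the operator's health check documentation, and the resource name and all numbers are hypothetical. Because there is only one spec, the same probe settings reach both the deployer and the search head members.

```yaml
apiVersion: enterprise.splunk.com/v4
kind: SearchHeadCluster
metadata:
  name: shc                  # hypothetical name
spec:
  replicas: 3                # 3 search heads + 1 deployer = 4 pods
  # One probe definition for every pod created from this resource.
  startupProbe:
    initialDelaySeconds: 40
    timeoutSeconds: 30
    periodSeconds: 30
    # Startup budget is roughly failureThreshold x periodSeconds:
    # 40 x 30s = 1200s (about 20 minutes) for the deployer to install apps,
    # but a broken search head is also tolerated for that long.
    failureThreshold: 40
```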

yaroslav-nakonechnikov commented 4 months ago

related: https://github.com/splunk/splunk-ansible/issues/784

yaroslav-nakonechnikov commented 4 months ago

And another thing I found.

When the deployer starts, it connects to the deployment server and downloads apps. In the meantime the search head nodes proceed further, and then the deployer gets stuck on:

TASK [splunk_deployer : Wait for SHC to be ready] ******************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Exception: SHC failure, setup not complete. online_peers:['05BB34E3-9B8F-4916-A60A-493D4534047F', 'B7CE74CB-138D-4E4F-9A6C-B4DB791C155D']
fatal: [localhost]: FAILED! => {
    "attempts": 60,
    "changed": false,
    "rc": 1
}

MSG:

MODULE FAILURE
See stdout/stderr for the exact error

MODULE_STDERR:

Traceback (most recent call last):
  File "/home/splunk/.ansible/tmp/ansible-tmp-1709656969.8714278-4953-235691734405253/AnsiballZ_shc_ready.py", line 100, in <module>
    _ansiballz_main()
  File "/home/splunk/.ansible/tmp/ansible-tmp-1709656969.8714278-4953-235691734405253/AnsiballZ_shc_ready.py", line 92, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/home/splunk/.ansible/tmp/ansible-tmp-1709656969.8714278-4953-235691734405253/AnsiballZ_shc_ready.py", line 41, in invoke_module
    run_name='__main__', alter_sys=True)
  File "/usr/lib/python3.7/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_shc_ready_payload_nh5z9sh5/ansible_shc_ready_payload.zip/ansible/modules/shc_ready.py", line 55, in <module>
  File "/tmp/ansible_shc_ready_payload_nh5z9sh5/ansible_shc_ready_payload.zip/ansible/modules/shc_ready.py", line 50, in main
  File "/tmp/ansible_shc_ready_payload_nh5z9sh5/ansible_shc_ready_payload.zip/ansible/modules/shc_ready.py", line 37, in run
Exception: SHC failure, setup not complete. online_peers:['05BB34E3-9B8F-4916-A60A-493D4534047F', 'B7CE74CB-138D-4E4F-9A6C-B4DB791C155D']

PLAY RECAP *********************************************************************
localhost                  : ok=137  changed=20   unreachable=0    failed=1    skipped=64   rescued=0    ignored=0

Executing splunk resync shcluster-replicated-config manually on the deployer allows this check to pass.