Open romain-bellanger opened 4 years ago
> Is this understanding correct?

Sounds like you've got it figured out to me! Few notes:
inventory "randomizes" the order of this list
- if this is an issue, we can easily address it. Theoretically app install order shouldn't matter, as long as the final state has everything (I don't believe you can control order of apps pushed during deployer/cluster-master bundles?)apps located in etc/apps are then disabled
- this only applies to cluster masters and deployers. I believe this is intended to be aligned with Splunk best-practices, as these instances are more of "administrative" roles and shouldn't be doing any of the heavy-lifting that search heads/peers are for. We can certainly think of ways to open this up though.For your intended use cases:
> It does not seem possible to use apps to configure the cluster master or the deployer themselves

Correct, that's how it is as it stands today. I can see your use case for LDAP though - I was ultimately thinking of exposing a separate parameter to auto-configure various auth settings, something by way of:
```yaml
splunk:
  auth:
    ldap:
      ...
    saml:
      ...
```
which may fill this particular gap, but not necessarily the one where you'd like any arbitrary app installed.
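For illustration only, such a block might eventually look something like this; every key below is hypothetical, just a placeholder for whatever auth settings the parameter would expose:

```yaml
splunk:
  auth:
    ldap:
      # all keys here are illustrative placeholders, not existing splunk-ansible parameters
      host: ldap.example.com
      port: 636
      ssl: true
      bind_dn: "cn=splunk,ou=services,dc=example,dc=com"
      bind_password: "{{ ldap_bind_password }}"
      user_base_dn: "ou=people,dc=example,dc=com"
      group_base_dn: "ou=groups,dc=example,dc=com"
    saml:
      idp_metadata_url: "https://idp.example.com/metadata"
      entity_id: "splunk-search-head"
```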
> The apps are installed and then disabled one by one

Correct, again only for deployer + cluster master. I wasn't aware that this disable task can take so long in those cases. I suppose the only thing to blame might be "bad" configs, although that might suggest that the apps are highly coupled and can't function independently? Switching this out to ordered procedures is an easy ask though.

> Uninstalling an app (e.g. for rollback) does not seem to be supported

Correct again. Ideally we could declaratively designate apps such that if an app gets removed from `splunk.apps_location`, it would get uninstalled, but that's currently not the case. If you're using persistent volumes, you do get some form of rollback, but if a new version introduces a new file and it gets rolled back to an old version, I don't believe Splunk is aware enough to remove that file on disk. Part of the reason we chose to leverage the Splunk REST API for app installation is the hope to improve these features going forward. The only alternative I can think of would be to "delete" the app, but that's not exactly a great rollback mechanism, nor is it safe, as we don't want to incur any loss of `local` directory configs.

But overall, I like your proposals and I'll relay them around. I'm not opposed to the idea of some feature-flag that does wipe an app if it's no longer in the default.yml/environment variable, but obviously that comes with a few caveats we'd want to document :)
Hi @nwang92, many thanks for this feedback.
inventory "randomizes" the order of this list - if this is an issue, we can easily address it. Theoretically app install order shouldn't matter, as long as the final state has everything (I don't believe you can control order of apps pushed during deployer/cluster-master bundles?)
The randomization of the order of the apps does not matter for the bundle deployment, but it does seem to matter for the calls to the `apps/local` REST API. It is not the behavior of the indexer cluster which changes, it's the behavior of the playbook execution during the initial one-by-one installation of apps on the cluster master. I patched the docker images to change the set to a list, and I could make the playbook succeed by placing the apps in a specific order (sketched below), while it was failing otherwise. This was the case for the LDAP setup from a cloud region (described in my initial comment), for which we use the same config as the clusters running in our DCs, but use app precedence to override the URL with a proxy/cache (the main LDAP server is behind a firewall) and to provide encrypted credentials. The apps with precedence had to be installed first because of the API, while this doesn't matter from a bundle perspective. Disabling was then done in the same order as the install, and also took a lot of time on the LDAP-related apps, I guess because the proxy and credentials were disabled first, but I didn't attempt doing it in the opposite order.

My concern here is not really that apps are extracted in a random order; this should be perfectly fine and I don't really want to worry about the order in which the apps are listed. The problem is that the order shouldn't impact the behavior of the playbook or Splunk.
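For concreteness, with the patched image (set replaced by a list), the working order in default.yml looked roughly like this; the app names and URLs are illustrative:

```yaml
splunk:
  apps_location:
    # override apps first: proxy/cache URL and encrypted bind credentials, so that
    # the LDAP configuration resolves to a reachable server by the time the shared
    # base app is installed and validated through apps/local
    - "https://artifacts.example.com/splunk/ama_auth_ldap_proxy-1.0.0.tgz"
    - "https://artifacts.example.com/splunk/ama_auth_ldap_creds-1.0.0.tgz"
    # shared base app last: LDAP strategy and access control common to all clusters
    - "https://artifacts.example.com/splunk/ama_auth_base-2.3.0.tgz"
```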
> apps located in etc/apps are then disabled - this only applies to cluster masters and deployers. I believe this is intended to be aligned with Splunk best-practices, as these instances are more of "administrative" roles and shouldn't be doing any of the heavy-lifting that search heads/peers are for. We can certainly think of ways to open this up though.
Of course I don't want the apps for the indexers to be enabled on the cluster-master! :-) I'm just asking for the possibility to deploy different apps, specific to the cluster-master or deployer, which would not be disabled.
> I can see your use case for LDAP though - I was ultimately thinking of exposing a separate parameter to auto-configure various auth settings
This would cover LDAP, but exposing settings one by one might not be as efficient as just opening up app configuration. LDAP has a lot of settings, and access control means role definitions must also be covered. Some people might use SAML instead. I also mentioned health thresholds, timeouts, or bucket fixup settings, which can be useful, and maybe some new features will be added in the future... A solution to load apps onto the cluster master and deployer would cover all of this at once, including any future settings, and would probably not require much more effort to develop than exposing LDAP settings from the ansible config.
> The apps are installed and then disabled one by one - correct, again only for deployer + cluster master. I wasn't aware that this disable task can take so long in those cases. I suppose the only thing to blame might be "bad" configs, although that might suggest that the apps are highly coupled and can't function independently? Switching this out to ordered procedures is an easy ask though.
Some of our apps are highly coupled and can't function independently. As explained, we reuse an app containing LDAP settings and access control common to multiple clusters, and then only override the URL to target a proxy, add bind credentials encrypted with the encryption key specific to the cluster, or add new roles, in small cluster-specific apps. We also use this for index paths: the volumes, paths and default settings are defined in one app for indexers and another for search heads (the path is mandatory even if nothing is indexed there), and the indexes are defined in a separate app loaded to both (for search autocompletion on the search heads), to avoid maintaining the list of indexes twice. So the app containing the list of indexes doesn't function without the volume and path definitions. Preserving the order of apps could be a solution... but ideally, the order should not matter.
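As an illustration of this coupling, the index-related part of our app list for the indexer tier looks roughly like this (names are made up); the second app cannot be validated without the first:

```yaml
splunk:
  apps_location:
    # volumes, default index paths and size settings for indexers
    # (a search-head variant of this app defines the mandatory paths there)
    - "https://artifacts.example.com/splunk/ama_index_volumes_idx-1.1.0.tgz"
    # the actual index definitions, referencing the volumes above; the same app is
    # also loaded on the search heads so that index names autocomplete in searches
    - "https://artifacts.example.com/splunk/ama_indexes-4.0.0.tgz"
```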
> Part of the reason we chose to leverage the Splunk REST API for app installation is the hope to improve these features going forward
I understand; this is fine from my perspective. The only problems are the individual validation of apps (which could be addressed with an API parameter to turn validation off, exposed from the ansible config), and the initial installation to etc/apps, which makes the process quite complex and could have undesired effects on the cluster master, where the apps intended for the indexers are temporarily installed and enabled (this could be addressed with an API parameter specifying the installation path).
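If such API parameters existed, exposing them from the ansible config could be as small as a couple of extra keys in default.yml; this is purely hypothetical, neither key exists today:

```yaml
splunk:
  # hypothetical switches mirroring the API parameters suggested above
  apps_skip_validation: true                          # do not validate each app individually
  apps_install_path: "/opt/splunk/etc/master-apps"    # extract apps directly to the target path
```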
Hello,
While maintaining some clusters in Kubernetes using the alpha version of the splunk-operator, one of the main issues we are facing is related to the deployment of Splunk applications. I will open a dedicated ticket in the repository of the operator for problems specifically related to the operator, and focus here on the app installation solution provided by splunk-ansible.
Context
In our traditional environment, we have a lot of applications already maintained in git repositories, automatically validated, packaged to tarballs and placed into an artifact repository through a CI pipeline. This process is used for the internal configuration of our clusters (pipelines, indexes, inputs, authentication, access control) as well as user applications (e.g. dashboards and alerts).
Several of these applications were built with the help of Splunk PS, who generally recommended putting small sets of config files into apps with meaningful names, to ease the life of Splunk support. So our base config, common to all clusters, is already composed of a dozen small apps / tarballs.
We are also trying to avoid duplicating configuration. So the same "base" apps are loaded to many clusters, and only the necessary settings are managed through cluster-specific apps. Precedence and "composition" are used to achieve this. For instance, our authentication / access control can be composed of 4 apps:
- a base app containing the LDAP settings and access control common to multiple clusters;
- a small app overriding the LDAP URL with a proxy/cache (the main LDAP server is behind a firewall);
- a small app providing bind credentials encrypted with the encryption key specific to the cluster;
- a small app adding the roles specific to the cluster.

This type of solution is also used to define the volumes and the default index paths and settings in a reusable app, separately from the specific index definitions for a given cluster.
Our understanding of the solution
- The apps listed in the configuration are installed one by one into /opt/splunk/etc/apps using the `apps/local` REST API. Installation through direct tarball extraction is only used for ITSI and is very basic (e.g. no file cleanup on a new version).
- On the cluster master, the installed apps are then copied to the master-apps directory, but the `local` directory is not copied to the master-apps. However, the complete app is disabled in etc/apps, and this doesn't seem to be usable to configure the cluster master itself.

Is this understanding correct?
Experienced limitations and issues
- The inventory "randomizes" the order of the list of apps, and the order seems to matter for the calls made to the `apps/local` REST API.
- On the cluster master and deployer, the apps are installed and then disabled one by one, which can take a very long time.
- It does not seem possible to use apps to configure the cluster master or the deployer themselves (e.g. LDAP authentication, health thresholds).
- Uninstalling an app (e.g. for rollback) does not seem to be supported.

Is there already any solution available for these issues which I missed?
Proposals
Any other solution to address these issues would be very welcome; this is only to share some ideas...
1. Deploying apps to both the cluster-master and indexers, or both the deployer and search-heads, does not seem to be possible with the current configuration structure. A new property `apps_install_paths` could be defined in the ansible configuration (a possible structure is sketched after this list).
2. Instead of using the `apps/local` API, the tarballs could be directly extracted to the target path. I think it is expected to not always have self-contained apps, and it seems to me that the install API either validates or activates them individually. From my perspective, the cluster-bundle validation (currently skipped) would be a better solution, as it validates the apps altogether.
3. Splunk PS recommended us to prefix all our apps with "ama_" to identify the apps which we install. In our traditional environment, our ansible playbooks only enforce the content of the app directories which have this prefix, meaning an app is removed if it is no longer part of the configuration, and files not contained in the tarballs are cleaned up. This solution could be considered here, using an `apps_control_regex` property. The regex could also select the full name of some Splunk apps if needed.

I understand that it could be expensive and confusing to maintain the two solutions...
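A possible shape for the two proposed properties in default.yml; the structure, key names, app names and paths below are only a suggestion:

```yaml
splunk:
  # proposal 1: deploy different sets of apps to different target paths
  apps_install_paths:
    "/opt/splunk/etc/master-apps":       # pushed to the indexers via the cluster bundle
      - "https://artifacts.example.com/splunk/ama_indexes-4.0.0.tgz"
    "/opt/splunk/etc/shcluster/apps":    # pushed to the search heads via the deployer
      - "https://artifacts.example.com/splunk/ama_indexes-4.0.0.tgz"
    "/opt/splunk/etc/apps":              # apps for the instance itself, left enabled
      - "https://artifacts.example.com/splunk/ama_auth_base-2.3.0.tgz"

  # proposal 3: only app directories matching this regex are managed by the
  # playbooks (removed when no longer configured, files cleaned up on upgrade)
  apps_control_regex: "^ama_"
```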