sensu / sensu-ansible

An Ansible role to deploy a fully dynamic Sensu stack!
https://ansible-sensu.readthedocs.io
MIT License

RFC: Leveling up on the Sensu w/ Ansible magic #118

Closed jaredledvina closed 6 years ago

jaredledvina commented 6 years ago

Warning, what follows is a bit long-winded, as I have been putting this together for some time now. Please call out any/all issues, improvements, complaints, critiques, etc. that you have with anything I've stated below. I'd love to hear your thoughts on how the Sensu community can integrate more with the Ansible community!

Current Status:

Currently, the Sensu community is maintaining the sensu-ansible repo, which is a single role that allows for deploying the Sensu server, Sensu API, Sensu Client, Sensu Enterprise, RabbitMQ, and Redis.

Upstream, there's also a small collection of sensu-* native Ansible modules. I checked out the git history on these and it looks like a smattering of folks have added to them. However, the modules moved repos and many of them were added in the old repo, so some of that history is lost.

Proposed Changes:

Regarding the Ansible Role

I think the sensu-ansible role is really solid for people who just need to get something working. However, because it is a monolithic role, it becomes painful to work with as your infrastructure expands. Adding the current sensu-ansible role to all clients feels weird and wrong to me, and is out of line with standard Ansible roles.

To provide some context, my approach to Ansible roles is similar to the Unix philosophy. Each role should be modular and self-contained. It should perform a single objective and perform it well, with consistency. Every Ansible role should ship with sane defaults and allow anyone to easily pick it up and deploy it. External role dependencies should be kept to an absolute minimum. Every Ansible role should be automatically testable and pass any/all integration tests before being updated.

I'd like to propose the following:

Possible Ansible role end result

sensu-common
├── sensu-client
├── sensu-enterprise
└── sensu-server
sensu-uchiwa

Where sensu-common is the base role that configures all of the common Sensu setup steps (configures the proper repos, installs sensu, etc.). sensu-client would specifically manage the client configuration, whether it should be enabled at boot, and such. sensu-uchiwa, or possibly just uchiwa, would manage configuring Uchiwa. Finally, sensu-enterprise would handle configuring Sensu Enterprise entirely. I'm very unfamiliar with using Enterprise, so this might end up being the more difficult role to build out.

Before any of the above role changes are even considered, though, we really need to configure the following:

The end state/goal I have in mind is that Sensu can provide official Ansible roles that are extremely modular and support a clean configuration/deployment story (i.e. clients just get the sensu-client role, servers get the sensu-server role, etc.) while still enabling users to get everything in one go if that's preferred.

Random Thoughts

If done properly, these roles would also be useful if Sensu wanted to create supported Docker images with ansible-container. I'm not personally sure where that project is headed, but it's an option.

One initial problem I see happening is when the Sensu client is configured in Safe Mode. We'll need a way to share the check definitions with the sensu-server role and sensu-client. I guess a shared variable that's defined at a higher level (outside of the direct role variables) would work. Just something to keep in mind.

We should also standardize which operating systems (and versions), Sensu versions, and Ansible versions the Ansible work will support. Currently, the role installs the latest Sensu release and verifies that. I'm not sure if the community has any standards here (is it all of 1.X? Maybe the last 3 releases?). As for Ansible versions, we should for sure test the current stable, and it would be really useful to also test against what will become the next stable to catch any issues folks will hit in the next release. I'd be open to testing against the previous stable release of Ansible as well.

Finally, deploying check scripts with a role structure like this might get tricky. I imagine that it would all happen in sensu-client, but if a user installs the roles with ansible-galaxy, I'm fairly confident that upgrades are going to blow away those changes. Currently, sensu-ansible uses a unique Dynamic Data Store setup. This setup doesn't work with ansible-pull, Ansible Tower, or Ansible AWX, as it relies on a local directory not managed in the git repo (although I suppose you could manage it). I don't actually know what should happen here. If people really love the Dynamic Data Store setup, perhaps we stick with it. I'll personally look through what the rest of the Ansible community does to solve this type of problem as well.

Regarding the Ansible modules

As for the upstream modules, currently we have the following:

  1. http://docs.ansible.com/ansible/latest/sensu_check_module.html
  2. http://docs.ansible.com/ansible/latest/sensu_client_module.html
  3. http://docs.ansible.com/ansible/latest/sensu_handler_module.html
  4. http://docs.ansible.com/ansible/latest/sensu_silence_module.html
  5. http://docs.ansible.com/ansible/latest/sensu_subscription_module.html

I'd like to scope out reviewing each of these and validating that they expose all of the currently supported configuration options for each. I'm specifically worried that the sensu_check module doesn't. On a more meta level, I'd like to revisit how those modules function. In the end, Sensu wants a simple JSON file for the configuration, and it feels wrong to me that the Ansible module restricts the inputs separately. I'd personally like to see the module take arbitrary inputs and dump out a JSON file that Sensu validates, but I would understand if we'd prefer more native parameter validation in Ansible itself.
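To make the "arbitrary inputs in, JSON file out" idea concrete, here is a minimal Python sketch. It is not the existing sensu_check module and does not use the Ansible module API. The function name write_check_definition, the check_<name>.json file naming, and the example check are all assumptions made for illustration. The only Sensu-specific detail it relies on is that Sensu 1.x check definitions are nested under a top-level "checks" key. Validation is deliberately left to Sensu.

```python
import json
import os
import tempfile


def write_check_definition(conf_dir, name, definition):
    """Write one check definition as JSON; return True if the file changed.

    `definition` is passed through untouched: no per-parameter validation
    happens here, Sensu itself is expected to validate the result.
    """
    # Hypothetical file naming scheme, chosen only for this sketch.
    path = os.path.join(conf_dir, "check_{}.json".format(name))
    # Sensu 1.x expects checks nested under a top-level "checks" key.
    content = json.dumps({"checks": {name: definition}},
                         indent=2, sort_keys=True) + "\n"

    # Idempotency: leave the file alone when the rendered content is identical.
    if os.path.exists(path):
        with open(path) as handle:
            if handle.read() == content:
                return False

    # Write to a temporary file, then rename it into place, so a partially
    # written definition is never visible under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=conf_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
    return True


if __name__ == "__main__":
    conf_dir = tempfile.mkdtemp()
    # Example check; the command and subscription names are made up.
    check = {"command": "check-disk-usage.rb",
             "subscribers": ["base"],
             "interval": 60}
    print(write_check_definition(conf_dir, "disk_usage", check))  # True
    print(write_check_definition(conf_dir, "disk_usage", check))  # False
```

The returned boolean is what a real module would report as its changed status. Because the definition is an opaque dictionary, new upstream check attributes would need no change here, which is the trade-off against native parameter validation described above.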

There's also no module for the API, enterprise, transport, and probably some other configurations I'm forgetting. Again, I think it's weird to structure the modules this way, and by doing so, they will always lag behind upstream Sensu changes.

I'd also like to reach out to the Ansible folks and add the maintainers for the Sensu Ansible work to ansibullbot so that we are tagged automatically for review on all PRs opened against those modules. We might also want to consider getting them updated from 'preview' to some more official support label ('community' might work here).

The overall end goal of all of this is to provide a solid foundation for anyone interested in leveraging Ansible to deploy and manage their Sensu installation, while being flexible enough to support unique deployment restrictions. The Sensu community should feel confident that any changes they propose can be tested and implemented without regressions. If you've read my ramblings this far, awesome! I'd love to hear if anything here is way too crazy or off base. If you have any ideas on ways that the Sensu community can better integrate with the Ansible community, I'd absolutely love to hear them!

Forward-Looking Plan

The following is how I would imagine this would take place going forward.

  1. Add TravisCI tests to sensu-ansible https://github.com/sensu/sensu-ansible/issues/114
  2. Add syntax/linting checks to TravisCI & pre-commit hooks
  3. Enable automated Ansible Galaxy releases (also part of https://github.com/sensu/sensu-ansible/issues/114 )
  4. Test/Review/Merge/Knockback open issues and PRs against sensu-ansible
  5. Build out sensu-common
  6. Build out sensu-client, sensu-server, sensu-uchiwa
  7. Start swapping out sensu-ansible for roles from step 6
  8. Celebrate and party!

jaredledvina commented 6 years ago

Closing this out as it's served its purpose. It will still be great to review in the future, but there's no reason to keep it open.