voxpupuli / puppet-elasticsearch

Elasticsearch Puppet module
Apache License 2.0

Is there a way to create a document in ES with this module? #940

Open nick-george opened 6 years ago

nick-george commented 6 years ago

Feature Description

Hi,

This module already has very rich functionality, and for that I'm very grateful! Have you considered adding the ability to create a document in ES using this module? Does this feature already exist? I've had a look at the custom types, and can't see anything that would do this.

It would be awesome if we had a resource similar to the puppet "file" resource so we could drop documents in. I'm not super familiar with custom types and providers, but to implement this, could a new provider be written for the "file" resource?

Thanks for your time! Nick George

tylerjl commented 6 years ago

Some of the types and providers in this module, like index settings, templates, and pipelines, directly manipulate the Elasticsearch API to manage documents that can be represented as Puppet hashes. Whether this was a good idea was somewhat of a gray area, as letting config management bleed over into manipulating a data store is a pretty unconventional use of Puppet. I think it's been generally useful though, as empowering the module to control all the settings that users may need, and not just configuration files, makes the module more complete from an end-user perspective.

I'm not sure about explicitly managing documents, though. I've sort of referenced the Puppetlabs MySQL module as a bit of a guiding example of what sorts of resources this module should handle, and though there are certainly parallels between the two right now (for example, the mysql module can manage a database schema in the same way this module can handle index settings), managing records/actual data is something kind of unusual and not something I've seen in it or any other modules that deal with data stores (mongo, postgres, etc.).

Perhaps some examples of a) what these resources would look like and b) what the use case is (i.e. what types of documents and what the intent of managing them via Puppet is) could help to clarify whether it'd be a good fit.

nick-george commented 6 years ago

Hi Tyler,

I'm probably teaching you to suck eggs here, but one of the cool features of ES is that it can be used to store things like configuration data. For example, Kibana's saved objects like dashboards, visualisations and searches are all stored in Elasticsearch. I'm also considering moving some configuration (for a custom Kibana plugin) that I currently store on disk into Elasticsearch. It would be awesome if I could just write this config data straight into ES using this module.

There are a few small advantages to being able to store "files" in ES using the Puppet module.

  1. It becomes possible to configure software (like Kibana) completely, using Puppet. I understand that the Kibana module might one day have the ability to set dashboards etc via its module, and will probably do so using the Kibana API. Who knows, maybe they'd leverage this feature to implement the restoration of dashboards etc.
  2. Once indexed, the document is available across the entire cluster immediately, as opposed to waiting for the next puppet run to complete on any nodes that may be running ES (to drop a file).
  3. It seems to follow the ES philosophy of storing as much configuration as possible in ES itself, as opposed to on disk.
  4. It becomes possible to store any document in ES using this module. I'm not aware of all the use-cases for ES, but I'm sure some people (other than me) would find this useful :)

My understanding of MySQL isn't the best, but it's just a SQL compliant database right, and can't be used to store anything approximating JSON (https://stackoverflow.com/questions/3564024/storing-data-in-mysql-as-json)? This is one of the areas where Elastic is awesome, in that you can index a JSON document straight into ES. So I'm guessing that the official MySQL module hasn't implemented anything similar simply because it's not possible.

I hope all that makes sense, and thanks for your time! Nick

tylerjl commented 6 years ago

Hmm, I hadn't considered use cases like config-managing Kibana dashboards. In cases like that - and others, such as storing or configuring Logstash pipelines via Puppet - the documents do tend to fit into the "configuration" bucket in addition to normal data documents.

I won't write off the idea entirely, since I think you're probably right: the custom types and providers this module has used thus far have been pretty successful (templates, pipelines, etc.), and managing other resources probably has merit.

I'll label this appropriately and drop it into the backlog. While it'd definitely be nice to have, I think it does fall behind some other items in the queue that are more sorely needed to get the module fully fleshed out and feature complete (for example, the module still doesn't install/manage licenses, which is a common ask that the module should really handle). The plumbing to get to this feature is mostly laid out in the module at this point (we have templates, pipelines, indices, snapshot repositories, etc. that are all represented in the Elasticsearch APIs), so the pieces are all there; it's primarily a matter of putting it together in the most sane way.

nick-george commented 6 years ago

Thanks Tyler,

If I were to implement this (in my own code initially, though I would try a PR if the quality was good enough), would you recommend writing a new provider for the file resource, or would it just be simpler to go with a new type called something like "elasticsearch_document"?

Also, once you put this into your backlog, feel free to close this issue.

tylerjl commented 6 years ago

The challenge with creating a new provider for the File type is that many of the properties of the type, such as owner, mode, et cetera, don't have any counterpart in an Elasticsearch document (and these properties aren't gated behind a feature predicate per-provider, so a provider can't signal to a parent type "I don't support setting the mode parameter"). You can always fail-fast in the provider to ensure that consumers can only compile a catalog with parameters you support, but it's pretty trappy and I'm not sure what other oddities would crop up because of it.
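To make the fail-fast idea concrete, here is a rough plain-Ruby sketch. It does not use Puppet's type/provider API at all; `SUPPORTED_PARAMS` and `validate_parameters!` are hypothetical names invented for illustration, and the list of supported parameters is only an assumption about what a document-backed resource could reasonably honour.

```ruby
# Hypothetical: the File-style parameters a document-backed provider could honour.
SUPPORTED_PARAMS = %i[ensure path content].freeze

# Raise immediately if the caller sets anything outside the supported list,
# instead of silently ignoring parameters such as owner or mode.
def validate_parameters!(params)
  unsupported = params.keys.map(&:to_sym) - SUPPORTED_PARAMS
  unless unsupported.empty?
    raise ArgumentError, "unsupported parameters for a document: #{unsupported.join(', ')}"
  end
  params
end

# Accepted: only supported parameters are present.
validate_parameters!({ path: 'config/doc/plugin-settings', content: '{}' })

# Rejected: owner and mode have no meaning for an Elasticsearch document.
begin
  validate_parameters!({ path: 'config/doc/plugin-settings', owner: 'root', mode: '0644' })
rescue ArgumentError => e
  puts e.message
end
```

In a real provider the check would have to hook into whatever validation point Puppet offers, which is where the trappiness mentioned above comes in.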

For that reason it's probably simplest to start from a new type and provider entirely. Fortunately, the aforementioned resources from my previous comment (indices, snapshots, etc.) all behave slightly differently from each other which has a convenient by-product in that they illustrate how to tweak the underlying elastic_rest parent class to suit the behavior of the API in a way that makes sense for what you're trying to achieve. Perusing the type and provider code for all these native providers should actually provide a fairly significant jump-start to creating a new one, as both the types and providers already implement lots of tricks to make working with the API significantly easier.

As an example of this, consider defining a document's JSON in Puppet and comparing it to the document you fetch from Elasticsearch. A GET against :9200/index/type/id will return some JSON with the actual body of the document in ._source, which you can express by defining metadata_pipeline for the provider's class in the same way elasticsearch_index does to pull out settings. Once you munge the data a bit to get it right, Puppet's built-in type/provider mechanisms will kick in to know when to update the document if it's out-of-sync, and so on.
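As a rough illustration of that comparison, the plain-Ruby sketch below pulls `_source` out of a response and checks it against the desired content. It is not the module's code and does not use metadata_pipeline; the sample response, the index/type/id values, and the helper names `document_body` and `in_sync?` are all made up for the example.

```ruby
require 'json'

# Hypothetical sample of what a GET against :9200/index/type/id might return.
SAMPLE_RESPONSE = <<~JSON
  {
    "_index": "config",
    "_type": "doc",
    "_id": "plugin-settings",
    "_version": 3,
    "found": true,
    "_source": { "refresh_interval": 30, "theme": "dark" }
  }
JSON

# Return only the document body, discarding the API metadata around it.
# Returns nil when the response reports that the document was not found.
def document_body(raw_response)
  parsed = JSON.parse(raw_response)
  return nil unless parsed['found']

  parsed['_source']
end

# True when the stored document already matches the desired content.
def in_sync?(desired, raw_response)
  document_body(raw_response) == desired
end

desired = { 'theme' => 'dark', 'refresh_interval' => 30 }
drifted = { 'theme' => 'dark', 'refresh_interval' => 60 }

puts in_sync?(desired, SAMPLE_RESPONSE)
puts in_sync?(drifted, SAMPLE_RESPONSE)
```

Ruby hash equality ignores key order, so the comparison only flags real differences in content; a real provider would likely also need to normalise types (for example, strings versus integers) before comparing.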

nick-george commented 6 years ago

Thanks a lot for that, it gives me a good basis on which to start. If I make any headway, I'll submit a PR.

Cheers, Nick