Feature: Allow the tag used for planting to be configurable

warmfusion commented 8 years ago

Scenario

All events from all systems come through to a Forest plugin, which pushes them into ElasticSearch using the first two elements of the tag as the index.

The rest of the tag is application-specific metadata used by members of various teams.

Example tags: live.productA.haproxy.access or staging.productA.nginx.error.

Problem

Because each unique tag causes a new tree (and therefore a new connection to ElasticSearch) to be planted, a considerable number of connections are opened even though the configuration is identical and a single connection could be reused.

Proposal

Add an argument to the match section of the configuration that defines a 'grove' of similar trees. Even though there may be hundreds of unique trees (one per unique tag), they would be grouped into common groves (defined by this new setting), and all trees in a grove would share a single connection to ES (in this example).

Perhaps something that'd let me do this:

<match **>
    @type forest
    grove ${tag_parts[0..2]}
    subtype elasticsearch
    <template>
        logstash_format true
        logstash_prefix ${tag_parts[0..2]}
        hosts elasticsearch.priv.example.com
    </template>
</match>

I'd then expect that, for incoming events, a new tree is planted once per grove rather than once per tag.

input tag                                         grove
live.product.haproxy.access                       live.product.haproxy
live.product.application.serviceA.event.subkey    live.product.application
live.product.application.serviceB.event.otherkey  live.product.application
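The mapping in the table can be reproduced with a short Ruby sketch, assuming the grove is simply the first three dot-separated parts of the tag (what Ruby's inclusive range 0..2 selects). `grove_for` is a hypothetical helper for illustration, not part of fluent-plugin-forest.

```ruby
# Hypothetical helper: derive a grove name by keeping the parts of the tag
# selected by `range` (default 0..2, i.e. the first three parts).
def grove_for(tag, range = 0..2)
  tag.split('.')[range].join('.')
end

tags = %w[
  live.product.haproxy.access
  live.product.application.serviceA.event.subkey
  live.product.application.serviceB.event.otherkey
]

# Group the example tags by grove; group_by keeps first-seen order.
groves = tags.group_by { |t| grove_for(t) }
groves.each do |grove, members|
  puts "#{grove}: #{members.size} tag(s)"
end
```

Three tags collapse into two groves, which is the number of trees (and connections) the proposal expects.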

I believe the change would be to the @mapping hash, and more specifically around here
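If planted outputs are kept in a hash keyed by tag (the @mapping hash mentioned above), the proposal amounts to changing that key from the tag to the grove. A minimal Ruby sketch, assuming a hypothetical `Plant` struct standing in for the real output instance; `GroveCache` and `plant_for` are illustrative names, not fluent-plugin-forest's API.

```ruby
# Hypothetical stand-in for a planted output (e.g. an Elasticsearch output
# holding its own connection). Not the real plugin class.
Plant = Struct.new(:key)

class GroveCache
  attr_reader :plants

  # grove_fn maps a tag to its grove name.
  def initialize(&grove_fn)
    @grove_fn = grove_fn
    @plants = {}        # grove => Plant
    @mutex = Mutex.new  # events may be emitted from several threads
  end

  # Return the plant for this tag's grove, planting it only on first use.
  def plant_for(tag)
    grove = @grove_fn.call(tag)
    @mutex.synchronize do
      @plants[grove] ||= Plant.new(grove)
    end
  end
end

cache = GroveCache.new { |tag| tag.split('.')[0..2].join('.') }
%w[
  live.product.haproxy.access
  live.product.application.serviceA.event.subkey
  live.product.application.serviceB.event.otherkey
].each { |tag| cache.plant_for(tag) }

puts cache.plants.size # 2 plants for 3 tags
```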

tagomoris commented 8 years ago

I understand your problem, but in general the forest plugin cannot guarantee that a grove value groups plants whose configured parameters are identical. A misconfigured grove might break the behavior of output plugins, so I don't think the forest plugin can provide such an option.
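To make the concern concrete, here is a minimal Ruby sketch, assuming a simplified expansion that handles only ${tag} and ${tag_parts[N]} (the plugin's real placeholder handling is not reproduced). The parameter name `extra_param` is hypothetical. It expands a template once per tag in a grove and checks whether every tag yields the same config; if not, one shared plant would silently apply a single tag's settings to all of them.

```ruby
# Simplified placeholder expansion (an assumption, not the plugin's code):
# only ${tag_parts[N]} and ${tag} are handled.
def expand(value, tag)
  parts = tag.split('.')
  value.gsub(/\$\{tag_parts\[(\d+)\]\}/) { parts[Regexp.last_match(1).to_i].to_s }
       .gsub('${tag}') { tag }
end

# Expand every parameter of a template for one tag.
def expand_config(template, tag)
  template.transform_values { |v| expand(v, tag) }
end

# A grove can safely share one plant only if every member tag
# expands the template to an identical config.
def consistent_grove?(template, tags)
  tags.map { |t| expand_config(template, t) }.uniq.size == 1
end

grove_tags = %w[
  live.product.application.serviceA.event.subkey
  live.product.application.serviceB.event.otherkey
]

safe   = { 'logstash_prefix' => '${tag_parts[0]}.${tag_parts[1]}.${tag_parts[2]}' }
unsafe = safe.merge('extra_param' => '${tag_parts[3]}') # hypothetical parameter

puts consistent_grove?(safe, grove_tags)   # true
puts consistent_grove?(unsafe, grove_tags) # false
```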

On the other hand, the Fluentd v0.14 plugin API will handle variable tags natively. I think it'll satisfy your requirement.

warmfusion commented 8 years ago

While I appreciate the concern around misconfiguration of plugins, I'd argue that any sufficiently advanced plugin has scope for breaking itself. :smiley:

I don't think I'll be able to use 0.14 for a while yet; we're still working on transitioning from Ruby 1.9.3 :cry:

The concern about inconsistent hash keys in the mapping makes sense, and it follows that having one plant where the output actually needs several would be remarkably confusing, since events might not be handled consistently. That said, would my suggested implementation solve the problem I described?

I'm wondering whether I need to implement the changes myself to suit my use case, at least until we can move to 0.14.

tagomoris commented 8 years ago

My answer to this proposal: I have no plans to write it myself, but I'll consider merging a pull request for it if the code is good enough. Thank you for the detailed proposal.

macdjord commented 8 years ago

How about a simpler, partial solution? I frequently write Forest configurations with no tag-specific content at all, i.e. I want a.* to do this and b.c.* to do that, but all tags in each category are handled exactly the same. This is easy to check for: if a config never uses __TAG__, ${tag}, etc., then everything matching that <case> or <template> is guaranteed to get the same config. In that situation, you could create a single tree for all matching tags.
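A minimal Ruby sketch of this check, assuming templates are plain hashes of parameter strings. __TAG__, ${tag} and ${tag_parts[X]} are the placeholders named in this thread; __TAG_PARTS[X]__ is included on the assumption that it is the older-style equivalent, and the list may not be exhaustive. `tag_independent?` and `plant_key` are hypothetical names.

```ruby
# Tag-related placeholders (possibly incomplete; see the lead-in).
TAG_PLACEHOLDER = /__TAG__|__TAG_PARTS\[[^\]]*\]__|\$\{tag\}|\$\{tag_parts\[[^\]]*\]\}/

# True when no parameter value in the template mentions the tag.
def tag_independent?(template)
  template.values.none? { |v| TAG_PLACEHOLDER.match?(v.to_s) }
end

# Key used to look up (or plant) a tree: the matched pattern when the
# template ignores the tag, otherwise the full tag as today.
def plant_key(pattern, template, tag)
  tag_independent?(template) ? pattern : tag
end

static  = { 'hosts' => 'elasticsearch.priv.example.com' }
dynamic = static.merge('logstash_prefix' => '${tag}')

puts plant_key('a.*', static, 'a.one')  # a.*
puts plant_key('a.*', dynamic, 'a.one') # a.one
```

Because the decision depends only on the template text, it can be made once at configuration time rather than per event.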

macdjord commented 8 years ago

More complete solution: make TAG equal the grove. That is, if you define a grove, then TAG (and ${tag}, ${tag_parts[X]}, etc.) contain only the parts of the tag that were matched in the grove name.
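A minimal Ruby sketch of this idea, assuming the grove is the first three tag parts (hypothetical `GROVE_PARTS`) and a simplified expansion of only ${tag} and ${tag_parts[N]}. Because placeholders see the grove rather than the full tag, every tag in a grove expands to the same config by construction, and a template that references a part outside the grove fails loudly instead of silently sharing a plant.

```ruby
GROVE_PARTS = 3 # hypothetical: a grove is the first three tag parts

# The "tag" that placeholders see once a grove is configured.
def grove_of(tag)
  tag.split('.').first(GROVE_PARTS).join('.')
end

# Expand ${tag_parts[N]} and ${tag} against the grove instead of the full tag.
# Array#fetch raises IndexError when a template asks for a part the grove
# does not contain, so that misconfiguration surfaces at plant time.
def expand_for_grove(value, tag)
  grove = grove_of(tag)
  parts = grove.split('.')
  value.gsub(/\$\{tag_parts\[(\d+)\]\}/) { parts.fetch(Regexp.last_match(1).to_i) }
       .gsub('${tag}') { grove }
end

a = expand_for_grove('${tag}', 'live.product.application.serviceA.event.subkey')
b = expand_for_grove('${tag}', 'live.product.application.serviceB.event.otherkey')
puts a      # live.product.application
puts a == b # true
```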

macdjord commented 8 years ago

Another approach: When planting a new tree, cache the config used to initialize it. Every time a new tag comes in, generate the tree config from the

tagomoris / fluent-plugin-forest