sematext / logagent-js

Extensible log shipper with input/output plugins, buffering, parsing, data masking, and small memory/CPU footprint
https://sematext.com/logagent
Apache License 2.0

Config value: literal or function #135

Closed rucciva closed 4 years ago

rucciva commented 6 years ago

As a Logstash user, the thing that attracts me most to Logagent is this statement:

It is like Filebeat and Logstash in one, without the JVM memory footprint.

I especially love that I can define an inline filter as a JavaScript function, which gives me greater flexibility in manipulating events, since JavaScript is far more popular and powerful than the Logstash syntax. So I would really love to see whether I can replace my Logstash instance with Logagent.

But one thing I find lacking in Logagent is that a Logstash configuration can include conditionals anywhere, not only in the filter section. This is especially helpful for routing output based on the data being processed, rather than only on the sourceName or on whatever the plugin happens to support.

Meanwhile, in the case of Logagent, I think the YAML file has the potential to be more powerful than a Logstash configuration, since YAML is more popular and a YAML value can be a function (written in JavaScript, moreover).

So I propose that Logagent plugins and filters accept a configuration object whose values are either literals or functions. Each time a plugin or filter needs to process an event, it would produce a new config object containing only literal values, by iterating over the config keys, checking whether each value is a function, and calling it if so. The logic that produces the new config object can be extracted into a separate module, so that the code changes are minor. The function signature could resemble the inline filter declaration, without the event emitter and callback arguments.

For example:

input: 
  kafka: 
    module: logagent-input-kafka
    host: kafka-host
    port: 9092
    groupId: docker-logger-logagent
    topic: docker-filebeat-logs
    autoCommit: true
    sessionTimeout: 15000

output: 
  elasticsearch: 
    module: elasticsearch
    url: !!js/function >
       function (context, config, data) {
           if (data.docker.container.name == "container-a") {
                return "http://a.elasticsearch.local"
           } else {
                return "http://b.elasticsearch.local"
           }
       }
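
The per-event resolution step proposed above could look roughly like the following. This is only a minimal sketch, not Logagent code: `resolveConfig` is a hypothetical helper name, the `(context, config, data)` signature is taken from the YAML example, and only top-level keys are evaluated.

```javascript
'use strict'

// Hypothetical helper (not part of Logagent): returns a copy of `config`
// in which every function-valued key is replaced by the function's result.
function resolveConfig (config, context, data) {
  const resolved = {}
  for (const key of Object.keys(config)) {
    const value = config[key]
    resolved[key] = typeof value === 'function'
      ? value(context, config, data) // evaluate against the current event
      : value // literal values are copied as-is
  }
  return resolved
}

// Same routing rule as the YAML example, written as a plain object
const outputConfig = {
  module: 'elasticsearch',
  url: function (context, config, data) {
    return data.docker.container.name === 'container-a'
      ? 'http://a.elasticsearch.local'
      : 'http://b.elasticsearch.local'
  }
}

const event = { docker: { container: { name: 'container-a' } } }
const resolved = resolveConfig(outputConfig, {}, event)
console.log(resolved.url)
```

The original config object is left untouched, so the same functions can be evaluated again for the next event. Nested objects would need a recursive variant.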

megastef commented 6 years ago

We already support functions in the data._id field to generate dynamic index names. Could you provide a PR to support the URL in the Elasticsearch output as a function?

I think the best place for the change would be in https://github.com/sematext/logagent-js/blob/master/lib/plugins/output/elasticsearch.js in the function OutputElasticsearch.prototype.getLogger

Please note that we are working on a Docker plugin with log routing functions similar to Sematext Docker Agent, so containers could have labels for log routing (url, index).

rucciva commented 6 years ago

Hi @megastef, thank you very much for the merge.

Anyway, would it be okay to apply this mechanism not only in the Elasticsearch plugin, but also in the other output plugins and filters? I would happily create a pull request for this.

otisg commented 6 years ago

Ping @megastef ?

megastef commented 5 years ago

In general, I like the idea.

We had a little problem when we put a function into the config variable. To get the Elasticsearch output working again, I had to add these lines to the config reducer to ignore the "tokenMapper" function: https://github.com/sematext/logagent-js/blob/master/lib/util/config-reducer.js#L6-L8

@rucciva could we start with a description of how configs can be used? Maybe we should add the evaluation of functions in bin/logagent.js to avoid touching every plugin. In theory, every output filter could manipulate the config object, so we could implement the config reducer as an output-filter plugin. How does that sound? We also need to consider the performance impact of operations that need to run for every event.
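
One way to limit the per-event cost is to inspect the config once at startup and evaluate only the function-valued keys for each event. The sketch below is an assumption-laden illustration, not the actual config-reducer: `createConfigResolver`, the `passThroughKeys` parameter, and the `index` and `severity` names are hypothetical. The pass-through list mirrors the "tokenMapper" case, where a function must stay a function.

```javascript
'use strict'

// Hypothetical factory (not the real config-reducer): scans `config` once
// and returns a resolver that only touches the dynamic keys per event.
function createConfigResolver (config, passThroughKeys = []) {
  const dynamicKeys = Object.keys(config).filter(
    (key) => typeof config[key] === 'function' && !passThroughKeys.includes(key)
  )
  if (dynamicKeys.length === 0) {
    // Nothing to evaluate: reuse the same object for every event
    return () => config
  }
  return function resolve (context, data) {
    const resolved = Object.assign({}, config) // shallow copy per event
    for (const key of dynamicKeys) {
      resolved[key] = config[key](context, config, data)
    }
    return resolved
  }
}

// Fully static config: no per-event work at all
const staticConfig = { module: 'elasticsearch', url: 'http://a.elasticsearch.local' }
const resolveStatic = createConfigResolver(staticConfig)

// Mixed config: `index` is evaluated per event, `tokenMapper` is left alone
const dynamicConfig = {
  module: 'elasticsearch',
  index: (context, config, data) => 'logs-' + data.severity,
  tokenMapper: (token) => token
}
const resolveDynamic = createConfigResolver(dynamicConfig, ['tokenMapper'])
const perEvent = resolveDynamic({}, { severity: 'error' })
console.log(perEvent.index)
```

With this shape, configs that contain no functions pay nothing per event, and the remaining cost scales with the number of dynamic keys instead of the size of the config.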