influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

[Feature Request] Add data summarization by interval to the log parser input plugin #1478

Closed: toni-moreno closed this issue 8 years ago

toni-moreno commented 8 years ago

Directions

In large infrastructures we would like to store and process only the data we actually need. Suppose a cluster of heavily loaded Apache servers where we only need the number of hits per interval and the response times derived from their access logs.

Suppose our servers are processing more than 3 million hits/hour (across 10 servers) and we only need 4 metrics (hits, plus average, max and p90 response time).

So, with a one-minute interval, we would only store 4 x 10 x 60 = 2,400 values per hour instead of 3 million inserts full of unneeded data.

Today we can already do this with collectd + the Tail plugin:

https://collectd.org/wiki/index.php/Plugin:Tail

or with collectd + the apachelog plugin:

https://github.com/toni-moreno/collectd-apachelog-plugin

We could use Telegraf and the logparser plugin as the base for this work; this would also make this kind of log processing possible on Windows systems.

Feature Request

We would like a per-file config option that switches the behaviour from "send all events" to "send only summaries", plus options to choose the kind of summarization, how to group the data, and what to send.
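At its core, "send only summaries" is a tumbling-window aggregation: every parsed event is assigned to the fixed interval that contains its timestamp, and one summary per interval is emitted instead of the raw events. The following is a minimal Python sketch, not Telegraf code; it assumes events arrive as hypothetical (unix_timestamp, response_time) pairs and uses the nearest-rank definition of the percentile (other definitions interpolate and can give slightly different values).

```python
import math
from collections import defaultdict


def percentile(values, n):
    """Nearest-rank percentile: the smallest value with at least n% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(n * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_by_interval(events, interval):
    """Group (timestamp, rt) events into fixed, non-overlapping windows of `interval` seconds.

    Returns {window_start: {"hits": ..., "rt_avg": ..., "rt_max": ..., "rt_p90": ...}},
    ordered by window start. Only these summaries would be sent, not the raw events.
    """
    windows = defaultdict(list)
    for ts, rt in events:
        # Align each event to the start of the window that contains it.
        windows[ts - ts % interval].append(rt)

    summaries = {}
    for start, rts in sorted(windows.items()):
        summaries[start] = {
            "hits": len(rts),
            "rt_avg": sum(rts) / len(rts),
            "rt_max": max(rts),
            "rt_p90": percentile(rts, 90),
        }
    return summaries


if __name__ == "__main__":
    # (unix timestamp in seconds, response time) pairs from one access log
    events = [(0, 120), (10, 80), (59, 300), (60, 50), (61, 70)]
    for start, summary in summarize_by_interval(events, 60).items():
        print(start, summary)
```

With 60-second windows, the five events in the example collapse into two summaries, one per minute, which is where the reduction in stored data comes from.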

Proposal:

The configuration could look something like this:

[[inputs.logparser]]
  # files should be an array of ["id", "filename"] pairs
  files = [
    ["8080", "/var/log/httpd/access8080.log"],
    ["80", "/var/log/httpd/access80.log"],
    ["443", "/var/log/httpd/access443.log"]
  ]
  from_beginning = false

  [inputs.logparser.grok]
    # use the custom pattern below, which adds the response time (rt) at the end of each line
    patterns = ["%{APACHE_LOG_WITH_RT}"]
    custom_patterns = '''
      APACHE_LOG_WITH_RT %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{INT:rt}
    '''

  # inputs.logparser.(id)
  [inputs.logparser."8080"]
    send_all_events = true
    # nothing more to configure here

  # inputs.logparser.(id)
  [inputs.logparser."80"]
    send_all_events = false
    id_tag = "log_port"

    # inputs.logparser.(id).match.(grok field).(regex filter)
    [inputs.logparser."80".match.request."/some/url.*[a-zA-Z]$"]
      # measurement where the grouped data is stored
      measurement = "http_stats"
      extratags = { url = "myurl", othertag = "valuetag" }
      groupby_grok_field = "response"
      groupby_tag = "httpcode"

      # summarize_type should be one of "counter", "sum", "max", "min", "avg", "percentile(N)"
      [inputs.logparser."80".match.request."/some/url.*[a-zA-Z]$".field.hits_x_interval]
        summarize_type = "counter"
        summarize_grok_field = "any"

      [inputs.logparser."80".match.request."/some/url.*[a-zA-Z]$".field.rt_avg]
        summarize_type = "avg"
        summarize_grok_field = "rt"

      [inputs.logparser."80".match.request."/some/url.*[a-zA-Z]$".field.rt_max]
        summarize_type = "max"
        summarize_grok_field = "rt"

      [inputs.logparser."80".match.request."/some/url.*[a-zA-Z]$".field.rt_p90]
        summarize_type = "percentile(90)"
        summarize_grok_field = "rt"
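The intended semantics of the proposed match / groupby / field options can be sketched in Python as well. This is an illustration only, under several assumptions: each log line in the current interval has already been parsed into a dict of grok fields, the regex filter is applied with a search on the grok field that holds the URL, and the grok field "any" means "every matching event". The function names (make_summarizer, summarize) are hypothetical and not part of Telegraf.

```python
import math
import re
from collections import defaultdict


def percentile(values, n):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(n * len(ordered) / 100))
    return ordered[rank - 1]


def make_summarizer(summarize_type):
    """Map a summarize_type string from the proposed config to a function over a list of values."""
    simple = {
        "counter": len,
        "sum": sum,
        "max": max,
        "min": min,
        "avg": lambda vs: sum(vs) / len(vs),
    }
    if summarize_type in simple:
        return simple[summarize_type]
    m = re.fullmatch(r"percentile\((\d+(?:\.\d+)?)\)", summarize_type)
    if m:
        n = float(m.group(1))
        return lambda vs: percentile(vs, n)
    raise ValueError(f"unknown summarize_type: {summarize_type}")


def summarize(events, match_field, match_regex, groupby_field, fields):
    """Filter parsed events, group them, and compute one value per configured field.

    events: grok field dicts for one interval, e.g. {"request": ..., "response": "200", "rt": "120"}
    fields: {output_name: (summarize_type, grok_field)}
    Returns {group_value: {output_name: value}}.
    """
    pattern = re.compile(match_regex)
    groups = defaultdict(list)
    for ev in events:
        # Keep only events whose match field satisfies the regex filter.
        if pattern.search(ev.get(match_field, "")):
            groups[ev[groupby_field]].append(ev)

    result = {}
    for key, evs in groups.items():
        row = {}
        for name, (stype, grok_field) in fields.items():
            if grok_field == "any":
                values = [1] * len(evs)  # only the number of events matters
            else:
                values = [float(ev[grok_field]) for ev in evs]
            row[name] = make_summarizer(stype)(values)
        result[key] = row
    return result


if __name__ == "__main__":
    fields = {
        "hits_x_interval": ("counter", "any"),
        "rt_avg": ("avg", "rt"),
        "rt_max": ("max", "rt"),
        "rt_p90": ("percentile(90)", "rt"),
    }
    events = [
        {"request": "/some/url/page", "response": "200", "rt": "100"},
        {"request": "/some/url/other", "response": "200", "rt": "300"},
        {"request": "/some/url/x", "response": "404", "rt": "5"},
        {"request": "/static/logo.png", "response": "200", "rt": "1"},
    ]
    print(summarize(events, "request", r"/some/url.*[a-zA-Z]$", "response", fields))
```

In this sketch the regex filter runs before grouping, so lines that do not match (the static file request above) never reach the summarizers; each response code becomes one group, which maps to one point per interval.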

Desired behavior:

With this config we would get data in the form measurement [fields] tags, as follows:

"http_stats" [hits_x_interval, rt_avg, rt_max, rt_p90] httpcode=XXX, log_port=80/443, url="myurl", othertag="valuetag"
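For reference, one summarized point per group and interval could be written to InfluxDB in line protocol (measurement,tags fields timestamp). This minimal sketch assumes numeric fields only: integers get the i suffix, everything else is written as a float, and tag keys, tag values and field keys are escaped for commas, equals signs and spaces. The helper names and the example timestamp are hypothetical; tags are sorted by key, which InfluxDB recommends for write performance.

```python
def escape_key(s):
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def format_field(value):
    # bool is a subclass of int, so reject it rather than emit 1i/0i.
    if isinstance(value, bool):
        raise TypeError("boolean fields are not handled in this sketch")
    if isinstance(value, int):
        return f"{value}i"  # integer field
    return repr(float(value))  # float field


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as a line-protocol string."""
    # Measurement names only need commas and spaces escaped.
    measurement = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line_protocol(
        "http_stats",
        {"httpcode": "200", "log_port": "80", "url": "myurl", "othertag": "valuetag"},
        {"hits_x_interval": 2, "rt_avg": 200.0, "rt_max": 300.0, "rt_p90": 300.0},
        1470000000000000000,
    ))
```

All four summary values end up as fields of a single point, so each group costs one write per interval rather than one per log line.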

What do you think?

sparrc commented 8 years ago

We will be implementing a generic solution for this that works for all plugins, not just logparser. See #1419. You may also already be able to do some of this using the pass/drop filters.

toni-moreno commented 8 years ago

Is there a planned date for a beta release of this generic solution? I would like to help you test it.

sparrc commented 8 years ago

Not at the moment, but you can subscribe to https://github.com/influxdata/telegraf/issues/380 and get notified about any progress or status changes.