
logstash knowledge notes #82

Open moooofly opened 5 years ago

moooofly commented 5 years ago

The logstash filter configuration currently used at the company

The 10-json-filter.conf.j2 configuration file in use at the company shows the following three filter plugins:

For the full plugin list, see https://www.elastic.co/guide/en/logstash/current/filter-plugins.html


json

Description

This is a JSON parsing filter. It takes an existing field which contains JSON and expands it into an actual data structure within the Logstash event.

That is, given a field whose content is JSON, the filter expands it into a data structure inside the Logstash event.

By default it will place the parsed JSON in the root (top level) of the Logstash event, but this filter can be configured to place the JSON into any arbitrary event field, using the target configuration.

By default the parsed JSON is placed at the root (top level) of the Logstash event, but the target option can be used to place it under any arbitrary event field instead.

This plugin has a few fallback scenarios when something bad happens during the parsing of the event. If the JSON parsing fails on the data, the event will be untouched and it will be tagged with a _jsonparsefailure; you can then use conditionals to clean the data. You can configure this tag with the tag_on_failure option.

Fallback handling when JSON parsing fails

If the parsed data contains a @timestamp field, we will try to use it for the event’s @timestamp, if the parsing fails, the field will be renamed to _@timestamp and the event will be tagged with a _timestampparsefailure.

If the parsed data contains a @timestamp field, it is used as the event's @timestamp; if parsing that field fails, the field is renamed to _@timestamp and the event is tagged with _timestampparsefailure.
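A sketch of this fallback path (the drop handling below is an illustrative assumption, not something from the company config): tag failed events, then use a conditional to clean them up:

filter {
  json {
    source         => "message"
    # tag_on_failure defaults to ["_jsonparsefailure"]; shown explicitly here
    tag_on_failure => [ "_jsonparsefailure" ]
  }

  # conditional cleanup of events whose JSON could not be parsed
  if "_jsonparsefailure" in [tags] {
    drop { }   # illustrative choice: discard unparseable events
  }
}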

JSON filter configuration options

The filter has four configuration options in total (per the plugin docs); the two covered below are source and target.

source

The configuration for the JSON filter:

source => source_field

For example, if you have JSON data in the message field:

filter {
  json {
    source => "message"
  }
}

The above would parse the JSON from the message field.

That is, the content of the message field is parsed into a JSON data structure.

target

Define the target field for placing the parsed data. If this setting is omitted, the JSON data will be stored at the root (top level) of the event.

For example, if you want the data to be put in the doc field:

As in the following example, which places the data under the doc field:

filter {
  json {
    target => "doc"
  }
}

JSON in the value of the source field will be expanded into a data structure in the target field. (Note: "source field" here means whatever field the source option points at; source has no default value and must be specified.)

The JSON value of the source field is expanded into a data structure under the target field.

NOTE: if the target field already exists, it will be overwritten!

A real example

With the following configuration:

      json {
          source => "message"
          target => "http"
      }

the content of message is parsed as JSON and attached under the http field:

{
  "_index": "logstash-2019.02.25",
  "_type": "doc",
  "_id": "TRR-I2kBpsd9WC6Gp4sb",
  "_version": 1,
  "_score": null,
  "_source": {
    "message": "{\"clientip\":\"-\",\"upstream_addr\":\"-\",\"ident\":\"-\",\"auth\":\"-\",\"_timestamp\":\"1551078694.532\",\"host\":\"172.31.5.173\",\"verb\":\"GET\",\"request\":\"/\",\"httpversion\":\"HTTP/1.1\",\"response\":\"404\",\"bytes\":\"48\",\"referrer\":\"-\",\"agent\":\"ELB-HealthChecker/2.0\",\"req_time\":0.0,\"upstream_resp_time\":0.0,\"proxy_time\":0.0,\"upstream_login\":\"-\",\"req_path\":\"/\",\"upstream_ip\":\"-\",\"upstream_port\":\"-\",\"_hostname\":\"ip-172-31-5-173\",\"_source\":\"nginx.no_upstream.access\",\"_level\":\"info\"}",
    "http": {
      "auth": "-",
      "upstream_addr": "-",
      "referrer": "-",
      "_timestamp": "1551078694.532",
      "host": "172.31.5.173",
      "upstream_port": "-",
      "upstream_resp_time": 0,
      "agent": "ELB-HealthChecker/2.0",
      "clientip": "-",
      "response": "404",
      "ident": "-",
      "_level": "info",
      "upstream_login": "-",
      "request": "/",
      "_hostname": "ip-172-31-5-173",
      "upstream_ip": "-",
      "proxy_time": 0,
      "_source": "nginx.no_upstream.access",
      "verb": "GET",
      "bytes": "48",
      "req_time": 0,
      "req_path": "/",
      "httpversion": "HTTP/1.1"
    },
    "@timestamp": "2019-02-25T07:11:34.532Z",
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2019-02-25T07:11:34.532Z"
    ]
  },
  "sort": [
    1551078694532
  ]
}

mutate

Description

The mutate filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events.

The mutate filter performs general mutations on fields in an event: rename, remove, replace, and modify.
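A sketch of these mutations (the field names below, such as old_name and level, are made up for illustration):

filter {
  mutate {
    # rename: change a field's name
    rename  => { "old_name" => "new_name" }
    # replace: set a field's value, creating the field if needed (sprintf references allowed)
    replace => { "status" => "%{response}" }
    # update: change a field's value only if the field already exists
    update  => { "level" => "info" }
    # convert: change a field's data type
    convert => { "bytes" => "integer" }
  }
}

Note that within a single mutate block these operations run in the plugin's fixed internal order, not in the order written; hence the advice below to use separate blocks.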

All configuration options supported by the mutate filter

See https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-options for the full list.

mutate processing order

Mutations in a config file are executed in a fixed order (the ordered list of operations is given in the plugin documentation linked above).

You can control the order by using separate mutate blocks.

The recommended practice for controlling the order of operations is to use separate mutate blocks.

For example: split first, then rename:

filter {
    mutate {
        split => ["hostname", "."]
        # take the first element of the split hostname; note the field-reference
        # syntax is %{[hostname][0]}, not %{hostname[0]}
        add_field => { "shortHostname" => "%{[hostname][0]}" }
    }

    mutate {
        rename => ["shortHostname", "hostname" ]
    }
}

add_field

If this filter is successful, add any arbitrary fields to this event. Field names can be dynamic and include parts of the event using the %{field} syntax.

add_field adds arbitrary fields to the event; field names can be dynamic, where %{field} references other parts of the event.

Example:

filter {
  mutate {
    add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }
  }
}
# You can also add multiple fields at once:
filter {
  mutate {
    add_field => {
      "foo_%{somefield}" => "Hello world, from %{host}"
      "new_field" => "new_static_value"
    }
  }
}

If the event has field "somefield" == "hello", this filter, on success, would add the field foo_hello, with the value above and the %{host} piece replaced with the corresponding value from the event. The second example would also add a hardcoded field.

First example: if the current event contains a somefield field whose value is "hello", then on success a field named foo_hello is added to the event, with the value "Hello world, from %{host}", where the %{host} part is replaced dynamically.

Second example: adding multiple fields at once, one of them a hardcoded field.

remove_field

If this filter is successful, remove arbitrary fields from this event. Example:

filter {
  mutate {
    remove_field => [ "foo_%{somefield}" ]
  }
}
# You can also remove multiple fields at once:
filter {
  mutate {
    remove_field => [ "foo_%{somefield}", "my_extraneous_field" ]
  }
}

If the event has field "somefield" == "hello", this filter, on success, would remove the field with name foo_hello if it is present. The second example would remove an additional, non-dynamic field.

A real example

With the following configuration:

      # extract common field
      mutate {
          add_field => {
              "timestamp" => "%{[http][_timestamp]}"
          }
      }

      # update timestamp
      date {
          #try to update @timestamp which elasticsearch used to sort data with timestamp
          #1468403039, a unix timestamp
          match => ["timestamp", "UNIX"]
      }

      # remove tmp timestamp
      mutate {
          remove_field => [ "timestamp"]
      }

a timestamp field is first added at the root of the event, whose value "%{[http][_timestamp]}" resolves to http._timestamp; it is then processed (here by the date filter); finally the temporary timestamp field is removed. Consequently, no field named timestamp appears at the root of the event in the final data.

date

Description

The date filter is used for parsing dates from fields, and then using that date or timestamp as the logstash timestamp for the event.

The date filter parses dates from fields and then uses the resulting date or timestamp as the Logstash timestamp for the event.

For example, syslog events usually have timestamps like this:

"Apr 17 09:32:01"

You would use the date format MMM dd HH:mm:ss to parse this.
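A minimal sketch of that (syslog_time is a hypothetical field holding such a timestamp):

filter {
  date {
    # "Apr 17 09:32:01" carries no year; the date filter fills one in (typically the current year)
    match => [ "syslog_time", "MMM dd HH:mm:ss" ]
  }
}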

The date filter is especially important for sorting events and for backfilling old data. If you don’t get the date correct in your event, then searching for them later will likely sort out of order.

The date filter is especially important for sorting events and for backfilling old data.

In the absence of this filter, logstash will choose a timestamp based on the first time it sees the event (at input time), if the timestamp is not already set in the event. For example, with file input, the timestamp is set to the time of each read.

Without this filter, Logstash uses the time it first sees the event (at input time) as the timestamp, unless the event already has one set.

Date filter configuration options

match

An array with field name first, and format patterns following, [ field, formats... ]

If your time field has multiple possible formats, you can do this:

Configuration for parsing multiple possible time formats:

match => [ "logdate", 
          "MMM dd yyyy HH:mm:ss",
          "MMM  d yyyy HH:mm:ss", 
          "ISO8601" ]

The above will match a syslog (rfc3164) or iso8601 timestamp.

There are a few special exceptions: format literals such as ISO8601 and UNIX (both seen in the examples in this document) exist to help you save time and ensure correctness of date parsing.

For example, if you have a field logdate, with a value that looks like Aug 13 2010 00:03:44, you would use this configuration:

filter {
  date {
    match => [ "logdate", "MMM dd yyyy HH:mm:ss" ]
  }
}

If your field is nested in your structure, you can use the nested syntax [foo][bar] to match its value. For more information, please refer to Field References

For nested structures, the nested field-reference syntax can be used in match.
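In terms of this document's real example, the nested field http._timestamp could in principle be matched directly, instead of first copying it into a temporary field (a sketch, not the company config):

filter {
  date {
    # parse the nested field [http][_timestamp] in place
    match => [ "[http][_timestamp]", "UNIX" ]
  }
}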

(More details on the syntax can be found in the Logstash docs under Field References.)

target

Store the matching timestamp into the given target field. If not provided, default to updating the @timestamp field of the event.

The matched timestamp is stored into the given target field; if target is not specified, the event's @timestamp field is updated by default.
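A sketch that keeps @timestamp untouched and stores the result elsewhere (parsed_time is a made-up field name):

filter {
  date {
    match  => [ "logdate", "MMM dd yyyy HH:mm:ss" ]
    # without this line, @timestamp itself would be updated
    target => "parsed_time"
  }
}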

A real example

With the same configuration as before:

      # extract common field
      mutate {
          add_field => {
              "timestamp" => "%{[http][_timestamp]}"
          }
      }

      # update timestamp
      date {
          #try to update @timestamp which elasticsearch used to sort data with timestamp
          #1468403039, a unix timestamp
          match => ["timestamp", "UNIX"]
      }

      # remove tmp timestamp
      mutate {
          remove_field => [ "timestamp"]
      }

the value of the matched timestamp field is parsed as a UNIX timestamp and stored into the @timestamp field.
