
td-agent configuration notes #75

moooofly opened this issue 5 years ago

moooofly commented 5 years ago

Fluentd has 7 types of plugins: Input, Parser, Filter, Output, Formatter, Storage and Buffer.


Sections covered so far:


A simple fluentd.conf example

<source>
  @type tail
  path /path/to/my.log
  pos_file /path/to/my.log.pos
  # Deprecated parameter. Use <parse> instead.
  format none
  tag myapp.logs
</source>

# The filter section is optional and can be omitted if not needed
<filter myapp.**>
  @type record_transformer

  # Add _hostname and _source fields to every log record
  <record>
    _hostname "#{Socket.gethostname}"
    _source ${tag}
  </record>
</filter>

<match myapp.**>
  # Simply print the collected logs to stdout
  @type stdout
</match>

Life of a Fluentd event

(event-flow diagrams omitted)

Related

moooofly commented 5 years ago

Input Plugin

Input plugins extend Fluentd to retrieve and pull event logs from external sources. An input plugin typically creates a thread, socket, and listen socket. It can also be written to periodically pull data from data sources.
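For contrast with the in_tail examples below, here is a minimal sketch of a socket-based input, assuming the built-in forward input and its conventional port (the values are illustrative, not taken from this issue):

<source>
  @type forward    # accepts events pushed by other fluentd/td-agent instances over a listen socket
  port 24224       # conventional forward port
  bind 0.0.0.0     # listen on all interfaces
</source>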

List of Input Plugins

tail Input Plugin

The in_tail Input plugin allows Fluentd to read events from the tail of text files. Its behavior is similar to the tail -F command.

How it Works

How it works:

Example 1

<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

Example 2: * can be used as a placeholder that expands to the actual file path, replacing '/' with '.'. For example, with the following configuration, in_tail emits the parsed events with the 'foo.path.to.file' tag.

path /path/to/file
tag foo.*
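Putting the fragment above into a complete source block (the pos_file path and the none parser are illustrative assumptions), tailing /path/to/file would emit events tagged foo.path.to.file:

<source>
  @type tail
  path /path/to/file
  pos_file /var/log/td-agent/foo.pos   # assumed position file
  tag foo.*                            # * expands to the file path, '/' replaced by '.'
  <parse>
    @type none
  </parse>
</source>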

Example 3: * and strftime format can be combined to add/remove watched files dynamically

path /path/to/%Y/%m/%d/*

Example 4: multiple paths

path /path/to/a/*,/path/to/b/c.log

Example 5: pos_file handles multiple positions in one file, so there is no need for a separate pos_file parameter per source.

pos_file /var/log/td-agent/tmp/access.log.pos

Example 6: in_tail uses a parser plugin to parse the log.

# json
<parse>
  @type json
</parse>

# regexp
<parse>
  @type regexp
  expression ^(?<name>[^ ]*) (?<user>[^ ]*) (?<age>\d*)$
</parse>

Example 7: rotate_wait defaults to 5s; using in_tail requires cooperation from logrotate, and logrotate must not use nocreate

in_tail actually does a bit more than tail -F itself. When rotating a file, some data may still need to be written to the old file as opposed to the new one.

in_tail takes care of this by keeping a reference to the old file (even after it has been rotated) for some time before transitioning completely to the new file. This helps prevent data designated for the old file from getting lost. By default, this time interval is 5 seconds.

The rotate_wait parameter accepts a single integer representing the number of seconds you want this time interval to be.
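For instance, extending Example 1 above with an explicit rotate_wait (a sketch; 10 seconds is an arbitrary value) keeps the rotated-out file watched a little longer before switching to the new file:

<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  rotate_wait 10   # keep reading the old file for 10s after rotation (default: 5)
  <parse>
    @type apache2
  </parse>
</source>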

logrotate has a nocreate option, which prevents a new file from being created after log rotation is triggered. This means in_tail cannot find a new file to tail.

This behavior does not fit typical application log cases, so check that your logrotate configuration does not include the nocreate option.

After installing td-agent on Ubuntu via apt install, the following can be seen in /etc/logrotate.d/td-agent:

/var/log/td-agent/td-agent.log {
  daily
  rotate 30
  compress
  delaycompress
  notifempty
  create 640 td-agent td-agent
  sharedscripts
  postrotate
    pid=/var/run/td-agent/td-agent.pid
    if [ -s "$pid" ]
    then
      kill -USR1 "$(cat $pid)"
    fi
  endscript
}
moooofly commented 5 years ago

Parser Plugin

Sometimes, the <parse> directive for input plugins (e.g. in_tail, in_syslog, in_tcp and in_udp) cannot parse the user's custom data format (for example, a context-dependent grammar that can't be parsed with a regular expression). To address such cases, Fluentd has a pluggable system that enables the user to create their own parser formats.

Usage:

  • Write a custom format plugin.
  • From any input plugin that supports the <parse> directive, call the custom plugin by its name.

Example: parsing Nginx access logs

<source>
  @type tail
  path /path/to/input/file
  <parse>
    @type nginx
    keep_time_key true
  </parse>
</source>

List of built-in Parsers

Third-party:

json Parser Plugin

The json parser plugin parses JSON logs. One JSON map per line.

Example

{"time":1362020400,"host":"192.168.0.1","size":777,"method":"PUT"}

is parsed as

time:
1362020400 (2013-02-28 12:00:00 +0900)

record:
{
  "host"  : "192.168.0.1",
  "size"  : 777,
  "method": "PUT",
}
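By default the json parser uses the "time" field as the event timestamp and removes it from the record, as shown above. A minimal sketch (the values are illustrative) that also keeps the original field in the record:

<parse>
  @type json
  time_key time        # field used as the event timestamp (default: time)
  keep_time_key true   # keep the time field in the record as well
</parse>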
moooofly commented 5 years ago

Filter Plugin

Filter plugins enable Fluentd to modify event streams. Example use cases are:

Filter plugins provide the ability to modify event streams:

Example 1

<filter foo.bar>
  @type grep
  regexp1 message cool
</filter>

The above directive matches events with the tag “foo.bar”, and if the “message” field’s value contains “cool”, the events go through the rest of the configuration.
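Note that regexp1 is the older parameter style; on recent Fluentd/td-agent versions the same filter is usually written with a <regexp> subsection (a sketch, equivalent to the directive above):

<filter foo.bar>
  @type grep
  <regexp>
    key message     # field to inspect
    pattern /cool/  # events pass only if this field matches
  </regexp>
</filter>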

Key points:

Example 2

<filter> matches against a tag. Once an event is processed by a filter, it proceeds through the configuration top-down; hence, if there are multiple filters for the same tag, they are applied in descending order.

In other words, a filter matches against the tag, and once matched, events are processed by each filter from top to bottom. So, in the following example,

<filter foo.bar>
  @type grep
  regexp1 message cool
</filter>

<filter foo.bar>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

Only the events whose “message” field contain “cool” get the new field “hostname” with the machine’s hostname as its value.

In the example above, the new field "hostname" (with the machine's hostname as its value) is added only to events whose "message" field contains "cool".
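For illustration (the hostname value here is hypothetical), an input record like {"message":"semi cool"} passes the grep filter and leaves the chain as {"message":"semi cool", "hostname":"db001.internal.example.com"}, while {"message":"hello"} is dropped by grep and never reaches record_transformer.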

List of Filter Plugins

record_transformer Filter Plugin

The record_transformer filter plugin mutates/transforms incoming event streams in a versatile manner. If there is a need to add/delete/modify events, this plugin is the first filter to try.

Example 1: adding new fields directly

<filter foo.bar>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>

The above filter adds the new field "hostname" with the server's hostname as its value (taking advantage of Ruby's string interpolation) and the new field "tag" with the tag value. So, an input like

{"message":"hello world!"}

is transformed into

{"message":"hello world!", "hostname":"db001.internal.example.com", "tag":"foo.bar"}

Example 2: deriving a new field from existing field values

Here is another example where the field “total” is divided by the field “count” to create a new field “avg”:

<filter foo.bar>
  @type record_transformer
  enable_ruby
  <record>
    avg ${record["total"] / record["count"]}
  </record>
</filter>

It transforms an event like

{"total":100, "count":10}

into

{"total":100, "count":10, "avg":"10"}

Example 3: modifying the value of an existing field

You can also use this plugin to modify your existing fields, as in:

<filter foo.bar>
  @type record_transformer
  <record>
    message yay, ${record["message"]}
  </record>
</filter>

An input like

{"message":"hello world!"}

is transformed into

{"message":"yay, hello world!"}

Example 4: extracting parts of the tag value

Finally, this configuration embeds the value of the second part of the tag in the field “service_name”. It might come in handy when aggregating data across many services.

<filter web.*>
  @type record_transformer
  <record>
    service_name ${tag_parts[1]}
  </record>
</filter>

So, if an event with the tag “web.auth” and record {"user_id":1, "status":"ok"} comes in, it transforms it into {"user_id":1, "status":"ok", "service_name":"auth"}.

There is quite a bit more in the docs; omitted here.

moooofly commented 5 years ago

Output Plugin

Fluentd v1.0 output plugins have 3 modes of buffering and flushing.

  • Non-Buffered mode does not buffer data and writes out results immediately.
  • Synchronous Buffered mode has "staged" buffer chunks (a chunk is a collection of events) and a queue of chunks; its behavior can be controlled by the <buffer> section (see the diagram below).
  • Asynchronous Buffered mode also has a "stage" and a "queue", but the output plugin does not commit chunk writes synchronously; it commits them later.

Output plugins support 3 modes:

(diagram of buffer stages and queue omitted)

Output plugins can support all modes, but may support just one of these modes. Fluentd chooses the appropriate mode automatically if there is no <buffer> section in the configuration. If users specify a <buffer> section for an output plugin that does not support buffering, Fluentd will stop with a configuration error.

If no <buffer> section is configured, Fluentd automatically chooses the appropriate mode.

Output plugins since v0.14 can control the keys of buffer chunking dynamically through configuration. Users can configure buffer chunk keys as time (in any unit specified by the user), tag, and any key name of records. The output plugin splits events into chunks: events in a chunk have the same values for the chunk keys. The output plugin's buffer behavior (if any) is defined by a separate Buffer plugin, and different buffer plugins can be chosen for each output plugin.
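As a sketch of chunk keys (the path, the record field key1 and the time values are assumptions for illustration), the following buffers events by both time and the record field key1; the placeholders in path are filled in from the chunk keys:

<match myapp.**>
  @type file
  # %Y-%m-%d and ${key1} are resolved from the chunk keys declared below
  path /my/data/${key1}/access.%Y-%m-%d.log
  <buffer time,key1>
    timekey 1h        # group events into 1-hour chunks
    timekey_wait 10m  # wait 10 minutes before flushing a closed time chunk
  </buffer>
</match>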

List of Output Plugins

Difference between v1.0 and v0.12

Fluentd v0.12 uses only the <match> section for the configuration parameters of both the output plugin and the buffer plugin. Fluentd v1.0 uses the <buffer> subsection to write parameters for buffering, flushing and retrying; <match> sections are used only for the output plugin itself.

That is, Fluentd v0.12 configures both the output and the buffer plugin in a single <match> section, while Fluentd v1.0 puts the buffering, flushing and retrying parameters in the <buffer> subsection and uses <match> only for the output plugin.

Example of v1.0 output plugin configuration:

<match myservice_name>
  @type file
  path /my/data/access.${tag}.%Y-%m-%d.%H%M.log
  <buffer tag,time>
    @type file
    path /my/buffer/myservice
    timekey     60m
    timekey_wait 1m
  </buffer>
</match>

For Fluentd v0.12, configuration parameters for buffer plugins were written in the same section:

<match myservice_name>
  @type file
  path /my/data/access.myservice_name.*.log
  buffer_type file
  buffer_path /my/buffer/myservice/access.myservice_name.*.log
  time_slice_format %Y-%m-%d.%H%M
  time_slice_wait   1m
</match>

Buffering/Retrying Parameters

Control Flushing

Control Retrying

If the bottom chunk write out fails, it will remain in the queue and Fluentd will retry after waiting several seconds (retry_wait). If the retry limit has not been disabled (retry_forever is false) and the retry count exceeds the specified limit (retry_max_times), the chunk is trashed. The retry wait time doubles each time (1.0sec, 2.0sec, 4.0sec, ...) until retry_max_interval is reached. If the queue length exceeds the specified limit (queue_limit_length), new events are rejected.
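A sketch of how these knobs appear in a <buffer> section (the destination and all values are illustrative, not recommendations):

<match myapp.**>
  @type forward
  <server>
    host 192.168.1.3          # hypothetical destination
    port 24224
  </server>
  <buffer>
    flush_interval 10s        # flush staged chunks every 10 seconds
    retry_wait 1s             # initial wait before the first retry
    retry_max_interval 60s    # cap for the exponentially growing retry wait
    retry_max_times 17        # trash the chunk after this many failed retries
  </buffer>
</match>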

Secondary Output

In buffered mode, the user can specify <secondary> with any output plugin in the <match> configuration. If the plugin keeps failing to write buffer chunks and exceeds the retry timeout threshold, the output plugin delegates writing the buffer chunk to the secondary plugin.

<secondary> is useful for backup when destination servers are unavailable, e.g. for forward, mongo and other plugins. The out_secondary_file plugin is strongly recommended for <secondary>.
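A sketch of <secondary> with the recommended out_secondary_file (the host, directory and basename are illustrative assumptions):

<match myapp.**>
  @type forward
  <server>
    host 192.168.1.3                     # hypothetical primary destination
    port 24224
  </server>
  <secondary>
    @type secondary_file
    directory /var/log/td-agent/failed   # where undeliverable chunks are dumped
    basename myapp
  </secondary>
</match>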

copy Output Plugin

The copy output plugin copies events to multiple outputs.

The copy output plugin is responsible for copying events to multiple outputs.

Example 1: forwarding events to multiple outputs via copy

<match pattern>
  @type copy
  <store>
    @type file
    path /var/log/fluent/myapp1
    ...
  </store>
  <store>
    ...
  </store>
  <store>
    ...
  </store>
</match>

Example 2: sending events to both the local file /var/log/fluent/myapp and an Elasticsearch instance

Here is an example setup to send events to both a local file under /var/log/fluent/myapp and the collection fluentd.test in an Elasticsearch instance. (Please see the out_file and out_elasticsearch articles for more details about the respective plugins.)

<match myevent.file_and_elasticsearch>
  @type copy
  <store>
    @type file
    path /var/log/fluent/myapp
    compress gzip
    <format>
      localtime false
    </format>
    <buffer time>
      timekey_wait 10m
      timekey 86400
      timekey_use_utc true
      path /var/log/fluent/myapp
    </buffer>
    <inject>
      time_format %Y%m%dT%H%M%S%z
      localtime false
    </inject>
  </store>
  <store>
    @type elasticsearch
    host fluentd
    port 9200
    index_name fluentd
    type_name fluentd
  </store>
</match>

stdout Output Plugin

The stdout output plugin prints events to stdout (or logs if launched with daemon mode). This output plugin is useful for debugging purposes.

This plugin prints events to stdout, or to the logs when launched in daemon mode; it is mainly useful for debugging.

Example 1

<match pattern>
  @type stdout
</match>

Supported modes

moooofly commented 5 years ago

Plugin installation