
td-agent configuration notes #75

moooofly opened this issue 5 years ago

moooofly commented 5 years ago

Fluentd has 7 types of plugins: Input, Parser, Filter, Output, Formatter, Storage and Buffer.


Sections covered so far:


A simple fluentd.conf example

<source>
  @type tail
  path /path/to/my.log
  pos_file /path/to/my.log.pos
  # Deprecated parameter. Use <parse> instead.
  format none
  tag myapp.logs
</source>

# The filter section is optional and can be omitted if not needed
<filter myapp.**>
  @type record_transformer

  # Add _hostname and _source fields to every log record
  <record>
    _hostname "#{Socket.gethostname}"
    _source ${tag}
  </record>
</filter>

<match myapp.**>
  # Simply print the collected logs to stdout
  @type stdout
</match>

Life of a Fluentd event

(event-flow diagrams omitted)

Related

moooofly commented 5 years ago

Input Plugin

Input plugins extend Fluentd to retrieve and pull event logs from external sources. An input plugin typically creates a thread, socket, and listen socket. It can also be written to periodically pull data from data sources.
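For contrast with the in_tail examples below, here is a minimal sketch of a socket-based input, assuming the built-in forward input and its conventional port (the values are illustrative, not taken from this issue):

<source>
  @type forward    # accepts events pushed by other fluentd/td-agent instances over a listen socket
  port 24224       # conventional forward port
  bind 0.0.0.0     # listen on all interfaces
</source>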

List of Input Plugins

tail Input Plugin

The in_tail Input plugin allows Fluentd to read events from the tail of text files. Its behavior is similar to the tail -F command.

How it Works

How it works:

Example 1

<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

Example 2: * can be used as a placeholder that expands to the actual file path, replacing '/' with '.'. For example, with the following configuration, in_tail emits the parsed events with the 'foo.path.to.file' tag.

path /path/to/file
tag foo.*
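Putting the fragment above into a complete source block (the pos_file path and the none parser are illustrative assumptions), tailing /path/to/file would emit events tagged foo.path.to.file:

<source>
  @type tail
  path /path/to/file
  pos_file /var/log/td-agent/foo.pos   # assumed position file
  tag foo.*                            # * expands to the file path, '/' replaced by '.'
  <parse>
    @type none
  </parse>
</source>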

Example 3: * and strftime format can be combined to add/remove watched files dynamically

path /path/to/%Y/%m/%d/*

Example 4: multiple paths

path /path/to/a/*,/path/to/b/c.log

Example 5: pos_file handles multiple positions in one file, so there is no need for a separate pos_file parameter per source.

pos_file /var/log/td-agent/tmp/access.log.pos

Example 6: in_tail uses a parser plugin to parse the log.

# json
<parse>
  @type json
</parse>

# regexp
<parse>
  @type regexp
  expression ^(?<name>[^ ]*) (?<user>[^ ]*) (?<age>\d*)$
</parse>

Example 7: rotate_wait defaults to 5s; using in_tail requires cooperation from logrotate, and logrotate must not use nocreate

in_tail actually does a bit more than tail -F itself. When rotating a file, some data may still need to be written to the old file as opposed to the new one.

in_tail takes care of this by keeping a reference to the old file (even after it has been rotated) for some time before transitioning completely to the new file. This helps prevent data designated for the old file from getting lost. By default, this time interval is 5 seconds.

The rotate_wait parameter accepts a single integer representing the number of seconds you want this time interval to be.
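For instance, extending Example 1 above with an explicit rotate_wait (a sketch; 10 seconds is an arbitrary value) keeps the rotated-out file watched a little longer before switching to the new file:

<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  rotate_wait 10   # keep reading the old file for 10s after rotation (default: 5)
  <parse>
    @type apache2
  </parse>
</source>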

logrotate has a nocreate option, which prevents a new file from being created after log rotation is triggered. This means in_tail cannot find a new file to tail.

This behavior does not fit typical application log cases, so check that your logrotate configuration does not include the nocreate option.

After installing td-agent on Ubuntu via apt install, the following can be seen in /etc/logrotate.d/td-agent:

/var/log/td-agent/td-agent.log {
  daily
  rotate 30
  compress
  delaycompress
  notifempty
  create 640 td-agent td-agent
  sharedscripts
  postrotate
    pid=/var/run/td-agent/td-agent.pid
    if [ -s "$pid" ]
    then
      kill -USR1 "$(cat $pid)"
    fi
  endscript
}
moooofly commented 5 years ago

Parser Plugin

Sometimes, the <parse> directive for input plugins (e.g. in_tail, in_syslog, in_tcp and in_udp) cannot parse the user's custom data format (for example, a context-dependent grammar that can't be parsed with a regular expression). To address such cases, Fluentd has a pluggable system that enables the user to create their own parser formats.

Usage:

  • Write a custom format plugin.
  • From any input plugin that supports the <parse> directive, call the custom plugin by its name.

Example: parsing Nginx access logs

<source>
  @type tail
  path /path/to/input/file
  <parse>
    @type nginx
    keep_time_key true
  </parse>
</source>

List of built-in Parsers

Third-party:

json Parser Plugin

The json parser plugin parses JSON logs. One JSON map per line.

Example

{"time":1362020400,"host":"192.168.0.1","size":777,"method":"PUT"}

is parsed as

time:
1362020400 (2013-02-28 12:00:00 +0900)

record:
{
  "host"  : "192.168.0.1",
  "size"  : 777,
  "method": "PUT",
}
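By default the json parser uses the "time" field as the event timestamp and removes it from the record, as shown above. A minimal sketch (the values are illustrative) that also keeps the original field in the record:

<parse>
  @type json
  time_key time        # field used as the event timestamp (default: time)
  keep_time_key true   # keep the time field in the record as well
</parse>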
moooofly commented 5 years ago

Filter Plugin

Filter plugins enable Fluentd to modify event streams. Example use cases are:

Filter plugins provide the ability to modify event streams:

Example 1

<filter foo.bar>
  @type grep
  regexp1 message cool
</filter>

The above directive matches events with the tag “foo.bar”, and if the “message” field’s value contains “cool”, the events go through the rest of the configuration.
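Note that regexp1 is the older parameter style; on recent Fluentd/td-agent versions the same filter is usually written with a <regexp> subsection (a sketch, equivalent to the directive above):

<filter foo.bar>
  @type grep
  <regexp>
    key message     # field to inspect
    pattern /cool/  # events pass only if this field matches
  </regexp>
</filter>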

Key points:

Example 2

<filter> matches against a tag. Once an event is processed by a filter, it proceeds through the configuration top-down; hence, if there are multiple filters for the same tag, they are applied in descending order.

In other words, a filter matches against the tag, and once matched, events are processed by each filter from top to bottom. So, in the following example,

<filter foo.bar>
  @type grep
  regexp1 message cool
</filter>

<filter foo.bar>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

Only the events whose “message” field contain “cool” get the new field “hostname” with the machine’s hostname as its value.

In the example above, the new field "hostname" (with the machine's hostname as its value) is added only to events whose "message" field contains "cool".
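For illustration (the hostname value here is hypothetical), an input record like {"message":"semi cool"} passes the grep filter and leaves the chain as {"message":"semi cool", "hostname":"db001.internal.example.com"}, while {"message":"hello"} is dropped by grep and never reaches record_transformer.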

List of Filter Plugins

record_transformer Filter Plugin

The record_transformer filter plugin mutates/transforms incoming event streams in a versatile manner. If there is a need to add/delete/modify events, this plugin is the first filter to try.

Example 1: adding new fields directly

<filter foo.bar>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>

The above filter adds the new field "hostname" with the server's hostname as its value (taking advantage of Ruby's string interpolation) and the new field "tag" with the tag value. So, an input like

{"message":"hello world!"}

is transformed into

{"message":"hello world!", "hostname":"db001.internal.example.com", "tag":"foo.bar"}

Example 2: deriving a new field from existing field values

Here is another example where the field “total” is divided by the field “count” to create a new field “avg”:

<filter foo.bar>
  @type record_transformer
  enable_ruby
  <record>
    avg ${record["total"] / record["count"]}
  </record>
</filter>

It transforms an event like

{"total":100, "count":10}

into

{"total":100, "count":10, "avg":"10"}

Example 3: modifying the value of an existing field

You can also use this plugin to modify your existing fields, as in:

<filter foo.bar>
  @type record_transformer
  <record>
    message yay, ${record["message"]}
  </record>
</filter>

An input like

{"message":"hello world!"}

is transformed into

{"message":"yay, hello world!"}

Example 4: extracting parts of the tag value

Finally, this configuration embeds the value of the second part of the tag in the field “service_name”. It might come in handy when aggregating data across many services.

<filter web.*>
  @type record_transformer
  <record>
    service_name ${tag_parts[1]}
  </record>
</filter>

So, if an event with the tag “web.auth” and record {"user_id":1, "status":"ok"} comes in, it transforms it into {"user_id":1, "status":"ok", "service_name":"auth"}.

There is quite a bit more in the docs; omitted here.

moooofly commented 5 years ago

Output Plugin

Fluentd v1.0 output plugins have 3 modes of buffering and flushing.

  • Non-Buffered mode does not buffer data and writes out results immediately.
  • Synchronous Buffered mode has "staged" buffer chunks (a chunk is a collection of events) and a queue of chunks; its behavior can be controlled by the <buffer> section (see the diagram below).
  • Asynchronous Buffered mode also has a "stage" and a "queue", but the output plugin does not commit chunk writes synchronously; it commits them later.

Output plugins support 3 modes:

(diagram of buffer stages and queue omitted)

Output plugins can support all modes, but may support just one of these modes. Fluentd chooses the appropriate mode automatically if there is no <buffer> section in the configuration. If users specify a <buffer> section for an output plugin that does not support buffering, Fluentd will stop with a configuration error.

If no <buffer> section is configured, Fluentd automatically chooses the appropriate mode.

Output plugins since v0.14 can control the keys of buffer chunking dynamically through configuration. Users can configure buffer chunk keys as time (in any unit specified by the user), tag, and any key name of records. The output plugin splits events into chunks: events in a chunk have the same values for the chunk keys. The output plugin's buffer behavior (if any) is defined by a separate Buffer plugin, and different buffer plugins can be chosen for each output plugin.
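As a sketch of chunk keys (the path, the record field key1 and the time values are assumptions for illustration), the following buffers events by both time and the record field key1; the placeholders in path are filled in from the chunk keys:

<match myapp.**>
  @type file
  # %Y-%m-%d and ${key1} are resolved from the chunk keys declared below
  path /my/data/${key1}/access.%Y-%m-%d.log
  <buffer time,key1>
    timekey 1h        # group events into 1-hour chunks
    timekey_wait 10m  # wait 10 minutes before flushing a closed time chunk
  </buffer>
</match>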

List of Output Plugins

Difference between v1.0 and v0.12

Fluentd v0.12 uses only the <match> section for the configuration parameters of both the output plugin and the buffer plugin. Fluentd v1.0 uses the <buffer> subsection to write parameters for buffering, flushing and retrying; <match> sections are used only for the output plugin itself.

That is, Fluentd v0.12 configures both the output and the buffer plugin in a single <match> section, while Fluentd v1.0 puts the buffering, flushing and retrying parameters in the <buffer> subsection and uses <match> only for the output plugin.

Example of v1.0 output plugin configuration:

<match myservice_name>
  @type file
  path /my/data/access.${tag}.%Y-%m-%d.%H%M.log
  <buffer tag,time>
    @type file
    path /my/buffer/myservice
    timekey     60m
    timekey_wait 1m
  </buffer>
</match>

For Fluentd v0.12, configuration parameters for buffer plugins were written in the same section:

<match myservice_name>
  @type file
  path /my/data/access.myservice_name.*.log
  buffer_type file
  buffer_path /my/buffer/myservice/access.myservice_name.*.log
  time_slice_format %Y-%m-%d.%H%M
  time_slice_wait   1m
</match>

Buffering/Retrying Parameters

Control Flushing

Control Retrying

If the bottom chunk write out fails, it will remain in the queue and Fluentd will retry after waiting several seconds (retry_wait). If the retry limit has not been disabled (retry_forever is false) and the retry count exceeds the specified limit (retry_max_times), the chunk is trashed. The retry wait time doubles each time (1.0sec, 2.0sec, 4.0sec, ...) until retry_max_interval is reached. If the queue length exceeds the specified limit (queue_limit_length), new events are rejected.
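A sketch of how these knobs appear in a <buffer> section (the destination and all values are illustrative, not recommendations):

<match myapp.**>
  @type forward
  <server>
    host 192.168.1.3          # hypothetical destination
    port 24224
  </server>
  <buffer>
    flush_interval 10s        # flush staged chunks every 10 seconds
    retry_wait 1s             # initial wait before the first retry
    retry_max_interval 60s    # cap for the exponentially growing retry wait
    retry_max_times 17        # trash the chunk after this many failed retries
  </buffer>
</match>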

Secondary Output

In buffered mode, the user can specify <secondary> with any output plugin in the <match> configuration. If the plugin keeps failing to write buffer chunks and exceeds the retry timeout threshold, the output plugin delegates writing the buffer chunk to the secondary plugin.

<secondary> is useful for backup when destination servers are unavailable, e.g. for forward, mongo and other plugins. The out_secondary_file plugin is strongly recommended for <secondary>.
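A sketch of <secondary> with the recommended out_secondary_file (the host, directory and basename are illustrative assumptions):

<match myapp.**>
  @type forward
  <server>
    host 192.168.1.3                     # hypothetical primary destination
    port 24224
  </server>
  <secondary>
    @type secondary_file
    directory /var/log/td-agent/failed   # where undeliverable chunks are dumped
    basename myapp
  </secondary>
</match>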

copy Output Plugin

The copy output plugin copies events to multiple outputs.

The copy output plugin is responsible for copying events to multiple outputs.

Example 1: forwarding events to multiple outputs via copy

<match pattern>
  @type copy
  <store>
    @type file
    path /var/log/fluent/myapp1
    ...
  </store>
  <store>
    ...
  </store>
  <store>
    ...
  </store>
</match>

Example 2: sending events to both the local file /var/log/fluent/myapp and an Elasticsearch instance

Here is an example setup to send events to both a local file under /var/log/fluent/myapp and the collection fluentd.test in an Elasticsearch instance. (Please see the out_file and out_elasticsearch articles for more details about the respective plugins.)

<match myevent.file_and_elasticsearch>
  @type copy
  <store>
    @type file
    path /var/log/fluent/myapp
    compress gzip
    <format>
      localtime false
    </format>
    <buffer time>
      timekey_wait 10m
      timekey 86400
      timekey_use_utc true
      path /var/log/fluent/myapp
    </buffer>
    <inject>
      time_format %Y%m%dT%H%M%S%z
      localtime false
    </inject>
  </store>
  <store>
    @type elasticsearch
    host fluentd
    port 9200
    index_name fluentd
    type_name fluentd
  </store>
</match>

stdout Output Plugin

The stdout output plugin prints events to stdout (or logs if launched with daemon mode). This output plugin is useful for debugging purposes.

This plugin prints events to stdout, or to the logs when launched in daemon mode; it is mainly useful for debugging.

Example 1

<match pattern>
  @type stdout
</match>

Supported modes

moooofly commented 5 years ago

Plugin installation