Company documentation and projects:

Articles:

[x] Overview of Server-Side Agent (td-agent) -- relationship between td-agent and fluentd
[x] High Availability td-agent Configuration -- high availability
[x] Monitoring td-agent -- monitoring
[x] td-agent v2 vs. td-agent v3 -- version comparison
[x] Installing Fluentd Using deb Package -- installation
[ ] fluentd FAQ

Plugins (the entries below mainly concern HTTP output):

Official fluentd plugin list
fluent-plugin-out-http - A generic fluentd output plugin for sending logs to an HTTP endpoint. If you'd like to retry failed requests, consider using fluent-plugin-bufferize. (49 / 2019) -- works; requires every log line to be formatted as a JSON array
fluent-plugin-out-https - A generic fluentd output plugin for sending logs to an HTTP and HTTPS endpoint. (3 / 2014) -- only adds a little on top of fluent-plugin-out-http
fluentd_https_out - A fluentd buffered output filter that posts to https a json array of records. (6 / 2014)
fluent-plugin-http_forward - Fluentd http output plugin support batching and buffering which also optionally supports authentication. (2 / 2016)
fluent-plugin-https-client - Output plugin for Fluentd, for sending records to an HTTP or HTTPS endpoint, with SSL, Proxy, and Header implementation. (0 / 2017)
fluent-plugin-out-http-buffered - This is an output plugin for Fluentd which delivers buffered log messages to an http endpoint. (10 / 2013) -- fails with errors during installation
fluent-plugin-http_file_upload - send fluentd messages to web servers as file uploading. (0 / 2016) -- not useful

Overview of Server-Side Agent (td-agent)

Arm Treasure Data provides a server-side agent called Treasure Agent (td-agent) to collect server-side logs and events. You can continuously import data using td-agent.

td-agent is the agent that Arm Treasure Data provides for collecting server-side logs and events.

Logs Are Streams, Not Files

Logs are usually rotated on an hourly or daily basis based on time or size. This system quickly produces many large log files that need to be batch imported for further analysis. This is an outdated approach. Logs are better treated as continuously generated STREAMS as opposed to files.

Older logging systems rotate logs hourly or daily, based on time or size; the more modern view is that logs should be treated as continuously generated streams rather than as files.

"Server daemons (such as PostgreSQL or Nginx) and applications (such as a Rails or Django app) sometimes offer a configuration parameter for a path to the program’s logfile. This can lead us to think of logs as files. But a better conceptual model is to treat logs as time-ordered streams..." - Logs Are Streams, Not Files Adam Wiggins, Heroku co-founder.

td-agent, a data collection daemon, is used to import data continuously to Treasure Data. Although bulk-import is supported, we recommend importing your data continuously via td-agent.

Both bulk import and streaming import are supported, with continuous import via td-agent being the recommended route.

What is Treasure Agent?

td-agent is a data collection daemon. It collects logs from various data sources and uploads them to Treasure Data.

Treasure Agent and Fluentd

Use Treasure Agent 3, fluentd v0.14 (v1.0) series. We are deprecating Treasure Agent 2, fluentd v0.12 series.

Recommended combination: Treasure Agent 3 with fluentd v0.14 (v1.0).

How to install Treasure Agent?

Ubuntu & Debian

# 16.04 Xenial (64bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
# 14.04 Trusty
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent3.sh | sh
# 12.04 Precise
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-precise-td-agent3.sh | sh

# Debian Stretch (64-bit only) 
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-stretch-td-agent3.sh | sh
# Debian Jessie (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-jessie-td-agent3.sh | sh
# Debian Squeeze (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-squeeze-td-agent2.sh | sh

Set up td-agent

After installing td-agent, you can modify your config file. The file can be found in /etc/td-agent/td-agent.conf.

(Configuration file omitted.)

Restart the td-agent service.

# Linux
$ sudo /etc/init.d/td-agent restart

# MacOS X
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist

Agent Overhead

The td-agent resource consumption is roughly:

Resident memory (actual RAM used): 50MB
CPU Runtime: less than 2% of averaged runtime + your workload
Disk:
- Linux 120MB + your file buffer (configurable)

If you think td-agent is slow, see “5 Tips to Optimize Fluentd Performance”.

Files Installed by the Packages

Resource	Location	Notes
Config Directory	/etc/td-agent/
Config File	/etc/td-agent/td-agent.conf	This config will be picked-up by the startup script
Startup Script	/etc/init.d/td-agent
Log Directory	/var/log/td-agent/
Plugin Directory	/etc/td-agent/plugin/	Your custom plugins go here.
Ruby Interpreter	/opt/td-agent/embedded/bin/ruby	Ruby v2.1 is bundled with the package.
Rubygems	/usr/sbin/td-agent-gem	Bundled rubygems to install fluentd plugins. For example: `/usr/sbin/td-agent-gem install fluent-plugin-mongo`
jemalloc	/opt/td-agent/embedded/lib/libjemalloc.so	jemalloc is bundled together to avoid memory fragmentation. It is loaded by default in the startup script.

Supervision, Privileges and Network Ports

When td-agent starts, it launches two processes: master and slave. The master process manages the life cycle of the slave process, and the slave process handles the actual log collection.

Both processes run as the td-agent user under td-agent group, and all forked subprocesses run as the same. This applies to any system call initiated by td-agent as well. The agent configuration resides at /etc/td-agent/td-agent.conf. All configurations must be readable by td-agent.

The following ports are open depending on your input.

in_tail: nothing
in_forward: tcp/24224, udp/24224
in_unix: /var/run/td-agent/td-agent.sock

Debugging

If you are having issues, add the following line to /etc/default/td-agent to enable verbose logging:

DAEMON_ARGS=-vv

After that, restart the daemon. You can now find more verbose logs in /var/log/td-agent/td-agent.log.

High-Availability Configurations and Monitoring

For high-traffic websites, we recommend using a high availability configuration for td-agent. Monitoring the daemon is also important.

td-agent is fully open-sourced under the fluentd project.

High Availability td-agent Configuration

Message Delivery Semantics

td-agent is designed primarily for event-log delivery systems.

In such systems, several delivery guarantees are possible:

At most once: Messages are immediately transferred. If the transfer succeeds, the message is never sent out again. However, many failure scenarios can cause lost messages (e.g., no more write capacity).

At least once: Each message is delivered at least once. In failure cases, messages may be delivered twice.

Exactly once: Each message is delivered once and only once. This is what people want.

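Notes on these delivery guarantees:

At most once: many scenarios can lead to message loss.
At least once: the receiver must be able to deduplicate messages, i.e., process them idempotently.
Exactly once: each message is delivered exactly one time.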
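
The deduplication that at-least-once delivery demands of the receiver can be sketched as follows. This is a minimal illustration, not td-agent code: it assumes every record carries a unique ID, and the names IdempotentReceiver, record_id, and payload are hypothetical.

```python
class IdempotentReceiver:
    """Accepts redelivered records but stores each record ID only once."""

    def __init__(self):
        self.seen_ids = set()  # IDs already processed (hypothetical field)
        self.stored = []       # payloads accepted so far

    def receive(self, record_id, payload):
        """Return True if the record was stored, False if it was a duplicate."""
        if record_id in self.seen_ids:
            return False
        self.seen_ids.add(record_id)
        self.stored.append(payload)
        return True


receiver = IdempotentReceiver()
# "id-1" arrives twice, as it would after a retry under at-least-once delivery.
deliveries = [("id-1", "a"), ("id-2", "b"), ("id-1", "a")]
results = [receiver.receive(rid, body) for rid, body in deliveries]
print(results)          # [True, True, False]
print(receiver.stored)  # ['a', 'b']
```

A real receiver would also have to bound the memory used by seen_ids, for example by expiring old IDs.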

If the system “can’t lose a single event”, and must also transfer “exactly once”, then the system must stop processing events when it runs out of write capacity. The proper approach would be to use synchronous logging and return errors when the event cannot be accepted.

If your system cannot tolerate losing a single message and may deliver each message only once, you should use a synchronous logging system.

That’s why td-agent guarantees ‘At most once’ transfer. In order to collect massive amounts of data without impacting application performance, a data logger must transfer data asynchronously. Performance improves at the cost of potential delivery failure.

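td-agent uses the at-most-once model: it has to collect large volumes of log data without affecting application performance, so data must be transferred asynchronously. The price of that performance gain is that message delivery can fail.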

However, most failure scenarios are preventable. The following sections describe how to set up td-agent’s topology for high availability.

Note that most failure scenarios can be avoided; the rest of this section covers how to build a highly available td-agent setup.

Network Topology

To configure td-agent for high availability, we assume that your network consists of ‘log forwarders’ and ‘log aggregators’.

The high-availability topology for td-agent consists of log forwarders and log aggregators.

‘log forwarders’ are typically installed on every node to receive local events. Once an event is received, they forward it to the ‘log aggregators’ through the network.

A log forwarder is installed on every node to receive local events and forward them to the log aggregators.

‘log aggregators’ are daemons that continuously receive events from the log forwarders. They buffer the events and periodically upload the data into the cloud.

Log aggregators run as daemons that continuously receive events from the log forwarders; they buffer the events and periodically upload the data to the cloud.

td-agent can act as either a log forwarder or a log aggregator, depending on its configuration. The next sections describe the setups. We assume that the active log aggregator has ip ‘192.168.0.1’ and that the backup has ip ‘192.168.0.2’.

Depending on its configuration file, td-agent runs as either a log forwarder or a log aggregator.
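
The active/backup arrangement can be illustrated with a small sketch. It assumes that a forwarder falls back to the backup aggregator when the active one is unreachable. The forwarder configuration itself is omitted in these notes, so forward_with_failover and fake_send are hypothetical stand-ins, not td-agent APIs.

```python
def forward_with_failover(events, aggregators, send):
    """Try each aggregator in order; return the address that accepted the batch."""
    for address in aggregators:
        try:
            send(address, events)
        except ConnectionError:
            continue  # this aggregator is unreachable, try the next one
        return address
    raise ConnectionError("no aggregator reachable; keep the events buffered")


AGGREGATORS = ["192.168.0.1", "192.168.0.2"]  # active first, then backup
received = {}


def fake_send(address, events):
    # Stand-in for the real network transfer: pretend the active node is down.
    if address == "192.168.0.1":
        raise ConnectionError("active aggregator is down")
    received.setdefault(address, []).extend(events)


used = forward_with_failover([{"msg": "hello"}], AGGREGATORS, fake_send)
print(used)  # 192.168.0.2
```

If neither aggregator answers, the function raises, which models the forwarder keeping the batch in its buffer and retrying later.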

Log Forwarder Configuration

(Omitted.)

Log Aggregator Configuration

(Omitted.)

Failure Case Scenarios

Forwarder Failure

When a log forwarder receives events from applications, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is forwarded to aggregators.

After a log forwarder receives events from an application, it first writes them to a disk buffer and then forwards them to the log aggregators periodically, once every flush_interval.

This process is inherently robust against data loss. If a log forwarder’s td-agent process dies, the buffered data is properly transferred to its aggregator after it restarts. If the network between forwarders and aggregators breaks, the data transfer is automatically retried. That being said, possible message loss scenarios do exist:

The process dies immediately after receiving the events, but before writing them into the buffer.

The forwarder’s disk is broken, and the file buffer is lost.

In short, there are two scenarios in which messages can be lost.
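
The buffer-then-flush behaviour described in this section can be modelled with a toy forwarder. This is a simplified sketch under stated assumptions, not how fluentd implements its file buffer: BufferedForwarder, the one-JSON-object-per-line buffer format, and the send callback are all hypothetical.

```python
import json
import os
import tempfile


class BufferedForwarder:
    """Toy forwarder: persist each event first, send the whole buffer on flush."""

    def __init__(self, buffer_path, send):
        self.buffer_path = buffer_path  # plays the role of buffer_path
        self.send = send                # callable that delivers a list of events

    def receive(self, event):
        # Write to disk first, so a crash after this point does not lose the event.
        with open(self.buffer_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def flush(self):
        """Run once per flush interval. Return True if nothing is left to send."""
        if not os.path.exists(self.buffer_path):
            return True
        with open(self.buffer_path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        try:
            self.send(events)
        except ConnectionError:
            return False  # keep the file and retry on the next flush
        os.remove(self.buffer_path)
        return True


delivered = []
network = {"up": False}


def send(events):
    if not network["up"]:
        raise ConnectionError("aggregator unreachable")
    delivered.extend(events)


path = os.path.join(tempfile.mkdtemp(), "forwarder.buf")
forwarder = BufferedForwarder(path, send)
forwarder.receive({"msg": "one"})
first = forwarder.flush()                  # network down: buffer is kept
forwarder = BufferedForwarder(path, send)  # simulate a process restart
network["up"] = True
second = forwarder.flush()                 # buffered event is delivered
print(first, second, delivered)
```

The two loss scenarios map directly onto the sketch: a crash before receive() finishes writing, or the loss of the file at buffer_path.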

Aggregator Failure

When log aggregators receive events from log forwarders, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is uploaded into the cloud.

This process is inherently robust against data loss. If a log aggregator’s td-agent process dies, the data from the log forwarder is properly retransferred after it restarts. If the network between aggregators and the cloud breaks, the data transfer is automatically retried.

That being said, possible message loss scenarios do exist:

The process dies immediately after receiving the events, but before writing them into the buffer.
The aggregator’s disk is broken, and the file buffer is lost.

In short, there are two scenarios in which messages can be lost.

Monitoring td-agent

Fluentd Metrics Monitoring

Monitoring via HTTP

td-agent has a built-in monitoring agent to retrieve internal metrics in JSON via HTTP. Please add the following lines to your configuration file.

td-agent exposes its metrics in JSON format over HTTP through a built-in monitoring agent.

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

disable_node_info (default true): Send system metrics, CPU / Memory / Disk, or not.

Next, please restart the agent and get the metrics via HTTP.

The metrics can then be retrieved over HTTP:

$ curl http://host:24220/api/plugins.json
{"plugins":[{"plugin_id":"object:3fec669d6ac4","type":"forward","output_plugin":false,"config":{"type":"forward"}},{"plugin_id":"object:3fec669daf98","type":"http","output_plugin":false,"config":{"type":"http","port":"8888"}},{"plugin_id":"object:3fec669dfa48","type":"monitor_agent","output_plugin":false,"config":{"type":"monitor_agent","port":"24220"}},{"plugin_id":"object:3fec66a52e94","type":"debug_agent","output_plugin":false,"config":{"type":"debug_agent","port":"24230"}},{"plugin_id":"object:3fec66ae3dcc","type":"stdout","output_plugin":true,"config":{"type":"stdout"}},{"plugin_id":"object:3fec66aead48","type":"forward","output_plugin":true,"buffer_queue_length":0,"buffer_total_queued_size":0,"retry_count":0,"config":{"type":"forward","host":"192.168.0.11"}}]}
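
One way to use this endpoint is to flag output plugins whose buffer metrics look unhealthy. The sketch below parses a shortened copy of the response shown above. The field names buffer_queue_length and retry_count come from that response, while the function name and the two thresholds are arbitrary assumptions made for illustration.

```python
import json

# Shortened copy of the /api/plugins.json response shown above.
SAMPLE = '''{"plugins":[
 {"plugin_id":"object:3fec66ae3dcc","type":"stdout","output_plugin":true,
  "config":{"type":"stdout"}},
 {"plugin_id":"object:3fec66aead48","type":"forward","output_plugin":true,
  "buffer_queue_length":0,"buffer_total_queued_size":0,"retry_count":0,
  "config":{"type":"forward","host":"192.168.0.11"}}
]}'''


def unhealthy_outputs(body, max_queue_length=10, max_retry_count=3):
    """Return plugin_ids of output plugins whose buffer metrics exceed the limits."""
    flagged = []
    for plugin in json.loads(body)["plugins"]:
        if not plugin.get("output_plugin"):
            continue  # input plugins report no buffer metrics
        queue = plugin.get("buffer_queue_length") or 0
        retries = plugin.get("retry_count") or 0
        if queue > max_queue_length or retries > max_retry_count:
            flagged.append(plugin["plugin_id"])
    return flagged


print(unhealthy_outputs(SAMPLE))  # []
```

Against a live agent, the body would come from something like urllib.request.urlopen("http://host:24220/api/plugins.json").read() instead of SAMPLE.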

Monitoring with Prometheus or Datadog

Additionally, td-agent works with monitoring tools such as Prometheus, Datadog, etc. Our recommendation is to use Prometheus since we will be collaborating more in the future under the CNCF (Cloud Native Computing Foundation).

td-agent also supports retrieving metrics through monitoring tools; Prometheus is the recommended one.

Process Monitoring

Two ruby processes (parent and child) are executed. Please make sure that these processes are running.

For td-agent on Linux, you can check the process statuses with the following command. Two processes should be shown if there are no issues.

$ ps w -C ruby -C td-agent --no-heading
32342 ?        Sl     0:00 /usr/lib/fluent/ruby/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
32345 ?        Sl     0:01 /usr/lib/fluent/ruby/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
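
The "two processes" check can be automated by parsing the ps output. The sketch below works on the sample lines shown above rather than calling ps itself; the function name td_agent_pids and the "td-agent" marker string are assumptions chosen for illustration.

```python
PS_SAMPLE = """\
32342 ?        Sl     0:00 /usr/lib/fluent/ruby/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
32345 ?        Sl     0:01 /usr/lib/fluent/ruby/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
"""


def td_agent_pids(ps_output, marker="td-agent"):
    """Return the PIDs of all lines in the ps output that mention the marker."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # The first column of this ps format is the PID.
        if fields and fields[0].isdigit() and marker in line:
            pids.append(int(fields[0]))
    return pids


pids = td_agent_pids(PS_SAMPLE)
print(pids, "ok" if len(pids) == 2 else "unexpected process count")
```

In a real check, PS_SAMPLE would be replaced by the captured output of the ps command shown above.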

Port Monitoring

td-agent opens the following ports by default. We recommend checking the availability of these ports.

TCP 0.0.0.0 8888 (HTTP)
TCP 0.0.0.0 24224 (Forward)

If you don’t send any data, the daemon doesn’t do anything.
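
Port availability can be probed with a plain TCP connect. This is a minimal sketch that assumes the agent runs on the local machine with the default ports listed above; port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default td-agent ports from the list above; the host is an assumption.
    for name, port in (("HTTP", 8888), ("Forward", 24224)):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"{name} port {port}: {state}")
```

A successful connect only shows that something is listening on the port; it does not prove that the listener is td-agent or that it is healthy.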

Debug Port

A debug port for local communication is also opened.

TCP 127.0.0.1 24230

td-agent v2 vs. td-agent v3

Treasure Data, Inc. maintains stable packages for Fluentd and canonical plugins as Treasure Agent (the package is called td-agent). td-agent comes in two versions: v2 is the version for production, and v3 is the new stable version that works with Ruby 2.4 and the fluentd v1 series.

Treasure Data maintains stable packages of the open-source Fluentd and its canonical plugins; this distribution is called Treasure Agent (td-agent).

Installing Fluentd Using deb Package

What is td-agent?

Fluentd is written in Ruby for flexibility, with performance sensitive parts written in C. However, some users may have difficulty installing and operating a Ruby daemon.

Fluentd is written in Ruby and C; some users have difficulty installing and operating a Ruby daemon.

That’s why Treasure Data, Inc is providing the stable distribution of Fluentd, called td-agent. The differences between Fluentd and td-agent can be found here.

Step 0: Before Installation

Please follow the Preinstallation Guide to configure your OS properly. This will prevent many unnecessary problems.

Step 1: Install from Apt Repository

For Ubuntu, we currently support “Ubuntu 18.04 LTS / Bionic 64bit”, “Ubuntu 16.04 LTS / Xenial 64bit”, “Ubuntu 14.04 LTS / Trusty 64bit”. For Debian, we currently support “Debian 9 Stretch 64bit”, “Debian 8 Jessie 64bit”.

A shell script is provided to automate the installation process for each version. The shell script registers a new apt repository at /etc/apt/sources.list.d/treasure-data.list and installs the td-agent deb package.

For Ubuntu Bionic,

Installation command for Ubuntu 18.04:

curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent3.sh | sh

Step 2: Launch Daemon

systemd

The /lib/systemd/system/td-agent.service script is provided to start, stop, or restart the agent.

root@proxy-beijing:~# cat /lib/systemd/system/td-agent.service
[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target

[Service]
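User=td-agent  -- I later changed this to root; otherwise reading or writing files under some directories runs into permission problems
Group=td-agent  -- same as above, changed to root
LimitNOFILE=65536  -- the limit is already set here, so when the service is started via systemd there is no need to adjust ulimit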
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_OPTIONS=
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120

[Install]
WantedBy=multi-user.target
root@proxy-beijing:~#

Checking the running status:

root@proxy-beijing:~# systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/lib/systemd/system/td-agent.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-03-07 16:59:08 CST; 17min ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 19432 ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 19460 (fluentd)
    Tasks: 11 (limit: 4915)
   CGroup: /system.slice/td-agent.service
           ├─19460 /opt/td-agent/embedded/bin/ruby /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
           └─19467 /opt/td-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --under-superviso

Mar 07 16:59:07 proxy-beijing systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
Mar 07 16:59:08 proxy-beijing systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.
root@proxy-beijing:~#

If you want to customize systemd behaviour, put your td-agent.service into /etc/systemd/system.

Only if you want to customize systemd's behaviour do you need to put the modified td-agent.service into /etc/systemd/system.

Step 3: Post Sample Logs via HTTP

By default, /etc/td-agent/td-agent.conf is configured to take logs from HTTP and route them to stdout (/var/log/td-agent/td-agent.log). You can post sample log records using the curl command.

$ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test

Result:

root@proxy-beijing:/opt/apps# tail -f /var/log/td-agent/td-agent.log
2019-03-07 16:59:08 +0800 [info]: adding match pattern="td.*.*" type="tdlog"
2019-03-07 16:59:08 +0800 [warn]: #0 [output_td] secondary type should be same with primary one primary="Fluent::Plugin::TreasureDataLogOutput" secondary="Fluent::Plugin::FileOutput"
2019-03-07 16:59:08 +0800 [info]: adding match pattern="debug.**" type="stdout"
2019-03-07 16:59:08 +0800 [info]: adding source type="forward"
2019-03-07 16:59:08 +0800 [info]: adding source type="http"
2019-03-07 16:59:08 +0800 [info]: adding source type="debug_agent"
2019-03-07 16:59:08 +0800 [info]: #0 starting fluentd worker pid=19467 ppid=19460 worker=0
2019-03-07 16:59:08 +0800 [info]: #0 [input_debug_agent] listening dRuby uri="druby://127.0.0.1:24230" object="Fluent::Engine" worker=0
2019-03-07 16:59:08 +0800 [info]: #0 [input_forward] listening port port=24224 bind="0.0.0.0"
2019-03-07 16:59:08 +0800 [info]: #0 fluentd worker is now running worker=0

2019-03-07 17:12:16.088190324 +0800 debug.test: {"json":"message"}
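
The same sample record can be posted from Python instead of curl. The sketch below builds the request without sending it, so it can be inspected offline. The URL and tag are the ones from the curl example above, while build_log_request and post_log are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_log_request(record, tag, base_url="http://localhost:8888"):
    """Build a POST request carrying the record as the form field 'json'."""
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("ascii")
    return urllib.request.Request(f"{base_url}/{tag}", data=body, method="POST")


def post_log(record, tag):
    """Send the record to a running agent and return the HTTP status code."""
    with urllib.request.urlopen(build_log_request(record, tag), timeout=5) as resp:
        return resp.status


request = build_log_request({"json": "message"}, "debug.test")
print(request.get_method(), request.full_url)
```

Unlike the curl one-liner, this version URL-encodes the JSON body. Calling post_log({"json": "message"}, "debug.test") requires an agent listening on port 8888.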

moooofly / MarkSomethingDownLLS

Summary of fluentd and td-agent information #65