moooofly opened this issue 5 years ago
Arm Treasure Data provides a Server-Side Agent called Treasure Agent (td-agent) to collect server-side logs and events. You can continuously import data using td-agent.
td-agent is the agent provided by Arm Treasure Data for collecting server-side logs and events.
Logs are usually rotated on an hourly or daily basis based on time or size. This system quickly produces many large log files that need to be batch imported for further analysis. This is an outdated approach. Logs are better treated as continuously generated STREAMS as opposed to files.
Legacy logging systems rotate logs hourly or daily, based on time or size; the more modern view is to treat logs as continuously generated streams rather than files.
"Server daemons (such as PostgreSQL or Nginx) and applications (such as a Rails or Django app) sometimes offer a configuration parameter for a path to the program’s logfile. This can lead us to think of logs as files. But a better conceptual model is to treat logs as time-ordered streams..." (Logs Are Streams, Not Files, Adam Wiggins, Heroku co-founder)
td-agent, a data collection daemon, is used to import data continuously to Treasure Data. Although bulk import is supported, we recommend importing your data continuously via td-agent.
td-agent supports both bulk import and streaming import.
td-agent is a data collection daemon. It collects logs from various data sources and uploads them to Treasure Data.
Use Treasure Agent 3, fluentd v0.14 (v1.0) series. We are deprecating Treasure Agent 2, fluentd v0.12 series.
The recommended combination is Treasure Agent 3 with the fluentd v0.14 (v1.0) series.
# 16.04 Xenial (64bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
# 14.04 Trusty
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent3.sh | sh
# 12.04 Precise
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-precise-td-agent3.sh | sh
# Debian Stretch (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-stretch-td-agent3.sh | sh
# Debian Jessie (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-jessie-td-agent3.sh | sh
# Debian Squeeze (64-bit only)
$ curl -L https://toolbelt.treasuredata.com/sh/install-debian-squeeze-td-agent2.sh | sh
After installing td-agent, you can modify your config file. The file can be found at /etc/td-agent/td-agent.conf.
(configuration file omitted)
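For orientation, here is a minimal illustrative sketch of what a td-agent.conf can look like. This is a sketch, not the shipped file; the http source on port 8888 and the debug.** stdout match mirror the defaults that appear later in these notes:

```
# Accept records posted over HTTP on port 8888
<source>
  type http
  port 8888
</source>

# Print anything tagged debug.* to the td-agent log
<match debug.**>
  type stdout
</match>
```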
Restart the td-agent service.
# Linux
$ sudo /etc/init.d/td-agent restart
# MacOS X
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist
The td-agent resource consumption is roughly:
If you think td-agent is slow, see “5 Tips to Optimize Fluentd Performance”.
Resource | Location | Notes |
---|---|---|
Config Directory | /etc/td-agent/ | |
Config File | /etc/td-agent/td-agent.conf | This config is picked up by the startup script |
Startup Script | /etc/init.d/td-agent | |
Log Directory | /var/log/td-agent/ | |
Plugin Directory | /etc/td-agent/plugin/ | Your custom plugins go here. |
Ruby Interpreter | /opt/td-agent/embedded/bin/ruby | Ruby v2.1 is bundled with the package. |
Rubygems | /usr/sbin/td-agent-gem | Bundled rubygems to install fluentd plugins. For example: /usr/sbin/td-agent-gem install fluent-plugin-mongo |
jemalloc | /opt/td-agent/embedded/lib/libjemalloc.so | jemalloc is bundled together to avoid memory fragmentation. It is loaded by default in the startup script. |
When td-agent starts, it launches two processes: master and slave. The master process manages the life cycle of the slave process, and the slave process handles the actual log collection.
Both processes run as the td-agent user under the td-agent group, and all forked subprocesses run as the same. This applies to any system call initiated by td-agent as well. The agent configuration resides at /etc/td-agent/td-agent.conf. All configurations must be readable by td-agent.
The following ports are open depending on your input.
If you are having issues, add the following line to /etc/default/td-agent to enable verbose logging:
DAEMON_ARGS=-vv
After that, restart the daemon. You can now find more verbose logs in /var/log/td-agent.log.
For high-traffic websites, we recommend using a high availability configuration for td-agent
. Monitoring the daemon is also important.
td-agent is fully open-sourced under the fluentd project.
td-agent is designed primarily for event-log delivery systems.
In such systems, several delivery guarantees are possible:
- At most once: Messages are immediately transferred. If the transfer succeeds, the message is never sent out again. However, many failure scenarios can cause lost messages (e.g. no more write capacity).
- At least once: Each message is delivered at least once. In failure cases, messages may be delivered twice.
- Exactly once: Each message is delivered once and only once. This is what people want.
Notes on the possible delivery guarantees:
If the system “can’t lose a single event”, and must also transfer “exactly once”, then the system must stop processing events when it runs out of write capacity. The proper approach would be to use synchronous logging and return errors when the event cannot be accepted.
If your system cannot tolerate losing a single message, and each message may be sent only once, then you should use a synchronous logging system.
That’s why td-agent guarantees ‘at most once’ transfer. In order to collect massive amounts of data without impacting application performance, a data logger must transfer data asynchronously. Performance improves at the cost of potential delivery failure.
td-agent uses the at-most-once model: it has to collect large volumes of log data without impacting application performance, so it can only transfer data asynchronously; the performance gain comes at the cost of possible delivery failures.
However, most failure scenarios are preventable. The following sections describe how to set up td-agent’s topology for high availability.
Keep in mind that most failure scenarios are avoidable; the following content covers how to build an HA topology for td-agent.
To configure td-agent for high availability, we assume that your network consists of ‘log forwarders’ and ‘log aggregators’.
td-agent’s high-availability topology consists of log forwarder and log aggregator components.
‘log forwarders’ are typically installed on every node to receive local events. Once an event is received, they forward it to the ‘log aggregators’ through the network.
A log forwarder is installed on every node to receive local events, which it then forwards to the log aggregators over the network.
‘log aggregators’ are daemons that continuously receive events from the log forwarders. They buffer the events and periodically upload the data into the cloud.
Log aggregators run as daemons that continuously receive events from the log forwarders; they buffer the events and periodically upload the data to the cloud.
td-agent can act as either a log forwarder or a log aggregator, depending on its configuration. The next sections describe the setups. We assume that the active log aggregator has IP ‘192.168.0.1’ and that the backup has IP ‘192.168.0.2’.
Depending on its configuration file, td-agent runs as either a log forwarder or a log aggregator.
(configuration omitted)
(configuration omitted)
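Since the actual configurations are omitted here, below is a hedged sketch of the forwarder side only, using the classic v0.12-style syntax seen elsewhere in these notes and the aggregator IPs assumed above. The tag app.** and the buffer path are illustrative, not from the original:

```
<match app.**>
  type forward
  # persist events to a file buffer before forwarding
  buffer_type file
  buffer_path /var/log/td-agent/buffer/forward
  flush_interval 60s
  # active aggregator
  <server>
    host 192.168.0.1
    port 24224
  </server>
  # backup aggregator, used only when the active one is down
  <server>
    host 192.168.0.2
    port 24224
    standby
  </server>
</match>
```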
When a log forwarder receives events from applications, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is forwarded to the aggregators.
After receiving events from an application, the log forwarder first writes them to a disk buffer and then forwards them to the log aggregators periodically, at every flush_interval.
This process is inherently robust against data loss. If a log forwarder’s td-agent process dies, the buffered data is properly transferred to its aggregator after it restarts. If the network between forwarders and aggregators breaks, the data transfer is automatically retried. That being said, possible message loss scenarios do exist:
- The process dies immediately after receiving the events, but before writing them into the buffer.
- The forwarder’s disk is broken, and the file buffer is lost.
These are the two possible scenarios in which messages can be lost.
When log aggregators receive events from log forwarders, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is uploaded into the cloud.
This process is inherently robust against data loss. If a log aggregator’s td-agent process dies, the data from the log forwarder is properly retransferred after it restarts. If the network between aggregators and the cloud breaks, the data transfer is automatically retried.
That being said, possible message loss scenarios do exist:
There are two possible scenarios in which messages can be lost.
td-agent has a built-in monitoring agent to retrieve internal metrics in JSON via HTTP. Please add the following lines to your configuration file.
Through a built-in monitoring agent, td-agent exposes JSON-formatted metrics over HTTP.
<source>
type monitor_agent
bind 0.0.0.0
port 24220
</source>
disable_node_info (default: true): controls whether system metrics (CPU / memory / disk) are sent.
Next, please restart the agent and get the metrics via HTTP.
The corresponding metrics can then be retrieved over HTTP.
$ curl http://host:24220/api/plugins.json
{
  "plugins": [
    {"plugin_id":"object:3fec669d6ac4","type":"forward","output_plugin":false,"config":{"type":"forward"}},
    {"plugin_id":"object:3fec669daf98","type":"http","output_plugin":false,"config":{"type":"http","port":"8888"}},
    {"plugin_id":"object:3fec669dfa48","type":"monitor_agent","output_plugin":false,"config":{"type":"monitor_agent","port":"24220"}},
    {"plugin_id":"object:3fec66a52e94","type":"debug_agent","output_plugin":false,"config":{"type":"debug_agent","port":"24230"}},
    {"plugin_id":"object:3fec66ae3dcc","type":"stdout","output_plugin":true,"config":{"type":"stdout"}},
    {"plugin_id":"object:3fec66aead48","type":"forward","output_plugin":true,"buffer_queue_length":0,"buffer_total_queued_size":0,"retry_count":0,"config":{"type":"forward","host":"192.168.0.11"}}
  ]
}
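A quick way to watch a single field from this payload without extra tooling is plain grep; a sketch (the saved sample below is abbreviated from the response above, and the field name retry_count is taken from it):

```shell
# Saved (abbreviated) copy of the monitor_agent response
json='{"plugins":[{"plugin_id":"object:3fec66aead48","type":"forward","output_plugin":true,"retry_count":0}]}'

# Extract the retry_count value; a growing number suggests delivery problems
echo "$json" | grep -o '"retry_count":[0-9]*' | cut -d: -f2
```

In practice you would pipe `curl -s http://host:24220/api/plugins.json` into the same filter.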
Additionally, td-agent works with monitoring tools such as Prometheus, Datadog, etc. Our recommendation is to use Prometheus, since we will be collaborating more in the future under the CNCF (Cloud Native Computing Foundation).
td-agent also supports exposing metrics to external monitoring tools; Prometheus is the recommended one.
Two ruby processes (parent and child) are executed. Please make sure that these processes are running.
For td-agent on Linux, you can check the process statuses with the following command. Two processes should be shown if there are no issues.
$ ps w -C ruby -C td-agent --no-heading
32342 ? Sl 0:00 /usr/lib/fluent/ruby/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
32345 ? Sl 0:01 /usr/lib/fluent/ruby/bin/ruby /usr/sbin/td-agent --daemon /var/run/td-agent/td-agent.pid --log /var/log/td-agent/td-agent.log
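The two-process invariant can be turned into a trivial health probe; a sketch with a hypothetical helper (check_td_agent is not part of the package):

```shell
# check_td_agent: report OK only when exactly 2 td-agent processes exist
check_td_agent() {
  if [ "$1" -eq 2 ]; then
    echo "OK"
  else
    echo "DEGRADED ($1 processes)"
  fi
}

# In a cron job you would feed it the live count:
#   check_td_agent "$(ps w -C ruby -C td-agent --no-heading | wc -l)"
check_td_agent 2
```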
td-agent opens the following ports by default. We recommend checking the availability of these ports.
If you don’t send any data, the daemon doesn’t do anything.
A debug port for local communication is also opened.
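The concrete port list did not survive in these notes, but the defaults visible elsewhere here are 24224 (forward), 8888 (http), 24230 (debug_agent), and 24220 (monitor_agent). A bash sketch for probing them locally (port_open is a hypothetical helper built on bash's /dev/tcp; it is not part of td-agent):

```shell
# port_open: print "open" if a local TCP port accepts connections, else "closed"
port_open() {
  if (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Probe the default td-agent ports seen in these notes
for p in 24224 8888 24230 24220; do
  echo "$p: $(port_open "$p")"
done
```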
Treasure Data, Inc. maintains stable packages of Fluentd and canonical plugins as Treasure Agent (the package is called td-agent). td-agent comes in v2 and v3: td-agent v2 is for production, and v3 is the new stable version that works with Ruby 2.4 and the fluentd v1 series.
Treasure Data maintains stable packages based on the open-source Fluentd, distributed as Treasure Agent (td-agent).
Fluentd is written in Ruby for flexibility, with performance sensitive parts written in C. However, some users may have difficulty installing and operating a Ruby daemon.
Fluentd is written in Ruby and C; some users have difficulty installing and operating a Ruby daemon.
That’s why Treasure Data, Inc. provides the stable distribution of Fluentd, called td-agent. The differences between Fluentd and td-agent can be found here.
Please follow the Preinstallation Guide to configure your OS properly. This will prevent many unnecessary problems.
For Ubuntu, we currently support “Ubuntu 18.04 LTS / Bionic 64bit”, “Ubuntu 16.04 LTS / Xenial 64bit”, “Ubuntu 14.04 LTS / Trusty 64bit”. For Debian, we currently support “Debian 9 Stretch 64bit”, “Debian 8 Jessie 64bit”.
A shell script is provided to automate the installation process for each version. The shell script registers a new apt repository at /etc/apt/sources.list.d/treasure-data.list and installs the td-agent deb package. For Ubuntu Bionic:
Installation command for Ubuntu 18.04:
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent3.sh | sh
The /lib/systemd/system/td-agent.service unit is provided to start, stop, or restart the agent.
root@proxy-beijing:~# cat /lib/systemd/system/td-agent.service
[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target
[Service]
User=td-agent -- I later changed this to root; otherwise reading/writing files under some directories runs into permission problems
Group=td-agent -- likewise, changed to root
LimitNOFILE=65536 -- the limit is already set here in the unit, so when starting the service via systemd there is no need to adjust ulimit separately
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_OPTIONS=
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120
[Install]
WantedBy=multi-user.target
root@proxy-beijing:~#
Checking the running status:
root@proxy-beijing:~# systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
Loaded: loaded (/lib/systemd/system/td-agent.service; disabled; vendor preset: enabled)
Active: active (running) since Thu 2019-03-07 16:59:08 CST; 17min ago
Docs: https://docs.treasuredata.com/articles/td-agent
Process: 19432 ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 19460 (fluentd)
Tasks: 11 (limit: 4915)
CGroup: /system.slice/td-agent.service
├─19460 /opt/td-agent/embedded/bin/ruby /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
└─19467 /opt/td-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --under-superviso
Mar 07 16:59:07 proxy-beijing systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
Mar 07 16:59:08 proxy-beijing systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.
root@proxy-beijing:~#
If you want to customize systemd behaviour, put your td-agent.service into /etc/systemd/system.
Only when you need to customize systemd behaviour do you have to place an adjusted td-agent.service under the /etc/systemd/system directory.
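Rather than copying the whole unit file, a small systemd drop-in is often enough. For example, the User/Group change annotated in the unit listing above could be sketched as a drop-in (the file path follows the standard systemd override convention; treat this as a sketch):

```
# /etc/systemd/system/td-agent.service.d/override.conf
[Service]
User=root
Group=root
```

After creating it, run `systemctl daemon-reload` and restart the service so the override takes effect.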
By default, /etc/td-agent/td-agent.conf is configured to take logs from HTTP and route them to stdout (/var/log/td-agent/td-agent.log). You can post sample log records using the curl command.
$ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
The result:
root@proxy-beijing:/opt/apps# tail -f /var/log/td-agent/td-agent.log
2019-03-07 16:59:08 +0800 [info]: adding match pattern="td.*.*" type="tdlog"
2019-03-07 16:59:08 +0800 [warn]: #0 [output_td] secondary type should be same with primary one primary="Fluent::Plugin::TreasureDataLogOutput" secondary="Fluent::Plugin::FileOutput"
2019-03-07 16:59:08 +0800 [info]: adding match pattern="debug.**" type="stdout"
2019-03-07 16:59:08 +0800 [info]: adding source type="forward"
2019-03-07 16:59:08 +0800 [info]: adding source type="http"
2019-03-07 16:59:08 +0800 [info]: adding source type="debug_agent"
2019-03-07 16:59:08 +0800 [info]: #0 starting fluentd worker pid=19467 ppid=19460 worker=0
2019-03-07 16:59:08 +0800 [info]: #0 [input_debug_agent] listening dRuby uri="druby://127.0.0.1:24230" object="Fluent::Engine" worker=0
2019-03-07 16:59:08 +0800 [info]: #0 [input_forward] listening port port=24224 bind="0.0.0.0"
2019-03-07 16:59:08 +0800 [info]: #0 fluentd worker is now running worker=0
2019-03-07 17:12:16.088190324 +0800 debug.test: {"json":"message"}
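If you need the payload alone (for piping into other tools), the JSON part of such a log line can be stripped out with sed; a small sketch using the line above:

```shell
# A stdout-match log line as written by td-agent
line='2019-03-07 17:12:16.088190324 +0800 debug.test: {"json":"message"}'

# Drop everything before the first '{' to keep only the JSON payload
echo "$line" | sed 's/^[^{]*//'
```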
Company documentation and projects:
Articles:
Plugins (the content below is mainly related to http output):