Open yyyyb opened 5 years ago
webserver (source) ==> Flume ==> HDFS (destination)
Comparison of log-collection frameworks:
- Flume: Cloudera / Apache, Java
- Scribe: Facebook, C/C++, no longer maintained
- Chukwa: Yahoo / Apache, Java, no longer maintained
- Fluentd: Ruby
- Logstash: part of the ELK stack (Elasticsearch, Logstash, Kibana)
Flume architecture and core components:
1. Source: collects data
2. Channel: aggregates and buffers data
3. Sink: writes data out
source ~/.bash_profile
4. Verify the installation: run flume-ng version in the bin directory
The key to using Flume is writing the configuration file:
A) configure the Source
B) configure the Channel
C) configure the Sink
D) wire the three components together
Example from the official docs (a1: agent name, r1: source name, k1: sink name, c1: channel name):

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
bin/flume-ng agent \
-n $agent_name \    (agent name, e.g. -n (or --name) a1)
-c conf \    (conf directory, i.e. -c (or --conf) $FLUME_HOME/conf)
-f conf/flume-conf.properties.template \    (configuration file to run, e.g. -f (or --conf-file) $FLUME_HOME/conf/example.conf)
-Dflume.root.logger=INFO,console

Test with telnet:
telnet hadoop000 44444
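If telnet is not available, any TCP client can drive the netcat source. A minimal sketch in Python, assuming the agent from the config above is running on hadoop000:44444 (the function name send_line is illustrative):

```python
import socket

def send_line(host, port, text):
    """Open a TCP connection and send one newline-terminated line,
    which is the framing the Flume netcat source expects."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(text.encode("utf-8") + b"\n")
        # The netcat source acknowledges each received line with "OK".
        return sock.recv(16)

# Example (requires the running agent):
# send_line("hadoop000", 44444, "hello")
```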
The logger sink then prints something like:
Event: { headers:{} body: 68 65 6C 6C 6F 0D    hello. }
Event is the basic unit of data transfer in Flume.
Event = optional headers + byte array
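The Event structure above can be sketched as a plain data holder (illustrative only, not Flume's actual Java API):

```python
class Event:
    """Flume's basic unit of transfer: optional headers plus a byte-array body."""

    def __init__(self, body, headers=None):
        self.headers = headers or {}
        self.body = bytes(body)

    def __repr__(self):
        # Mirror the logger sink's hex dump: "hello" -> 68 65 6C 6C 6F
        hex_body = " ".join(f"{b:02X}" for b in self.body)
        return f"Event{{ headers:{self.headers} body: {hex_body} }}"
```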
Requirement 2: monitor a file and transfer its new content
Agent selection: exec source + memory channel + logger sink
Configuration file:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
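The exec source above simply runs tail -F and turns each newly appended line into an event. That polling behavior can be sketched in Python (the follow function and file path are illustrative):

```python
import time

def follow(path, poll_interval=0.5):
    """Yield lines appended to a file, roughly what `tail -F`
    feeds to the exec source."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at end of file, like tail -F on an existing log
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)

# for event_body in follow("/root/data/data.log"):
#     print(event_body)
```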
Requirement 3: server A collects logs and sends them to server B
Technology selection:
Server A: exec source + memory channel + avro sink
Server B: avro source + memory channel + logger sink

exec-memory-avro.conf:

# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /root/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = hadoop000
exec-memory-avro.sinks.avro-sink.port = 44444

# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
avro-memory-logger.conf:

# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = hadoop000
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
Start avro-memory-logger first, then start exec-memory-avro (the avro source must be listening before the avro sink tries to connect).
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.