yyyyb / -spark-

sparkstreaming notes

Flume Study #1

Open yyyyb opened 5 years ago

yyyyb commented 5 years ago

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

yyyyb commented 5 years ago

webserver (source side) ==> Flume ==> HDFS (destination)

yyyyb commented 5 years ago

Log-collection tools compared:
Flume: Cloudera/Apache, Java
Scribe: Facebook, C/C++, no longer maintained
Chukwa: Yahoo/Apache, Java, no longer maintained
Fluentd: Ruby
Logstash: the L in the ELK stack (ElasticSearch, Logstash, Kibana)

yyyyb commented 5 years ago

Flume architecture and core components:
1. Source: collects data
2. Channel: aggregates/buffers data
3. Sink: writes data out
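The three components above form a pipeline: the source puts events into the channel, and the sink takes them out. A toy Python sketch (not Flume code, just an illustration of the data flow) of a memory channel feeding a logger-style sink:

```python
from collections import deque

class MemoryChannel:
    """Buffers events between source and sink, like Flume's memory channel."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.queue = deque()

    def put(self, event):
        if len(self.queue) >= self.capacity:
            raise RuntimeError("channel full")
        self.queue.append(event)

    def take(self):
        # Return None when the channel is empty, like a non-blocking take.
        return self.queue.popleft() if self.queue else None

class LoggerSink:
    """Records each event it takes from the channel, like Flume's logger sink."""
    def __init__(self, channel):
        self.channel = channel
        self.seen = []

    def drain(self):
        while (event := self.channel.take()) is not None:
            self.seen.append(event)

# The "source" side simply puts collected lines into the channel.
channel = MemoryChannel()
for line in ["log line 1", "log line 2"]:
    channel.put(line)

sink = LoggerSink(channel)
sink.drain()
print(sink.seen)  # ['log line 1', 'log line 2']
```

The real components differ in detail (transactions, batching), but the put/take decoupling through a bounded channel is the essential shape.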

yyyyb commented 5 years ago

Flume environment setup.
Prerequisites:
1. Java Runtime Environment - Java 1.8 or later
2. Memory - Sufficient memory for configurations used by sources, channels or sinks
3. Disk Space - Sufficient disk space for configurations used by channels or sinks
4. Directory Permissions - Read/Write permissions for directories used by agent

1. Install the JDK
First, extract the JDK into the software directory:
tar -zxvf jdk-8uxxx-linux-x64.tar.gz -C ~/app/
Configure environment variables:
vi ~/.bash_profile

export JAVA_HOME=/path-where-the-jdk-was-extracted
export PATH=$JAVA_HOME/bin:$PATH

source ~/.bash_profile

yyyyb commented 5 years ago

2. Install Flume
Download from http://archive.cloudera.com/cdh5/cdh/5/ — find the matching CDH version and download flume-ng-1.6.0-cdh5.7.0.tar.gz.
Extract to the target directory:
tar -zxvf flume-ng-1.6.0-cdh5.7.0.tar.gz -C /target-directory
Configure environment variables:
vi ~/.bash_profile

export FLUME_HOME=/path-where-flume-was-extracted
export PATH=$FLUME_HOME/bin:$PATH

source ~/.bash_profile

yyyyb commented 5 years ago

3. Configure Flume
Edit flume-env.sh in the conf directory:
cp flume-env.sh.template flume-env.sh
vi flume-env.sh

JAVA_HOME=/App/jdk1.8.0_211

4. Verify: in the bin directory, run flume-ng version

yyyyb commented 5 years ago

The key to using Flume is writing the configuration file:
A) configure the Source
B) configure the Channel
C) configure the Sink
D) wire the three components together

yyyyb commented 5 years ago

example.conf: a single-node Flume configuration

The example from the official docs. a1 is the agent name, r1 the source name, k1 the sink name, c1 the channel name.

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
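What the netcat source does is conceptually simple: listen on a TCP port and turn each newline-delimited line into an event. A small self-contained Python sketch (plain sockets standing in for the agent, an ephemeral port standing in for 44444) of that behavior:

```python
import socket
import threading

received = []

def serve(server_sock):
    """Accept one connection and collect each line, like a netcat source."""
    conn, _ = server_sock.accept()
    with conn:
        for line in conn.makefile("rb"):
            received.append(line.rstrip(b"\r\n"))

# Port 0 lets the OS pick a free port; the Flume config above uses 44444.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve, args=(server,))
t.start()

# The client side plays the role of `telnet localhost 44444`.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello\r\n")

t.join()
server.close()
print(received)  # [b'hello']
```

Each collected line would then become the body of one event, flowing through the memory channel to the logger sink.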

yyyyb commented 5 years ago

Start the agent:
bin/flume-ng agent \
-n $agent_name \ (agent name, e.g. -n (or --name) a1)
-c conf \ (conf directory, i.e. -c (or --conf) $FLUME_HOME/conf)
-f conf/flume-conf.properties.template \ (config file to run, e.g. -f (or --conf-file) $FLUME_HOME/conf/example.conf)
-Dflume.root.logger=INFO,console

Test with telnet:
telnet hadoop000 44444

Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is the basic unit of data transfer in Flume:
Event = optional headers + byte array
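The structure can be sketched directly from that definition: optional string headers plus a byte-array body. A minimal Python model (illustrative only, not Flume's actual Event class), reproducing the hex dump from the log output above:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """Event = optional headers (key/value map) + byte-array body."""
    body: bytes
    headers: dict = field(default_factory=dict)

# "hello" followed by the carriage return that telnet sends: 68 65 6C 6C 6F 0D
e = Event(body=b"hello\r")
print(e.headers)                 # {}
print(e.body.hex(" ").upper())   # 68 65 6C 6C 6F 0D
```

The headers carry routing metadata (timestamps, host names) while the body is opaque bytes, which is why the logger sink prints it as a hex dump.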

yyyyb commented 5 years ago

Requirement 2: monitor a file and transfer its contents
Agent selection: exec source + memory channel + logger sink
Configuration file:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
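The behavior `tail -F` gives the exec source is: pick up any lines appended to the file after the point being watched. A rough Python sketch of that follow semantics, using a temp file to simulate a log being appended to (illustration only, `tail -F` also handles rotation, which this does not):

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for data appended past `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    lines = [l.decode() for l in data.splitlines()]
    return lines, offset + len(data)

# Simulate /root/data/data.log being written to while the agent runs.
fd, path = tempfile.mkstemp()
os.close(fd)
offset = 0

with open(path, "a") as f:
    f.write("first event\n")
lines, offset = read_new_lines(path, offset)

with open(path, "a") as f:
    f.write("second event\n")
more, offset = read_new_lines(path, offset)

os.remove(path)
print(lines, more)  # ['first event'] ['second event']
```

Each appended line becomes one event body, exactly as new lines in data.log show up in the logger sink's output.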

yyyyb commented 5 years ago

Requirement 3: server A collects logs and sends them to server B
Technology selection:
Server A: exec source + memory channel + avro sink
Server B: avro source + memory channel + logger sink

exec-memory-avro.conf:

# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /root/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = hadoop000
exec-memory-avro.sinks.avro-sink.port = 44444

# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

avro-memory-logger.conf:

# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = hadoop000
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel

Start avro-memory-logger first, then start exec-memory-avro.
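The start order matters because agent A's avro sink connects out to agent B's avro source, so B's listener must already be up. A two-hop Python sketch with plain TCP standing in for Avro (illustrative only, real Avro RPC adds framing and batching):

```python
import socket
import threading

logged = []

def agent_b(server_sock):
    """Plays avro-memory-logger: accept, read lines, 'log' them."""
    conn, _ = server_sock.accept()
    with conn:
        for line in conn.makefile("rb"):
            logged.append(line.rstrip(b"\n").decode())

# Start agent B first so its listener is up (the config binds hadoop000:44444;
# here we use loopback and an ephemeral port).
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=agent_b, args=(server,))
t.start()

# Agent A (exec-memory-avro) then connects and forwards its collected lines.
with socket.create_connection(("127.0.0.1", port)) as sink:
    for line in ["event from server A"]:
        sink.sendall(line.encode() + b"\n")

t.join()
server.close()
print(logged)  # ['event from server A']
```

If A started first, its connection to port 44444 would be refused until B was listening — the same reason the note above says to start avro-memory-logger first.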