Open Yingmin-Li opened 9 years ago
You are building the classpath using the catch-all expression find $HADOOP_HOME/share/hadoop/common/lib/ -name '*.jar'
which includes too many jars, specifically httpcore, which is used by manticore. Please use a specific list of the necessary jars, like so:
CLASSPATH=$HADOOP_DIR/share/hadoop/common/lib/htrace-core-3.0.4.jar:$HADOOP_DIR/share/hadoop/common/lib/protobuf-java-2.5.0.jar:$HADOOP_DIR/share/hadoop/common/lib/commons-cli-1.2.jar:$HADOOP_DIR/share/hadoop/common/lib/slf4j-api-1.7.5.jar:$HADOOP_DIR/share/hadoop/common/lib/hadoop-auth-2.6.0.jar:$HADOOP_DIR/share/hadoop/common/lib/commons-lang-2.6.jar:$HADOOP_DIR/share/hadoop/common/lib/commons-configuration-1.6.jar:$HADOOP_DIR/share/hadoop/common/lib/commons-collections-3.2.1.jar:$HADOOP_DIR/share/hadoop/common/lib/guava-11.0.2.jar:$HADOOP_DIR/share/hadoop/common/lib/commons-logging-1.1.3.jar:$HADOOP_DIR/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar:$HADOOP_DIR/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_DIR/etc/hadoop /opt/logstash-1.5.2/bin/logstash agent -f /path/to/config
I will update the README to make sure other people don't stumble on this
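For anyone who prefers not to maintain one long line by hand, here is a minimal sketch of building the classpath from an explicit jar list. build_classpath is a hypothetical helper (not part of logstash or Hadoop), and the jar names you pass must match the versions shipped with your own distribution.

```shell
#!/bin/sh
# Sketch only: join an explicit list of jars into a classpath string,
# instead of picking up every jar with `find ... -name '*.jar'`.
# build_classpath is a hypothetical helper name, not a logstash/Hadoop tool.
build_classpath() {
    lib_dir=$1
    shift
    cp=""
    for jar in "$@"; do
        path="$lib_dir/$jar"
        if [ -f "$path" ]; then
            # Append with ':' only when cp is already non-empty.
            cp="${cp:+$cp:}$path"
        else
            # Report jars that are missing rather than silently adding them.
            echo "warning: missing $path" >&2
        fi
    done
    printf '%s\n' "$cp"
}
```

It could be used as CLASSPATH="$(build_classpath "$HADOOP_HOME/share/hadoop/common/lib" protobuf-java-2.5.0.jar commons-cli-1.2.jar):$HADOOP_HOME/etc/hadoop", extending the list with the remaining jars from the command above.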
Thank you for your kind reply.
I corrected the classpath according to your suggestion. Now I don't see any error message in logstash, but the data is written to the Linux filesystem instead of HDFS. I changed the hdfs path in the output to maprfs://localhost:7222/path/in/hdfs/tweet-%{timestamp}.log. It did not work.
Any suggestion?
PS: I did not enable append in HDFS, so I configured the hdfs output file pattern with a timestamp, and the %{timestamp} field exists. I assumed it would work.
Start command:
LD_LIBRARY_PATH="$HADOOP_HOME/lib/native" LS_JAVA_OPTS="" CLASSPATH=$HADOOP_HOME/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:$HADOOP_HOME/share/hadoop/common/lib/protobuf-java-2.5.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$HADOOP_HOME/share/hadoop/common/lib/log4j-1.2.17.jar:$HADOOP_HOME/share/hadoop/common/lib/slf4j-api-1.7.5.jar:$HADOOP_HOME/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:$HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.0-mapr-1506.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-lang-2.6.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-collections-3.2.1.jar:$HADOOP_HOME/share/hadoop/common/lib/guava-13.0.1.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-logging-1.1.3.jar:$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.0-mapr-1506.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0-mapr-1506.jar:$HADOOP_HOME/etc/hadoop ./bin/logstash agent -f logstash.conf
and with logstash.conf below:
input {
  twitter {
    consumer_key => "xx"
    consumer_secret => "xx"
    oauth_token => "xx"
    oauth_token_secret => "xx"
    keywords => [ "keword1", "keyword2" ]
    full_tweet => true
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    protocol => "http"
    host => ""
    index => "twitter"
    document_type => "tweet"
    template => "twitter_template.json"
    template_name => "twitter"
  }
  hdfs {
    path => "/path/in/hdfs/tweet-%{+YYYY-MM-dd HH:mm:ss}.json"
    hadoop_config_resources => ['path_to/hdfs-site.xml', 'path_to/core-site.xml']
  }
}
The log of logstash is:
'[DEPRECATED] use `require 'concurrent'` instead of `require 'concurrent_ruby'`
[2015-08-21 16:52:22.324] WARN -- Concurrent: [DEPRECATED] Java 7 is deprecated, please use Java 8.
Java 7 support is only best effort, it may not work. It will be removed in next release (1.0).
{:timestamp=>"2015-08-21T16:52:22.444000+0200", :message=>"hdfs plugin is using the 'milestone' method to declare the version of the plugin this method is deprecated in favor of declaring the version inside the gemspec.", :level=>:warn}
The content of path_to/hdfs-site.xml is
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
The content of path_to/core-site.xml is
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
path => "/path/in/hdfs/tweet-%{timestamp}.log" is probably a bad idea, as HDFS is not well suited to many small files. %{+YYYY.MM.dd-HH} is probably better; I assume you are using %{timestamp} for testing.
The path shouldn't be maprfs://localhost:7222/path/in/hdfs/tweet-%{timestamp}.log but rather /path/in/hdfs/tweet-%{timestamp}.log; the reference to the filesystem is defined by a property in core-site.xml. Verify the configuration files and that hadoop_config_resources refers to them (or just omit it and make sure they are in the classpath).
In any case, I've never actually tested with maprfs, but it should work in principle.
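As a rough aid for the "verify the configuration files" step, here is a small sketch that checks each file passed to hadoop_config_resources is readable and contains at least one <property> entry. check_hadoop_conf is a hypothetical helper, and the <property> check is only a heuristic assumption for spotting an empty configuration file; it does not validate the values themselves.

```shell
#!/bin/sh
# Sketch only: sanity-check Hadoop config files before pointing
# hadoop_config_resources at them.
# check_hadoop_conf is a hypothetical helper name, not a Hadoop tool.
check_hadoop_conf() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        elif ! grep -q '<property>' "$f"; then
            # A file holding only the XML header defines nothing,
            # so the client would fall back to its defaults.
            echo "no <property> entries in: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_hadoop_conf path_to/hdfs-site.xml path_to/core-site.xml returns non-zero and names the offending file if either one is missing or empty.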
I am working with Logstash 1.5.3 with the twitter plugin under MapR 5.0, and it worked well. I then installed logstash-hdfs and reconfigured logstash.conf to save tweets only in HDFS; that also works.
But when I put the two outputs (elasticsearch and hdfs) together and restart logstash, it throws the error below:
I think the error is caused by a manticore conflict between the elasticsearch output and the hdfs output. Is there a simple way to solve it? Thank you!
my configuration for logstash is:
and logstash.log content is: