nukemberg / logstash-hdfs

Logstash HDFS plugin
Apache License 2.0

LoadError: load error: manticore -- java.lang.NoSuchFieldError: INSTANCE caused by outputting to hdfs and elasticsearch at the same time #12

Open Yingmin-Li opened 9 years ago

Yingmin-Li commented 9 years ago

I am working with Logstash 1.5.3 and the twitter plugin under MapR 5.0, and it worked well. I then installed logstash-hdfs and reconfigured logstash.conf to save tweets only to HDFS, which also worked.

But when I put the two outputs (elasticsearch and hdfs) together and restarted Logstash, it threw the error below:

I think the error is caused by a manticore conflict between the elasticsearch output and the hdfs output. Is there a simple way to solve it? Thank you!

LoadError: load error: manticore -- java.lang.NoSuchFieldError: INSTANCE
        require at org/jruby/RubyKernel.java:1072
        require at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/polyglot-0.3.5/lib/polyglot.rb:65
         (root) at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/transport/http/manticore.rb:1
        require at org/jruby/RubyKernel.java:1072
        require at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/polyglot-0.3.5/lib/polyglot.rb:65
         (root) at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.0.5-java/lib/logstash/outputs/elasticsearch/protocol.rb:1
     initialize at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.0.5-java/lib/logstash/outputs/elasticsearch/protocol.rb:58
            map at org/jruby/RubyArray.java:2412
       register at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.0.5-java/lib/logstash/outputs/elasticsearch.rb:423
           each at org/jruby/RubyArray.java:1613
       register at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-1.0.5-java/lib/logstash/outputs/elasticsearch.rb:419
  start_outputs at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/pipeline.rb:164
            run at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/pipeline.rb:83
        execute at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/agent.rb:155
           call at org/jruby/RubyProc.java:271
            run at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/runner.rb:91
           call at org/jruby/RubyProc.java:271
            run at /opt/elasticsearch/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/runner.rb:96

My startup configuration for Logstash is:

export ELASTICSEARCH_HOME=/opt/elasticsearch
export LOGSTASH_HOME=$ELASTICSEARCH_HOME/logstash-1.5.3
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native" 
export CLASSPATH=$CLASSPATH:$(find $HADOOP_HOME/share/hadoop/common/lib/ -name '*.jar' | grep -v sources | tr '\n' ':'):\
$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.0-mapr-1506.jar:\
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0-mapr-1506.jar:\
$HADOOP_HOME/etc/hadoop

pushd  $LOGSTASH_HOME

setsid  bin/logstash agent -f logstash.conf >/tmp/logstash.log 2>&1 < /tmp/logstash.log & popd

and the logstash.conf content is:

input {
  twitter {
    consumer_key => "xx"
    consumer_secret => "xx"
    oauth_token => "xx"
    oauth_token_secret => "xx"
    keywords => [ "keyword1", "keyword2" ]
    full_tweet => true
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    protocol => "http"
    host => "xxx.xxx.xxx.xxx"
    index => "twitter"
    document_type => "tweet"
    template => "twitter_template.json"
    template_name => "twitter"
  }
  hdfs {
    path => "/path/in/hdfs/tweet.log"
    hadoop_config_resources => ['$HADOOP_HOME/etc/hadoop/hdfs-site.xml', '$HADOOP_HOME/etc/hadoop/core-site.xml']
    enable_append => true
  }
}
nukemberg commented 9 years ago

You are building the classpath using the catch-all expression find $HADOOP_HOME/share/hadoop/common/lib/ -name '*.jar', which pulls in too many jars - specifically httpcore, which is also used by manticore. Please use a specific list of the necessary jars, like so:

CLASSPATH=$HADOOP_DIR/share/hadoop/common/lib/htrace-core-3.0.4.jar:\
$HADOOP_DIR/share/hadoop/common/lib/protobuf-java-2.5.0.jar:\
$HADOOP_DIR/share/hadoop/common/lib/commons-cli-1.2.jar:\
$HADOOP_DIR/share/hadoop/common/lib/slf4j-api-1.7.5.jar:\
$HADOOP_DIR/share/hadoop/common/lib/hadoop-auth-2.6.0.jar:\
$HADOOP_DIR/share/hadoop/common/lib/commons-lang-2.6.jar:\
$HADOOP_DIR/share/hadoop/common/lib/commons-configuration-1.6.jar:\
$HADOOP_DIR/share/hadoop/common/lib/commons-collections-3.2.1.jar:\
$HADOOP_DIR/share/hadoop/common/lib/guava-11.0.2.jar:\
$HADOOP_DIR/share/hadoop/common/lib/commons-logging-1.1.3.jar:\
$HADOOP_DIR/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar:\
$HADOOP_DIR/share/hadoop/common/hadoop-common-2.6.0.jar:\
$HADOOP_DIR/etc/hadoop \
/opt/logstash-1.5.2/bin/logstash agent -f /path/to/config
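
If it helps, here is a minimal shell sketch of the same idea - building the classpath from an explicit whitelist instead of find. The variable name HADOOP_COMMON_JARS is just for illustration, and the jar names and versions will vary with your distribution:

# Whitelist the Hadoop jars explicitly so conflicting jars such as
# httpcore never leak onto the JVM classpath.
HADOOP_COMMON_JARS="htrace-core-3.0.4.jar protobuf-java-2.5.0.jar \
commons-cli-1.2.jar slf4j-api-1.7.5.jar hadoop-auth-2.6.0.jar \
commons-lang-2.6.jar commons-configuration-1.6.jar \
commons-collections-3.2.1.jar guava-11.0.2.jar commons-logging-1.1.3.jar"

CLASSPATH=""
for jar in $HADOOP_COMMON_JARS; do
  CLASSPATH="$CLASSPATH:$HADOOP_DIR/share/hadoop/common/lib/$jar"
done
CLASSPATH="${CLASSPATH#:}"   # drop the leading colon
CLASSPATH="$CLASSPATH:$HADOOP_DIR/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar"
CLASSPATH="$CLASSPATH:$HADOOP_DIR/share/hadoop/common/hadoop-common-2.6.0.jar"
CLASSPATH="$CLASSPATH:$HADOOP_DIR/etc/hadoop"   # config dir, not a jar
export CLASSPATH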

I will update the README to make sure other people don't stumble on this.

Yingmin-Li commented 9 years ago

Thank you for your kind reply.

I corrected the classpath according to your suggestion. Now I don't see the error message in Logstash, but the data is written to the Linux filesystem instead of HDFS. I modified the hdfs path in the output to maprfs://localhost:7222/path/in/hdfs/tweet-%{timestamp}.log, but it did not work.

Any suggestion?

PS: I did not enable append in HDFS, so I configured the hdfs output file pattern with a timestamp, and the %{timestamp} field exists. I assumed it should have worked.

Start command:

LD_LIBRARY_PATH="$HADOOP_HOME/lib/native" \
LS_JAVA_OPTS="-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf" \
CLASSPATH=$HADOOP_HOME/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:\
$HADOOP_HOME/share/hadoop/common/lib/protobuf-java-2.5.0.jar:\
$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:\
$HADOOP_HOME/share/hadoop/common/lib/log4j-1.2.17.jar:\
$HADOOP_HOME/share/hadoop/common/lib/slf4j-api-1.7.5.jar:\
$HADOOP_HOME/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:\
$HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.0-mapr-1506.jar:\
$HADOOP_HOME/share/hadoop/common/lib/commons-lang-2.6.jar:\
$HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar:\
$HADOOP_HOME/share/hadoop/common/lib/commons-collections-3.2.1.jar:\
$HADOOP_HOME/share/hadoop/common/lib/guava-13.0.1.jar:\
$HADOOP_HOME/share/hadoop/common/lib/commons-logging-1.1.3.jar:\
$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.0-mapr-1506.jar:\
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0-mapr-1506.jar:\
$HADOOP_HOME/etc/hadoop \
./bin/logstash agent -f logstash.conf

and with logstash.conf below:

input {
  twitter {
    consumer_key => "xx"
    consumer_secret => "xx"
    oauth_token => "xx"
    oauth_token_secret => "xx"
    keywords => [ "keyword1", "keyword2" ]
    full_tweet => true
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    protocol => "http"
    host => "xxx.xxx.xxx.xxx"
    index => "twitter"
    document_type => "tweet"
    template => "twitter_template.json"
    template_name => "twitter"
  }
  hdfs {
    path => "/path/in/hdfs/tweet-%{+YYYY-MM-dd HH:mm:ss}.json"
    hadoop_config_resources => ['path_to/hdfs-site.xml', 'path_to/core-site.xml']
  }
}

The Logstash log output is:

[DEPRECATED] use `require 'concurrent'` instead of `require 'concurrent_ruby'`
[2015-08-21 16:52:22.324]  WARN -- Concurrent: [DEPRECATED] Java 7 is deprecated, please use Java 8.
Java 7 support is only best effort, it may not work. It will be removed in next release (1.0).
{:timestamp=>"2015-08-21T16:52:22.444000+0200", :message=>"hdfs plugin is using the 'milestone' method to declare the version of the plugin this method is deprecated in favor of declaring the version inside the gemspec.", :level=>:warn}
..........................................................................................................................................................................................

The content of path_to/hdfs-site.xml is

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
       <name>dfs.support.append</name>
       <value>true</value>
    </property>
</configuration>

The content of path_to/core-site.xml is

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
  <name>hadoop.proxyuser.mapr.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.mapr.groups</name>
  <value>*</value>
</property>

</configuration>
nukemberg commented 9 years ago

path => "/path/in/hdfs/tweet-%{timestamp}.log" is probably a bad idea, as HDFS is not well suited to many small files; something like %{+YYYY.MM.dd-HH} is better (note the lowercase dd - in the Joda-Time patterns Logstash uses, DD means day of year). I assume you are using %{timestamp} for testing. The path shouldn't be maprfs://localhost:7222/path/in/hdfs/tweet-%{timestamp}.log but rather /path/in/hdfs/tweet-%{timestamp}.log - the reference to the filesystem is defined in core-site.xml in the property fs.default.name. Verify the configuration files and that hadoop_config_resources refers to them (or just omit it and make sure they are on the classpath).
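For illustration, a sketch of what the hdfs output could look like with an hourly file pattern (the paths below are placeholders):

  hdfs {
    # One file per hour instead of one file per event timestamp,
    # which avoids flooding HDFS with many small files.
    path => "/path/in/hdfs/tweet-%{+YYYY.MM.dd-HH}.log"
    # Point at the actual Hadoop config files, or omit this setting
    # and put their directory on the classpath instead.
    hadoop_config_resources => ['/path/to/core-site.xml', '/path/to/hdfs-site.xml']
  }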

In any case, I've never actually tested with maprfs, but it should work in principle.
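
For reference, a minimal sketch of how the default filesystem could be declared in core-site.xml so that a plain /path/in/hdfs/... path resolves to MapR-FS - the host and port are placeholders for your CLDB address, and fs.default.name is the older alias of fs.defaultFS in Hadoop 2.x:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Default filesystem; relative hdfs output paths resolve against it.
       cldb-host:7222 is a placeholder for the MapR CLDB address. -->
  <property>
    <name>fs.default.name</name>
    <value>maprfs://cldb-host:7222</value>
  </property>
</configuration>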