nukemberg / logstash-hdfs

Logstash HDFS plugin
Apache License 2.0
46 stars 12 forks

Cannot get plugin to operate on HDFS #3

Closed: hakusaro closed this issue 9 years ago

hakusaro commented 10 years ago

Following the setup instructions, I have the configuration files on the classpath; however, I can't get the plugin to use the distributed file system -- it seems to be stuck using the local filesystem instead:

2014-01-05 15:53:42,057 WARN  conf.Configuration (Configuration.java:loadProperty(2172)) - core-site.xml:an attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
Using Hadoop configuration: file:/// {:level=>:info, :file=>"logstash/outputs/hdfs.rb", :line=>"59"}

I'm executing the following command:

CLASSPATH=$(find /Users/shank/Development/mercury/hadoop-2.2.0 -name '*.jar' | tr '\n' ':'):/Users/shank/Development/mercury/hadoop-2.2.0/etc/hadoop/:/Users/shank/Development/mercury/logstash-1.3.2-flatjar.jar java logstash.runner agent -f /Users/shank/Development/mercury/logstash-complex.conf -p /Users/shank/Development/mercury/logstash-hdfs -vv

The Logstash output is configured as:

output {
  hdfs {
      path => "log2.log"
      flush_interval => 0
  }
}

core-site.xml:

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000/</vaue>
</property>
</configuration>

hdfs-site.xml:

<configuration>
<property>  
   <name>dfs.replication</name>  
   <value>1</value>  
</property>  
<property>  
   <name>dfs.namenode.name.dir</name>  
   <value>file:/Users/shank/Development/mercury/hadoop/namenode</value>  
</property>  
<property>  
   <name>dfs.datanode.data.dir</name>  
   <value>file:/Users/shank/Development/mercury/hadoop/datanode</value>  
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000/</vaue>
</property>
</configuration>

I've repeatedly tried specifying fs.defaultFS in different config files, but I cannot get it to be read. My hunch is that these config files aren't getting on the classpath, but how would I go about rectifying that or debugging it?
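
One way to debug this, independent of Logstash, is to ask Hadoop's own tools which filesystem the on-disk configuration resolves to. A minimal sketch, assuming the hadoop-2.2.0 layout from the command above (not part of the original report):

# Point Hadoop at the config directory and print the resolved default filesystem.
# file:/// here means the XML files are not being picked up from this directory either.
export HADOOP_CONF_DIR=/Users/shank/Development/mercury/hadoop-2.2.0/etc/hadoop
/Users/shank/Development/mercury/hadoop-2.2.0/bin/hdfs getconf -confKey fs.defaultFS

If getconf reports hdfs://localhost:9000/ but the plugin still logs file:///, the problem lies in the classpath the Logstash JVM actually sees rather than in the XML files themselves.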

nukemberg commented 10 years ago

I think you may be missing /conf at the end of your config path; AFAIR Hadoop uses the etc/conf path. You could also try specifying the config file path explicitly using the config_file parameter.

hakusaro commented 10 years ago

Hadoop 2.x uses etc/hadoop for its config file storage location.

# shank at argus-mobile in ~/Development/mercury/hadoop-2.2.0/etc [16:16:31]
$ ls
hadoop

# shank at argus-mobile in ~/Development/mercury/hadoop-2.2.0/etc [16:16:31]
$ cd hadoop

# shank at argus-mobile in ~/Development/mercury/hadoop-2.2.0/etc/hadoop [16:16:33]
$ ls
capacity-scheduler.xml     hadoop-env.cmd             hadoop-policy.xml          httpfs-signature.secret    mapred-env.sh              ssl-client.xml.example     yarn-site.xml
configuration.xsl          hadoop-env.sh              hdfs-site.xml              httpfs-site.xml            mapred-queues.xml.template ssl-server.xml.example
container-executor.cfg     hadoop-metrics.properties  httpfs-env.sh              log4j.properties           mapred-site.xml.template   yarn-env.cmd
core-site.xml              hadoop-metrics2.properties httpfs-log4j.properties    mapred-env.cmd             slaves                     yarn-env.sh

Which file would I specify directly, core-site.xml or hdfs-site.xml?

hakusaro commented 10 years ago

There are typos in my original XML files, but I can assure you that even with the vaue typo fixed, it still does not read the property.

nukemberg commented 10 years ago

try setting the path explicitly:

output {
  hdfs {
    path => "/path/to/output_file.log"
    hadoop_config_resources => ['path/to/configuration/on/classpath/core-site.xml']
  }
}

You can specify multiple config file locations if you want.
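
For example, a minimal sketch listing both files, assuming both core-site.xml and hdfs-site.xml are reachable on the classpath:

output {
  hdfs {
    path => "log2.log"
    # Entries are resolved as classpath resources; list as many as needed.
    hadoop_config_resources => ["core-site.xml", "hdfs-site.xml"]
  }
}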

wDomin commented 10 years ago

Hi, I'm having the exact same problem. I specified the path to core-site.xml and hdfs-site.xml, but the plugin still writes to the local file system rather than HDFS.

my launch command:

CLASSPATH=$(find /usr/lib/hadoop -name '*.jar' | tr '\n' ':')/etc/hadoop/conf/core-site.xml:/etc/hadoop/conf/hdfs-site.xml:/applis/hadd/hdp21/logstash-1.3.2/logstash-1.3.2/logstash-1.3.2-flatjar.jar java logstash.runner agent -f /applis/hadd/POC_ELK_PDT_CVG/real_time/TEST_LOAD_DATA_HDFS.conf -p logstash-hdfs-master

and my logstash output:

output {
  hdfs {
    path => "/applis/hadd/test/axp/output_file.log"
    hadoop_config_resources => ['/etc/hadoop/conf/hdfs-site.xml']
    enable_append => true
  }
}

Am I doing something wrong?

nukemberg commented 10 years ago

You are missing a : before /etc/hadoop/conf/core-site.xml. Also, you don't need to specify the XML files on the classpath, only the directory containing them, e.g. /etc/hadoop/conf.
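
Applying both fixes, the invocation would look roughly like this (paths are the ones from the command above):

# Jar list, then the config directory (not the individual XML files), then the Logstash flatjar:
CLASSPATH=$(find /usr/lib/hadoop -name '*.jar' | tr '\n' ':'):/etc/hadoop/conf:/applis/hadd/hdp21/logstash-1.3.2/logstash-1.3.2/logstash-1.3.2-flatjar.jar \
  java logstash.runner agent -f /applis/hadd/POC_ELK_PDT_CVG/real_time/TEST_LOAD_DATA_HDFS.conf -p logstash-hdfs-master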

wDomin commented 10 years ago

I fixed that. Now it's not writing to my local FS anymore, but it still won't write to HDFS. I'm getting two warnings at the beginning of execution; maybe it's related?

2014-07-03 10:52:01,173 - WARN [>output:NativeCodeLoader@62] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-07-03 10:52:01,759 - WARN [>output:DomainSocketFactory@111] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

nukemberg commented 10 years ago

Can you give more information? What version of Logstash are you using? What version of Hadoop?

wDomin commented 10 years ago

I'm using Logstash 1.4.1 with the Hortonworks distribution of Hadoop 2.1.

nukemberg commented 9 years ago

Can you try running with LD_LIBRARY_PATH set, as per the updated instructions in the README?
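
In practice that means pointing LD_LIBRARY_PATH at the directory containing libhadoop.so before launching. A sketch, with the path being an assumption that depends on the Hadoop distribution:

# /usr/lib/hadoop/lib/native is a guess; adjust to wherever libhadoop.so actually lives.
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH
# Then re-run the same CLASSPATH=... java logstash.runner agent command as before.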