kzk / webhdfs

Ruby client for Hadoop WebHDFS

writing to webHdfs in parquet/avro format #33

Closed chitenderkumar closed 6 years ago

chitenderkumar commented 6 years ago

Hi,

We have a use case where we want to read from Elasticsearch and write into HDFS. For this we are using the WebHDFS output plugin in Logstash. Below is our Logstash config for reference.

input {
  elasticsearch {
    hosts => "192.168.0.3"
    index => "test"
    query => '{"query": {"term": {"Name": "test"}}}'
    size => 500
    scroll => "5m"
  }
}

output {
  webhdfs {
    host => "192.168.0.2"
    port => 50070                    # (required)
    path => "/user/logstash/test1"   # (required)
    user => "hdfs"                   # (required)
    flush_size => 500
    idle_flush_time => 10
    retry_interval => 10
    codec => json
  }
}

This is working fine for us. Now we have a requirement to write the output to HDFS in Parquet or Avro format.

Is there any config parameter that lets us write data to HDFS in Avro or Parquet format?

tagomoris commented 6 years ago

This is a question about how to use Logstash, or an issue for the Logstash webhdfs output plugin (if the feature is missing). It is not about this gem. Please ask in that plugin's repository or on the discussion forums.
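
For readers who arrive with the same question, here is a rough sketch of the Logstash side, not something this gem provides. It assumes the separate logstash-codec-avro plugin is installed, and the schema path is hypothetical. As far as that codec's documentation describes, it serializes each event as an individual Avro datum and does not produce an Avro container file, so check whether the result is readable by your downstream tools. No Parquet equivalent is shown here.

```
output {
  webhdfs {
    host => "192.168.0.2"
    port => 50070
    path => "/user/logstash/test1"
    user => "hdfs"
    # Assumption: logstash-codec-avro is installed.
    # The schema file path below is a hypothetical placeholder.
    codec => avro {
      schema_uri => "/path/to/test.avsc"
    }
  }
}
```

If real Avro or Parquet files are required, a common alternative is to land JSON in HDFS as in the original config and convert it afterwards with a batch job.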