MR job for loading CSV data into HBase tables

splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.

Apache License 2.0


MR job for loading CSV data into HBase tables #96

Open · petterik opened this issue 11 years ago

petterik commented 11 years ago

Create a MapReduce job that:

Given a Splunk index, collects that index's CSV files from all the indexers.
Loads those CSV files into an HBase table that is unique to that index.
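A minimal sketch of the planning step, assuming a hypothetical archive layout `/archive/<indexer>/<index>/<bucket>/<file>.csv` (the real Shuttl layout may differ) and a hypothetical table naming scheme `splunk_<index>`. It groups CSV files by index across every indexer and derives one HBase table name per index; in an actual MR job these per-index file lists would become the job's input paths.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical archive layout (an assumption, not Shuttl's documented format):
#   /archive/<indexer>/<index>/<bucket>/<file>.csv
ARCHIVE_ROOT = PurePosixPath("/archive")


def table_name_for_index(index: str) -> str:
    """Hypothetical naming scheme: one HBase table per Splunk index."""
    return f"splunk_{index}"


def group_csv_by_index(paths):
    """Group CSV paths by Splunk index, collecting files from every indexer."""
    by_index = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix.lower() != ".csv":
            continue  # ignore non-CSV archive files
        try:
            rel = path.relative_to(ARCHIVE_ROOT)
        except ValueError:
            continue  # not under the archive root
        if len(rel.parts) < 4:
            continue  # expected <indexer>/<index>/<bucket>/<file>
        by_index[rel.parts[1]].append(str(path))
    return {index: sorted(files) for index, files in by_index.items()}


paths = [
    "/archive/indexer1/main/bucket_1/events.csv",
    "/archive/indexer2/main/bucket_7/events.csv",
    "/archive/indexer1/web/bucket_2/events.csv",
    "/archive/indexer1/web/bucket_2/journal.gz",
]
plan = group_csv_by_index(paths)
tables = {index: table_name_for_index(index) for index in plan}
```

Here `main` collects files from both `indexer1` and `indexer2`, and the non-CSV `journal.gz` is skipped.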

Scenario:

The job has run successfully and HBase contains all the data.
New CSV data is Shuttl'ed to HDFS.
When the job is run again, only the new data is appended to the HBase tables of the correct indexes.
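One way to get the "only new data" behaviour, sketched under two assumptions not stated in the issue: archived CSV files are immutable once Shuttl'ed (new data always arrives as new files), and the job keeps a hypothetical manifest of files it has already loaded. Each run then only processes files missing from that manifest.

```python
def new_files_since_last_run(current_plan, loaded):
    """Return, per index, only the CSV files not yet loaded into HBase.

    current_plan: dict index -> list of CSV paths found in this run
    loaded: set of CSV paths recorded as loaded by earlier runs
            (hypothetical manifest; assumes archived files never change)
    """
    pending = {}
    for index, files in current_plan.items():
        new = [f for f in files if f not in loaded]
        if new:  # indexes with nothing new are skipped entirely
            pending[index] = new
    return pending


current_plan = {"main": ["/a/1.csv", "/a/2.csv"], "web": ["/b/1.csv"]}
loaded = {"/a/1.csv", "/b/1.csv"}
pending = new_files_since_last_run(current_plan, loaded)
```

Tracking whole files keeps the check cheap; if files could be appended to after archiving, a per-file offset or checksum would be needed instead.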

Other criteria:

The job needs to be reliable: what happens when things go wrong?
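A common way to make such a load safe to retry, shown as a sketch with a plain dict standing in for an HBase table: derive each row key deterministically from the source file and line number (a hypothetical key scheme), and record a file in the manifest only after all of its rows are written. A crash mid-file leaves the file pending, and re-running it targets the same rows and columns, so HBase would store a newer version of the same cells rather than duplicate rows.

```python
import csv
import hashlib
import io


def row_key(index: str, source_path: str, line_no: int) -> str:
    """Deterministic row key (hypothetical scheme): same input -> same key."""
    # Short hash of the source path keeps keys compact; a longer digest
    # lowers the chance of two files colliding.
    digest = hashlib.md5(source_path.encode("utf-8")).hexdigest()[:16]
    return f"{index}|{digest}|{line_no:010d}"


def load_file(table, index, source_path, csv_text, loaded):
    """Load one CSV file into `table` (a dict standing in for an HBase table).

    The file is added to the `loaded` manifest only after every row has been
    written, so a failure mid-file leaves it pending for the next run.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, record in enumerate(reader, start=1):
        table[row_key(index, source_path, line_no)] = dict(record)
    loaded.add(source_path)


table = {}
loaded = set()
text = "_time,host\n1,a\n2,b\n"
load_file(table, "main", "/a/1.csv", text, loaded)
load_file(table, "main", "/a/1.csv", text, loaded)  # simulated retry
```

The retry leaves the table with the same two rows, which is the property that lets a failed run simply be re-run.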