splunk/splunk-shuttl
Splunk app for archive management, including HDFS support.
Apache License 2.0
MR job for loading CSV data into HBase tables
#96, Open
petterik opened 11 years ago
petterik commented 11 years ago
Create a MapReduce job that:
Given a Splunk index, gets all the CSV files for all the indexers.
Loads those CSV files into an HBase table that is unique per index.
Given scenario:
The job was run successfully and HBase contains all the data.
New CSV data is Shuttl'ed to HDFS.
The job is run again; then only the new data is appended to the HBase tables (for the correct indexes).
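
One way to think about the "only the new data" requirement is as a planning step that runs before the job: list the CSV files, skip the ones already loaded, and group the rest by destination table. The sketch below is only an illustration of that idea. The directory layout `<root>/<index>/<indexer>/<file>.csv`, the `shuttl_` table prefix, and the function names are all assumptions, not anything Shuttl defines, and where the set of already-loaded paths is stored is left open.

```python
from pathlib import PurePosixPath


def table_name_for(index):
    """Hypothetical naming scheme: one HBase table per Splunk index."""
    return f"shuttl_{index}"


def index_of(csv_path):
    """Extract the index name from a path.

    Assumes the layout <root>/<index>/<indexer>/<file>.csv, which is an
    assumption made for this sketch only.
    """
    return PurePosixPath(csv_path).parts[-3]


def plan_load(csv_paths, already_loaded):
    """Group the CSV files that have not been loaded yet by destination table."""
    plan = {}
    for path in sorted(csv_paths):
        if path in already_loaded:
            continue  # loaded by an earlier run, skip it
        plan.setdefault(table_name_for(index_of(path)), []).append(path)
    return plan
```

On a second run, passing the paths recorded by the first run as `already_loaded` leaves only the newly Shuttl'ed files in the plan, each under the table for its own index.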
Other criteria:
The job needs to be reliable. What happens when things go wrong?
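
One possible answer to the reliability question is to make the load idempotent, so that a failed run can simply be retried. The sketch below derives each row key deterministically from the source file and line number; a retry then writes the same keys again instead of adding duplicates. `FakeTable` is a dict-backed stand-in used only for illustration, not an HBase client, and the key scheme and function names are assumptions rather than part of this issue.

```python
import hashlib


def row_key(csv_path, line_number):
    """Deterministic key: the same file and line always map to the same row."""
    raw = f"{csv_path}:{line_number}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


class FakeTable:
    """Dict-backed stand-in for a table; writing an existing key overwrites it."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value


def load_file(table, csv_path, lines):
    """Write every line of one CSV file; safe to call again after a failure."""
    for line_number, line in enumerate(lines):
        table.put(row_key(csv_path, line_number), line)
```

With this scheme, a run that dies halfway through a file and is then restarted ends up with the same rows as a run that succeeded the first time.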