splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.
Apache License 2.0

If unable to write to hdfs, we should have both a dashboard for such errors, and also an alert people can enable #63

Open borischen opened 12 years ago

borischen commented 12 years ago

Sample from splunkd.log

08-31-2012 15:18:59.384 -0700 ERROR ExecProcessor - message from "/home/boris/hadoopish2/splunk/etc/apps/shuttl/bin/start.sh" 2012-08-31 15:18:59,383 ERROR com.splunk.shuttl.archiver.archive.ArchiveBucketTransferer: did="attempted to transfer bucket to archive" happened="IOException raised" expected="success" bucket="Bucket [format=SPLUNK_BUCKET, directory=/home/boris/shuttl_archiver/data/safe-buckets/mytestdb/db_1335803939_1333339502_2, indexName=mytestdb, bucketName=db_1335803939_1333339502_2, uri=file:/home/boris/shuttl_archiver/data/safe-buckets/mytestdb/db_1335803939_1333339502_2/]" destination="hdfs://installtest-horton1-TA.sv.splunk.com:8020/archiver_root/archive_data/mycluster/hiro/mytestdb/db_1335803939_1333339502_2/SPLUNK_BUCKET" exception="org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=boris, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x"

08-31-2012 15:18:59.526 -0700 ERROR ExecProcessor - message from "/home/boris/hadoopish2/splunk/etc/apps/shuttl/bin/start.sh" 2012-08-31 15:18:59,526 ERROR com.splunk.shuttl.archiver.archive.ArchiveBucketTransferer: did="attempted to transfer bucket to archive" happened="IOException raised" expected="success" bucket="Bucket [format=SPLUNK_BUCKET, directory=/home/boris/shuttl_archiver/data/safe-buckets/mytestdb/db_1335820725_1333375238_3, indexName=mytestdb, bucketName=db_1335820725_1333375238_3, uri=file:/home/boris/shuttl_archiver/data/safe-buckets/mytestdb/db_1335820725_1333375238_3/]" destination="hdfs://installtest-horton1-TA.sv.splunk.com:8020/archiver_root/archive_data/mycluster/hiro/mytestdb/db_1335820725_1333375238_3/SPLUNK_BUCKET" exception="org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=boris, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x"

08-31-2012 15:18:59.612 -0700 ERROR ExecProcessor - message from "/home/boris/hadoopish2/splunk/etc/apps/shuttl/bin/start.sh" 2012-08-31 15:18:59,611 ERROR com.splunk.shuttl.archiver.archive.ArchiveBucketTransferer: did="attempted to transfer bucket to archive" happened="IOException raised" expected="success" bucket="Bucket [format=SPLUNK_BUCKET, directory=/home/boris/shuttl_archiver/data/safe-buckets/mytestdb/db_1335833170_1333565851_4, indexName=mytestdb, bucketName=db_1335833170_1333565851_4, uri=file:/home/boris/shuttl_archiver/data/safe-buckets/mytestdb/db_1335833170_1333565851_4/]" destination="hdfs://installtest-horton1-TA.sv.splunk.com:8020/archiver_root/archive_data/mycluster/hiro/mytestdb/db_1335833170_1333565851_4/SPLUNK_BUCKET" exception="org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=boris, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x"

petterik commented 12 years ago

It should be viewable from the dashboard at /app/shuttl/failed. However, that dashboard seems to be broken.

@antonjn: I made shuttl fail, but the search returned no results. I inspected the search and got results once I removed "| search protocol= host= port= indexName= bucketType=" from it. A clause like indexName= doesn't match an event that has no value for that key, i.e. it doesn't match events that lack the indexName field at all. So when the user chooses \ for a key, that key shouldn't be inserted into the search.
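One way to avoid the empty-filter problem described above (a sketch, not the actual dashboard code: the field names come from the quoted search, while the fillnull sentinel and a token default of * are assumptions about how the dashboard could be fixed) is to give every event a value for each filter field before filtering, so a wildcard default still matches events that lack the field:

```
<base search>
| fillnull value="*unset*" protocol host port indexName bucketType
| search protocol="$protocol$" host="$host$" port="$port$" indexName="$indexName$" bucketType="$bucketType$"
```

With each dashboard token defaulting to *, an untouched picker then matches every event, including those where the field was originally missing.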

chunmingpoxiao commented 11 years ago

@petterik When I restart Splunk, I can see the buckets in /root/shuttl_archiver/data/safe-buckets/main and lock files in /root/shuttl_archiver/data/archive-locks-dir, but they are never moved into HDFS. I want to recover the buckets into Splunk; how can I do that?

petterik commented 11 years ago

@chunmingpoxiao: When you restart Splunk, Splunk might try to "roll" a bucket from cold to frozen by invoking the coldToFrozenScript. If the roll to frozen happens while Splunk is starting up, it's very likely that the Shuttl server hasn't started yet. The buckets that are going to be frozen will be moved to the "safe-buckets" directory, but not transferred to HDFS. However, the buckets in the "safe-buckets" directory will be transferred later. Every time Shuttl transfers a bucket, it tries to transfer all the buckets that are in the "safe-buckets" directory. So when Splunk tries to "roll" another bucket to frozen, it'll transfer all the buckets in the "safe-buckets" directory to HDFS.

Issue #84 will make this less confusing by retrying the transfer of buckets in "safe-buckets" more often, rather than only when Splunk rolls a bucket to frozen.
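To see how many buckets are still waiting to be transferred, you can count the bucket directories under safe-buckets. A small sketch (the helper name is hypothetical; the safe-buckets layout of one subdirectory per index, each containing bucket directories, is taken from the paths quoted in this thread):

```shell
# Count buckets still waiting in the safe-buckets directory.
# Layout assumed: <safe-buckets>/<indexName>/<bucketName>/
count_pending_buckets() {
  find "$1" -mindepth 2 -maxdepth 2 -type d | wc -l
}

# Example usage (path from earlier comments in this thread; adjust to your setup):
# count_pending_buckets /root/shuttl_archiver/data/safe-buckets
```

If the count stays constant across transfers, the retry described above is not happening and shuttl.log is the place to look.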

chunmingpoxiao commented 11 years ago

@petterik You say "the buckets in the "safe-bucket" directory will be transferred later", but when I try again, Shuttl transfers one bucket and does not transfer all the buckets in the "safe-bucket" directory. I don't know what to do... Can you read Chinese?

petterik commented 11 years ago

Can you see if you can find anything strange, or any errors, in $SPLUNK_HOME/var/log/splunk/shuttl.log?

Try this:

  1. Stop Splunk
  2. Run ps -ef | grep shuttl
  3. Kill the shuttl process (if there is one)
  4. Start Splunk
  5. Feed Splunk with data, so that it will eventually call the coldToFrozenScript. Or move a bucket that is in the safe-buckets directory to the cold directory of its index.
  6. Buckets should start transferring, unless there's an error.
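The steps above, as a command sketch (the "shuttl" process pattern is an assumption; check the ps output yourself before killing anything, and substitute the real PID):

```
$SPLUNK_HOME/bin/splunk stop
ps -ef | grep shuttl        # note the PID of the Shuttl server, if one is running
kill <pid>                  # only if an orphaned shuttl process was found
$SPLUNK_HOME/bin/splunk start
# then index new data until coldToFrozenScript fires,
# or move a safe-buckets bucket into its index's cold directory
```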

I can't read Chinese, sorry.

chunmingpoxiao commented 11 years ago

@petterik (^_^) Today I am using version 0.8.2. It's different from previous versions, and I don't know how to configure it...

petterik commented 11 years ago

@chunmingpoxiao There are examples of how to configure shuttl here: https://github.com/splunk/splunk-shuttl/tree/master/examples

chunmingpoxiao commented 11 years ago

@petterik (screenshot: QQ 20130420095549)

In indexes.conf, for version 0.8.2, I configured it like this. Is it right?

    [summer_try]
    homePath = $SPLUNK_HOME/var/lib/splunk/summer_try/db
    coldPath = $SPLUNK_HOME/var/lib/splunk/summer_try/colddb
    thawedPath = $SPLUNK_HOME/var/lib/splunk/summer_try/thaweddb
    coldToFrozenScript = $SPLUNK_HOME/etc/apps/shuttl/bin/archiveBucket.sh summer_try

I can find "db_1366128000_1366128000_22" in /root/shuttl_archiver/data/safe-buckets/summer_try, but the buckets are never moved to HDFS...

chunmingpoxiao commented 11 years ago

@petterik All the buckets are still in the "safe-bucket" directory. Version: 0.8.2.

petterik commented 11 years ago

@chunmingpoxiao: It seems like you're actually using 0.6.1.1: in your screenshot of "cd bin/" and "ll", I see shuttl-server-0.6.1.1.jar.

Delete your $SPLUNK_HOME/etc/apps/shuttl and download the latest release from http://splunk-base.splunk.com/apps/58003/shuttl.

Here's what my shuttl/bin directory looks like for 0.8.2: (screenshot: Screen Shot 2013-04-22 at 12 27 05 PM)

chunmingpoxiao commented 11 years ago

@petterik I can't download it ...

petterik commented 11 years ago

@chunmingpoxiao: You need to create a Splunkbase user before downloading anything from Splunkbase. If that doesn't work, then you can get the latest version of shuttl by building it yourself:

  1. Clone the shuttl git repository
  2. Run buildit.sh
  3. Extract build/shuttl.tgz to $SPLUNK_HOME/etc/apps/
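The build steps as commands (a sketch assuming git and a JDK are installed; the repository URL is this project's, and the build output path comes from the steps above):

```
git clone https://github.com/splunk/splunk-shuttl.git
cd splunk-shuttl
./buildit.sh
tar -xzf build/shuttl.tgz -C "$SPLUNK_HOME/etc/apps/"
```

Restart Splunk afterwards so the freshly extracted app is picked up.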

chunmingpoxiao commented 11 years ago

@petterik thanks ..

petterik commented 11 years ago

@chunmingpoxiao: Please note that version 0.8.2 of Shuttl on Splunkbase was faulty. A new release, 0.8.3.1, has been published on Splunkbase; you should download that instead.

I hope everything works out. Please contact me if you're still having problems.