splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.
Apache License 2.0
Shuttl with S3 #112

Closed justinlundy closed 11 years ago

justinlundy commented 11 years ago

I'm looking to use Shuttl with S3, without a Hadoop cluster. I've got Splunk 5.0.1 running and configured some test indexes for the purposes of evaluating shuttl.

Built and installed the app, followed the documentation as much as possible. Confirmed S3 bucket policy is configured properly -- AWS keys work etc.

The buckets are being copied into /mnt/raid/shuttl_archiver, but they are not making it into S3, even though I configured the backendName to be "s3".

What is confusing to me is the following warning -- it looks like it's trying to use HDFS?

2012-12-19 00:43:02,475 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Fsplunk_archive%2Farchive_data%2Fcluster_name%2Fsplunk-us-1' - Unexpected response code 404, expected 200
2012-12-19 00:43:02,475 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Fsplunk_archive%2Farchive_data%2Fcluster_name%2Fsplunk-us-1' - Received error response with XML message

Here is a snippet from my archivers.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:archiverConf xmlns:ns2="com.splunk.shuttl.server.model">
    <localArchiverDir>/mnt/raid/shuttl_archiver</localArchiverDir>
    <!-- Supported values for backend: local, hdfs, s3, s3n or glacier -->
    <backendName>s3</backendName>
    <!-- Path on the backend where Shuttl will store data -->
    <archivePath>splunk_archive</archivePath>
    <archiverRootURI>s3://snippet:redacted@splunk-archive-temp/splunk_archive</archiverRootURI>
    <clusterName>cluster_name</clusterName>
    <serverName>splunk-us-1</serverName>
    <archiveFormats>
        <archiveFormat>SPLUNK_BUCKET</archiveFormat>
    </archiveFormats>
</ns2:archiverConf>

Is there anything obvious I'm missing?

petterik commented 11 years ago

@tegatai You're mixing the old and new way of configuring Shuttl.

I see two things you can try (and I'll make this clearer in the README):

  1. The <archiverRootURI> should not be used; instead just use <backendName> and the appropriate files in shuttl/conf/backend.
  2. The <archivePath> should be absolute when using s3, so put a "/" before splunk_archive, making it <archivePath>/splunk_archive</archivePath>
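
As a sketch only, this is roughly what the archivers.xml from the question might look like with both suggestions applied. It assumes the first suggestion refers to dropping the archiverRootURI element and that every other element stays as posted. The S3 credentials and bucket name would then live in the backend files under shuttl/conf/backend, whose exact format is not shown in this thread.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:archiverConf xmlns:ns2="com.splunk.shuttl.server.model">
    <localArchiverDir>/mnt/raid/shuttl_archiver</localArchiverDir>
    <!-- Supported values for backend: local, hdfs, s3, s3n or glacier -->
    <backendName>s3</backendName>
    <!-- Absolute path on the backend: note the leading "/" -->
    <archivePath>/splunk_archive</archivePath>
    <!-- archiverRootURI removed (assumption: this is the "old way" element) -->
    <clusterName>cluster_name</clusterName>
    <serverName>splunk-us-1</serverName>
    <archiveFormats>
        <archiveFormat>SPLUNK_BUCKET</archiveFormat>
    </archiveFormats>
</ns2:archiverConf>
```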

When using the archiverRootURI, it should be s3://<ID>:<SECRET>@<BUCKET>/archiving_root (seen here: https://github.com/splunk/splunk-shuttl/issues/86#issuecomment-8907155). Is your Amazon bucket named "splunk-archive-temp"? The bucket name could be why it's not working now.

I'm happy to answer any questions you have, since the docs and configuration are not as usable as I want them to be. Better documentation and configuration are in the works!

Which version of Shuttl are you running? Let me know if you still have any problems!

petterik commented 11 years ago

I've made it easier to test whether a configured Shuttl works. Instead of making a bucket roll to frozen, you can now run a script! See this comment: https://github.com/splunk/splunk-shuttl/issues/106#issuecomment-11518045

petterik commented 11 years ago

Are you still having any problems, @tegatai? Otherwise I'm closing this issue.

justinlundy commented 11 years ago

Thank you, I resolved the issue.