splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.
Apache License 2.0

S3 and Shuttl not working on 5.0.3, JDK7 with search head pooling #131

Open milan-koudelka opened 11 years ago

milan-koudelka commented 11 years ago

Hi, I've tried to use it, but it's not working; there are errors even when I'm running the test script. The test script uploaded some files, but they aren't in the correct path; they're in the root folder.

My configuration (the inner XML element names were stripped when this was posted; only the values and the closing tag survive):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
file://mnt/tmp/shuttl_archiver
s3
/test
common
splunk-i1
SPLUNK_BUCKET
</ns2:archiverConf>

Part of log:

2013-07-30 03:23:48,935 INFO com.splunk.shuttl.archiver.filesystem.transaction.TransactionExecuter: done="Preparing transaction" transaction="Transaction [data=/mnt/tmp/shuttl_archiver/data/metadata-dir/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/bucket.size, remoteTemp=test/temporary_data/splunk-i1test/archive_data/common/splunk-i1/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/archive_meta/bucket.size, dst=test/archive_data/common/splunk-i1/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/archive_meta/bucket.size]"
2013-07-30 03:23:48,935 INFO com.splunk.shuttl.archiver.filesystem.transaction.TransactionExecuter: will="Commit transaction" transaction="Transaction [data=/mnt/tmp/shuttl_archiver/data/metadata-dir/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/bucket.size, remoteTemp=test/temporary_data/splunk-i1test/archive_data/common/splunk-i1/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/archive_meta/bucket.size, dst=test/archive_data/common/splunk-i1/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/archive_meta/bucket.size]"
2013-07-30 03:23:48,950 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta%2Fbucket.size' - Unexpected response code 404, expected 200
2013-07-30 03:23:48,950 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta%2Fbucket.size' - Received error response with XML message
2013-07-30 03:23:49,173 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta' - Unexpected response code 404, expected 200
2013-07-30 03:23:49,174 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta' - Received error response with XML message
2013-07-30 03:23:49,257 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta%2Fbucket.size' - Unexpected response code 404, expected 200
2013-07-30 03:23:49,258 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta%2Fbucket.size' - Received error response with XML message
2013-07-30 03:23:49,351 INFO com.splunk.shuttl.archiver.filesystem.transaction.TransactionExecuter: done="Commit transaction" transaction="Transaction [data=/mnt/tmp/shuttl_archiver/data/metadata-dir/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/bucket.size, remoteTemp=test/temporary_data/splunk-i1test/archive_data/common/splunk-i1/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/archive_meta/bucket.size, dst=test/archive_data/common/splunk-i1/TestIndexForTryingOutShuttlArchiving/db_1336330530_1336330530_0.zVdSR0/SPLUNK_BUCKET/archive_meta/bucket.size]"
2013-07-30 03:23:49,365 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Ftemporary_data%2Fsplunk-i1test%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta%2Fbucket.size' - Unexpected response code 404, expected 200
2013-07-30 03:23:49,365 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fuser%2Froot%2Ftest%2Ftemporary_data%2Fsplunk-i1test%2Farchive_data%2Fcommon%2Fsplunk-i1%2FTestIndexForTryingOutShuttlArchiving%2Fdb_1336330530_1336330530_0.zVdSR0%2FSPLUNK_BUCKET%2Farchive_meta%2Fbucket.size' - Received error response with XML message
2013-07-30 03:23:49,369 INFO com.splunk.shuttl.archiver.archive.BucketShuttlerRunner: done="Archived bucket" bucket="LocalBucket [getDirectory()=/mnt/tmp/db_1336330530_1336330530_0.zVdSR0, getName()=db_1336330530_1336330530_0.zVdSR0, getIndex()=TestIndexForTryingOutShuttlArchiving, getFormat()=SPLUNK_BUCKET, getPath()=/mnt/tmp/db_1336330530_1336330530_0.zVdSR0, getEarliest()=Sun May 06 20:55:30 CEST 2012, getLatest()=Sun May 06 20:55:30 CEST 2012, getSize()=0]"
petterik commented 11 years ago

Even though you're getting those errors, it might still work. See why here: http://splunk-base.splunk.com/answers/85635/shuttl-archiving-errors/85832 In short, Shuttl uses the same S3 libraries that Hadoop uses, and they are outdated but should be well tested. There's a patch to update the S3 library, but it's not in a release yet.

As for the buckets not being at the root, here's my guess (which might be off by one or two):

To try:

Note:

Thanks and let me know if it doesn't work,

milan-koudelka commented 11 years ago

Hi Petter, thank you for your fast response. Data aren't actually loaded to S3 by the real process, only by the test script. I can't try it through the UI, because I have pure indexers without a UI.

In this case, since I don't have a UI on the indexers, it will probably be better to use a shell script with the s3cmd command directly.
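For example, a minimal sketch of such a script (the S3 bucket name "my-archive-bucket" is just a placeholder, and s3cmd is assumed to be already configured with credentials via s3cmd --configure):

#!/bin/sh
# Sketch of a coldToFrozen-style script: Splunk passes the frozen bucket's directory as $1.
BUCKET_DIR="$1"
INDEX=$(basename "$(dirname "$(dirname "$BUCKET_DIR")")")   # .../<index>/colddb/<bucket>
# Push the whole bucket directory to S3 under a per-index prefix.
s3cmd sync "$BUCKET_DIR"/ "s3://my-archive-bucket/frozen/$INDEX/$(basename "$BUCKET_DIR")/"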

Best regards Milan Koudelka

petterik commented 11 years ago

You can control your indexers' Shuttl apps through your Search Head, as long as you have the Shuttl app installed on all Splunk instances. All commands you run on your SH will be executed on your connected Search Peers/Indexers as well.

Otherwise, if you want to test Shuttl in isolation on an indexer, you can call the Shuttl server's REST endpoints for thawing. Example of calling Shuttl's thaw endpoint:

POST parameters:

where "from" and "to" parameters are optional.

milan-koudelka commented 11 years ago

That's nice! I've installed it on the search head as well. Both the Thaw and Flush pages are showing 404.

The path '/en-US/custom/shuttl/Archiving/show' was not found.

There are a few successful Archived Buckets on the main page, but all of them are from the testing script :-/

petterik commented 11 years ago
  1. The 404 error with "The path '/en-US/custom/shuttl/Archiving/show' was not found." usually means the Splunk instance hasn't been restarted since Shuttl was installed on it. Shuttl's custom UI requires a Splunk restart, so try that!
  2. I suggest you try to Shuttl "real" buckets, because to make the buckets from the testing script work at all you would need to configure a Splunk index matching the testing script's indexes. I really recommend you delete the testing buckets ASAP.

To Shuttl some real Splunk buckets fast, you can do the following:

I'm going to be offline for the rest of the day. I'll help you more tomorrow!

- Petter

milan-koudelka commented 11 years ago

Hi, yes, sorry, I've restarted Splunk and these pages are accessible now. However, the index field always shows "Loading".

I'm testing this on a real index. I didn't have maxWarmDBCount there, but I don't think that was important.

My current configuration is:

[dev-os]
homePath = $SPLUNK_DB/dev-os/db
thawedPath = $SPLUNK_DB/dev-os/thaweddb
coldPath = $SPLUNK_DB/dev-os/colddb
maxTotalDataSizeMB = 19800
maxWarmDBCount = 5
frozenTimePeriodInSecs = 604800
coldToFrozenScript = $SPLUNK_HOME/etc/apps/shuttl/bin/coldToFrozenScript.sh

In the log, there is still the same alert:

2013-07-30 22:50:12,410 INFO com.splunk.shuttl.archiver.archive.BucketFreezer: will="Attempting to archive bucket" index="dev-os" path="/mnt/ebs/splunk/dev-os/colddb/db_1369127814_1369119469_180"
2013-07-30 22:50:12,567 ERROR com.splunk.shuttl.archiver.model.MovesBuckets: did="Attempted to move bucket" happened="move failed" bucket="LocalBucket [getDirectory()=/mnt/ebs/splunk/dev-os/colddb/db_1369127814_1369119469_180, getName()=db_1369127814_1369119469_180, getIndex()=dev-os, getFormat()=SPLUNK_BUCKET, getPath()=/mnt/ebs/splunk/dev-os/colddb/db_1369127814_1369119469_180, getEarliest()=Tue May 21 08:57:49 CEST 2013, getLatest()=Tue May 21 11:16:54 CEST 2013, getSize()=127725]" destination="/mnt/tmp/shuttl_archiver/data/safe-buckets/dev-os"

And there aren't any new data on S3.

petterik commented 11 years ago

The move that is failing is a plain java.io.File#renameTo(), which should be roughly equivalent to a unix mv. Since that's what's failing, could it have to do with permissions? I'm not sure why it should fail otherwise.
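A quick way to rule the obvious things out (a sketch; it assumes Splunk runs as the user "splunk" and uses the paths from your log). One more thing worth checking: java.io.File#renameTo() typically cannot move a directory across filesystem boundaries, so if /mnt/ebs and /mnt/tmp are separate mounts, that alone could explain the failure even though a plain mv would succeed.

SRC=/mnt/ebs/splunk/dev-os/colddb/db_1369127814_1369119469_180
DST=/mnt/tmp/shuttl_archiver/data/safe-buckets/dev-os

ls -ld "$DST"                                    # does the destination directory exist at all?
sudo -u splunk touch "$DST/permission-test"      # can the Splunk user write there?
df "$SRC" "$DST"                                 # same filesystem? renameTo() can't cross mounts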

I'm not sure why the index field is always showing Loading. Hopefully you can find something in the logs or by debugging the page in the browser.

milan-koudelka commented 11 years ago

Hi Petter, I've tried to reinstall it from scratch. I installed it through the manager, so the manager put the app in /mnt/shared_storage/etc/apps/shuttl. That's the path for the search head pooling shared storage.

To start with, I've configured just splunk.xml and tried to restart the server.

There is still the same issue: the Flush and Thaw pages keep showing Loading in the list of indexes.

And I can see these errors in the logs, even after I tried to copy the files to the correct path:

08-15-2013 00:02:58.819 +0200 ERROR SearchResults - Failed to remove "/mnt/shared_storage/etc/users/milan.koudelka/shuttl/history/splunk-sh1-dev.csv.tmp": No such file or directory
08-15-2013 00:04:22.430 +0200 ERROR FrameworkUtils - Incorrect path to script: /opt/splunk/etc/apps/shuttl/bin/coldToFrozenRetry.sh.  Script must be located inside $SPLUNK_HOME/bin/scripts.
08-15-2013 00:04:22.430 +0200 ERROR ExecProcessor - Ignoring: "/opt/splunk/etc/apps/shuttl/bin/coldToFrozenRetry.sh"
08-15-2013 00:04:22.430 +0200 ERROR FrameworkUtils - Incorrect path to script: /opt/splunk/etc/apps/shuttl/bin/start.sh.  Script must be located inside $SPLUNK_HOME/bin/scripts.
08-15-2013 00:04:22.430 +0200 ERROR ExecProcessor - Ignoring: "/opt/splunk/etc/apps/shuttl/bin/start.sh"
08-15-2013 00:04:22.430 +0200 ERROR FrameworkUtils - Incorrect path to script: /opt/splunk/etc/apps/shuttl/bin/warmToColdRetry.sh.  Script must be located inside $SPLUNK_HOME/bin/scripts.
08-15-2013 00:04:22.430 +0200 ERROR ExecProcessor - Ignoring: "/opt/splunk/etc/apps/shuttl/bin/warmToColdRetry.sh"

[root@splunk-sh1-dev:/opt/splunk] echo $SPLUNK_HOME
/opt/splunk

[root@splunk-sh1-dev:/opt/splunk] cat /opt/splunk/etc/system/local/distsearch.conf

[searchhead:splunk-sh1-dev]
mounted_bundles = true
bundles_location = /mnt/shared_storage/etc/

I think it isn't capable of running on an architecture with shared storage for search head pooling. It probably expects the whole installation to be in $SPLUNK_HOME, but that's not possible when I'm using search head pooling.

Do you have any advice, please?

petterik commented 11 years ago

I haven't seen that "Incorrect path to script: /opt/splunk/etc/apps/shuttl/bin/coldToFrozenRetry.sh. Script must be located inside $SPLUNK_HOME/bin/scripts." error before. I should fix that! I think I assume somewhere that the app is installed at $SPLUNK_HOME/etc/apps/shuttl. I should probably not assume that.

My suggestions are:

milan-koudelka commented 11 years ago

It's not possible to install the app to $SPLUNK_HOME/etc/apps/shuttl: if you are using search head pooling and mounted bundles, apps in that path are ignored and Splunk uses only the apps on the shared storage. I've copied the files there, but there isn't any change, even after a restart. The paths in inputs.conf were already as you wrote; I've changed them to the real location of the application on the shared storage. I've tried to rewrite all these $SPLUNK_HOME references to the correct path for an environment with search head pooling, but it's everywhere :-/ It isn't possible to use this with search head pooling because of the expectation that the app will be in $SPLUNK_HOME. Maybe I can try a little hack: create a symlink from $SPLUNK_HOME/etc/apps/shuttl to the shared storage.
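Something like this, as a sketch (the shared storage path is the one from my distsearch.conf, and $SPLUNK_HOME is /opt/splunk here):

# Hack: make the pooled app visible where Shuttl expects it.
ln -s /mnt/shared_storage/etc/apps/shuttl "$SPLUNK_HOME/etc/apps/shuttl"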

milan-koudelka commented 11 years ago

With the symlink, there are other errors :-)

08-16-2013 02:15:25.827 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" 16.8.2013 2:15:25 com.sun.jersey.api.core.PackagesResourceConfig init
08-16-2013 02:15:25.827 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" INFO: Scanning for root resource and provider classes in the packages:
08-16-2013 02:15:25.827 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" com.splunk.shuttl.server.mbeans.rest
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" 16.8.2013 2:15:25 com.sun.jersey.api.core.ScanningResourceConfig logClasses
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" INFO: Root resource classes found:
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.ShuttlServerRest
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.FlushEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.ListThawEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.ShuttlConfigurationEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.CopyBucketEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.ThawBucketsEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.ArchiveBucketEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" class com.splunk.shuttl.server.mbeans.rest.ListBucketsEndpoint
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" 16.8.2013 2:15:25 com.sun.jersey.api.core.ScanningResourceConfig init
08-16-2013 02:15:25.902 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" INFO: No provider classes found.
08-16-2013 02:15:26.015 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" 16.8.2013 2:15:26 com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
08-16-2013 02:15:26.015 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" INFO: Initiating Jersey application, version 'Jersey: 1.11 12/09/2011 11:05 AM'
08-16-2013 02:16:27.600 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" 16.8.2013 2:16:27 com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
08-16-2013 02:16:27.600 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
08-16-2013 02:16:27.600 +0200 ERROR ExecProcessor - message from "/mnt/shared_storage/etc/apps/shuttl/bin/start.sh" java.lang.RuntimeException: java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

petterik commented 11 years ago

I had no idea "search head pooling and mounted bundles" was a thing :) I need to ask around about how people deal with this. I can probably hit some Splunk endpoint to figure out where the app lives. Thanks for finding this! The fix shouldn't be too hard, so expect it in a week or two. Remind me again if I haven't fixed it by then!

milan-koudelka commented 11 years ago

Amazing. If you need any help with it, let me know; maybe I can help you somehow. This is a great app and I'm looking forward to using it.

petterik commented 11 years ago

I'm looking into this right now, @milan-koudelka.

To support search head pooling:

  1. inputs.conf's scripts need to use the path "script://./bin/script.sh", without the $SPLUNK_HOME/etc/apps/ prefix (see the sketch after this list)
  2. Figure out the best way to find out where my app is installed. If I can't, I'll have to do something about the custom configuration that Shuttl has (for historical reasons). That'll hopefully be it, though!
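For item 1, the scripted-input stanzas would look roughly like this (a sketch: the script names are the ones from the ExecProcessor errors above, and any other settings in the existing stanzas stay unchanged):

[script://./bin/start.sh]
# keep the stanza's other settings (interval, sourcetype, ...) as they are

[script://./bin/coldToFrozenRetry.sh]
# keep the stanza's other settings as they are

[script://./bin/warmToColdRetry.sh]
# keep the stanza's other settings as they are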
milan-koudelka commented 11 years ago

Hi, I think the best way to find the app's location is to read this file:

[root@splunk-sh:~] cat /opt/splunk/etc/system/local/distsearch.conf
# http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Mounttheknowledgebundle

[searchhead:splunk-sh]
mounted_bundles = true
bundles_location = /mnt/shared_storage/etc/

milan-koudelka commented 11 years ago

Hi, do you have any update on this?

Best regards

petterik commented 11 years ago

Yes! I've updated the develop branch with fixes for search head pooling. Would you like to try it out?

Clone the repository and run ./buildit.sh; you'll then have a build at ./build/shuttl.tgz
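Roughly like this (a sketch; it assumes the tarball unpacks into a shuttl/ app directory and that you install into the pooled apps path from earlier in this thread):

# Build Shuttl from the develop branch and install it into the pooled apps directory.
git clone https://github.com/splunk/splunk-shuttl.git
cd splunk-shuttl
git checkout develop
./buildit.sh
tar -xzf build/shuttl.tgz -C /mnt/shared_storage/etc/apps/
# Restart Splunk afterwards so the updated app is picked up.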

milan-koudelka commented 11 years ago

Great, I will test it ASAP in my DEV environment.

bknowles commented 11 years ago

Have you looked into using s3ql (see http://groups.google.com/group/s3ql) for the filesystem? It's the best filesystem-on-S3 implementation I've ever seen, and it comes the closest to making S3 look like "just another" mounted filesystem.
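For reference, a minimal sketch of what using it looks like (the bucket name and mount point are placeholders, the storage URL syntax varies between s3ql versions, and I haven't tried this with Shuttl):

# One-time: format an S3 bucket as an s3ql filesystem, then mount it like a local directory.
mkfs.s3ql s3://my-shuttl-archive
mkdir -p /mnt/s3ql-archive
mount.s3ql s3://my-shuttl-archive /mnt/s3ql-archive
# ... point the archiver at /mnt/s3ql-archive as if it were local storage ...
umount.s3ql /mnt/s3ql-archive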

The core of the code is still single-threaded, but I think that's true of a lot of filesystem code that is still in use today.

Just a thought. Thanks!