Open clagese opened 10 years ago
I found another problem with S3 storage. If I save a frozen bucket in CSV format to S3 with Shuttl, the stored files are not really in CSV format and I cannot read them. I can only restore the frozen bucket through Shuttl and read the data with Splunk. I don't know whether this is normal behavior.
I documented this problem on Splunk Answers: http://answers.splunk.com/answers/131297/shuttl-development-stopped
I tested the develop branch of Shuttl hoping that it would fix these problems, but it does not work.
I get these errors in shuttl.log:
2014-05-14 18:10:30,721 INFO com.splunk.shuttl.archiver.archive.BucketShuttlerRunner: will="Archiving bucket" bucket="LocalBucket [getDirectory()=/opt/splunk/shuttl_archiver/data/safe-buckets/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getName()=db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getIndex()=mytestdb, getFormat()=SPLUNK_BUCKET, getPath()=/opt/splunk/shuttl_archiver/data/safe-buckets/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getEarliest()=Wed Apr 23 16:55:24 CEST 2014, getLatest()=Sun Apr 27 02:17:01 CEST 2014, getSize()=132526]"
2014-05-14 18:10:30,821 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fdevelop_branch%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb%2Fdb_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82%2FCSV' - Unexpected response code 404, expected 200
2014-05-14 18:10:30,822 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fdevelop_branch%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb%2Fdb_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82%2FCSV' - Received error response with XML message
2014-05-14 18:10:30,905 INFO com.splunk.shuttl.archiver.archive.ArchiveBucketTransferer: will="attempting to transfer bucket to archive" bucket="LocalBucket [getDirectory()=/opt/splunk/shuttl_archiver/data/format-export-dir/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/SPLUNK_BUCKET/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getName()=db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getIndex()=mytestdb, getFormat()=CSV, getPath()=/opt/splunk/shuttl_archiver/data/format-export-dir/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/SPLUNK_BUCKET/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getEarliest()=Wed Apr 23 16:55:24 CEST 2014, getLatest()=Sun Apr 27 02:17:01 CEST 2014, getSize()=132526]" destination="/develop_branch/archive_data/my_cluster/splunkIndex02/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/CSV"
2014-05-14 18:10:30,905 INFO com.splunk.shuttl.archiver.filesystem.transaction.TransactionExecuter: will="Prepare transaction" transaction="Transaction [data=LocalBucket [getDirectory()=/opt/splunk/shuttl_archiver/data/format-export-dir/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/SPLUNK_BUCKET/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getName()=db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getIndex()=mytestdb, getFormat()=CSV, getPath()=/opt/splunk/shuttl_archiver/data/format-export-dir/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/SPLUNK_BUCKET/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82, getEarliest()=Wed Apr 23 16:55:24 CEST 2014, getLatest()=Sun Apr 27 02:17:01 CEST 2014, getSize()=132526], remoteTemp=/develop_branch/temporary_data/splunkIndex02/develop_branch/archive_data/my_cluster/splunkIndex02/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/CSV, dst=/develop_branch/archive_data/my_cluster/splunkIndex02/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/CSV]"
2014-05-14 18:10:30,995 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fdevelop_branch%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb%2Fdb_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82%2FCSV' - Unexpected response code 404, expected 200
2014-05-14 18:10:30,995 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fdevelop_branch%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb%2Fdb_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82%2FCSV' - Received error response with XML message
2014-05-14 18:10:31,086 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2F' - Unexpected response code 403, expected 200
2014-05-14 18:10:31,086 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2F' - Received error response with XML message
2014-05-14 18:10:31,087 ERROR com.splunk.shuttl.archiver.filesystem.transaction.AbstractTransaction: did="Tried making directories up to: /develop_branch/temporary_data/splunkIndex02/develop_branch/archive_data/my_cluster/splunkIndex02/mytestdb/db_1398557821_1398264924_27_B95021DE-89AB-4A9D-B924-575736C54B82/CSV" happened="org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?>InvalidObjectState
I configured my cluster to archive frozen Splunk data to S3 with Shuttl, and I installed Shuttl on all nodes. My data is actually archived in the S3 bucket, but there is a problem with the archive path. Whatever value I set as "archivePath" in archiver.xml (/myrootpath or myrootpath or or / ), Shuttl stores my data under a root path "//" within my S3 bucket. The result looks like a directory without a name at the root of the S3 bucket. For example, if I set /myrootpath in archiver.xml, I find Splunk buckets on S3 such as "s3://my_s3_bucket//myrootpath/archive_data/my_cluster/splunkIndex02/mytestdb/db_1397659802_1397655448_19_B95021DE-89AB-4A9D-B924-575736C54B81"
In the shuttl log I found these warnings:
2014-04-16 18:56:30,700 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fmyrootpath%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb%2Fdb_1397659802_1397655448_19_B95021DE-89AB-4A9D-B924-575736C54B81%2FCSV' - Unexpected response code 404, expected 200
2014-04-16 18:56:30,701 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/%2Fmyrootpath%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb%2Fdb_1397659802_1397655448_19_B95021DE-89AB-4A9D-B924-575736C54B81%2FCSV' - Received error response with XML message
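The request path in these warnings can be decoded to see which object key is being requested. The sketch below is only an illustration, not part of Shuttl: the helper `object_key` is hypothetical, and it assumes the logged path is a single leading "/" followed by the percent-encoded key. Under that assumption the key itself starts with "/", which would match the unnamed root directory I see in the bucket.

```python
from urllib.parse import unquote

# Request path copied from the WARN line in shuttl.log.
request_path = (
    "/%2Fmyrootpath%2Farchive_data%2Fmy_cluster%2FsplunkIndex02%2Fmytestdb"
    "%2Fdb_1397659802_1397655448_19_B95021DE-89AB-4A9D-B924-575736C54B81%2FCSV"
)


def object_key(path: str) -> str:
    """Hypothetical helper: strip one leading '/' and percent-decode the rest.

    Assumes the logged path has the form '/' + percent-encoded object key.
    """
    encoded = path[1:] if path.startswith("/") else path
    return unquote(encoded)


key = object_key(request_path)
print(key)                  # the decoded object key
print(key.startswith("/"))  # whether the key begins with a slash
```

If the key really does begin with "/", tools that split keys on "/" would show an empty first path component, which is what "s3://my_s3_bucket//myrootpath/..." looks like.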
Despite this problem, thawing and flushing Splunk buckets from the search head interface works well.
I read these issues: http://answers.splunk.com/answers/85635/shuttl-archiving-errors https://github.com/splunk/splunk-shuttl/issues/131 where you say that this behavior is due to an old version of Hadoop's S3 library.
Is there a new Shuttl release that fixes this problem, or is one planned soon?