splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.
Apache License 2.0

Single Node Test on S3 #99

Closed Klevmarken closed 11 years ago

Klevmarken commented 11 years ago

Will do a single-node test on S3 with bucket sizes of 100MB, 1GB, and 10GB, covering both shuttling and thawing.

Klevmarken commented 11 years ago

Did a single node test with bucket size set to 1GB. Resulted in the following exception:

INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark

The only thread I have found that discusses related exceptions is the following:

http://stackoverflow.com/questions/4698869/problems-when-uploading-large-files-to-amazon-s3
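My reading of that thread (a hypothesis, not confirmed against the Shuttl/jets3t code paths) is that commons-httpclient retries the failed request by calling reset() on the request body's InputStream, but by then more bytes have been read than the readlimit passed to mark(), so the mark has been invalidated. A minimal, self-contained Java sketch of that mechanism, unrelated to any Shuttl class:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

// Shows how a retry that calls reset() fails once the stream has been
// read past the readlimit given to mark() -- producing the same
// "Resetting to invalid mark" IOException seen in the log above.
public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1024 * 1024];           // stand-in for the upload body
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(payload), 128);

        in.mark(256);                                      // readlimit far smaller than the body
        in.read(new byte[512]);                            // first (failed) send reads past the limit

        try {
            in.reset();                                    // the retry tries to rewind the stream
        } catch (IOException e) {
            System.out.println("Retry would fail: " + e.getMessage());
        }
    }
}
```

Running this prints "Resetting to invalid mark", which at least matches the message in the log.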

Klevmarken commented 11 years ago

Did a single node test with bucket size set to 1GB on s3n to see if there was any difference. Resulted in:

Error Message:

<Error>
  <Code>EntityTooLarge</Code>
  <Message>Your proposed upload exceeds the maximum allowed size</Message>
  <ProposedSize>8295576317</ProposedSize>
  <RequestId>43D2F861EE780AEE</RequestId>
  <HostId>PXtHRzRzHjN1qAElRV8AiMi9rTmAMZOh+B8zvDiDJa4/z+u0Jm4MUdhldAEZAKOG</HostId>
  <MaxSizeAllowed>5368709120</MaxSizeAllowed>
</Error>

Will do additional tests with smaller entity size.
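For reference, the MaxSizeAllowed in that response is 5368709120 bytes, i.e. S3's 5 GiB ceiling for a single PUT, while the proposed upload was roughly 7.7 GiB. A pre-flight check along those lines could look like the sketch below (the file path is hypothetical, and this is not how Shuttl itself writes to S3; anything over the limit would need multipart upload or splitting):

```java
import java.io.File;

// Checks a local file against S3's single-PUT ceiling
// (5368709120 bytes, as reported in the EntityTooLarge response above).
public class S3SinglePutCheck {
    private static final long MAX_SINGLE_PUT_BYTES = 5368709120L; // 5 GiB

    public static void main(String[] args) {
        File candidate = new File(args.length > 0 ? args[0] : "bucket-export.csv"); // hypothetical path
        long size = candidate.length();

        if (size > MAX_SINGLE_PUT_BYTES) {
            System.out.printf("%s is %d bytes (> %d): a single PUT will fail with EntityTooLarge%n",
                    candidate.getName(), size, MAX_SINGLE_PUT_BYTES);
        } else {
            System.out.printf("%s is %d bytes: within the single-PUT limit%n",
                    candidate.getName(), size);
        }
    }
}
```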

Klevmarken commented 11 years ago

Did a single node test with bucket size set to 100MB on s3 and s3n.

2012-11-05 14:45:11,794 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Connection reset
2012-11-05 14:45:11,797 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2012-11-05 14:45:11,951 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark

Buckets are still transferred to s3 and they are thawable as well. However, when I started thawing the largest .csv file in the collection (around 7GB), the thawing process seemed to die after only 136MB. By repeatedly clicking the 'Thaw Buckets' button in the web interface, thawing started on the other .csv files, and once those finished, thawing of the initial .csv file resumed. It also takes around 20 sec to thaw 100MB worth of .csv to thaw-transfers-dir/, i.e. roughly 5MB/s.

This might have something to do with the following warning:

WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
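To check whether the native library is actually being picked up, Hadoop exposes this through org.apache.hadoop.util.NativeCodeLoader. A quick diagnostic sketch, assuming the Hadoop jars used by Shuttl are on the classpath:

```java
import org.apache.hadoop.util.NativeCodeLoader;

// Prints whether libhadoop was found; if not, Hadoop falls back to the
// built-in Java implementations, which is what the WARN line above reports.
public class NativeLibCheck {
    public static void main(String[] args) {
        System.out.println("native-hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```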