splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.
Apache License 2.0

Implement bucket size correctly #13

Closed petterik closed 12 years ago

petterik commented 12 years ago

Bucket size is currently implemented incorrectly.

The reason to track a bucket's size is to know how much space the bucket will occupy on the local disk when it is thawed. We currently check how big the bucket is in its archived form. That size is very likely wrong, because of compression and the actual archive format. We should instead measure the bucket before it is archived and persist that metadata somewhere. The ArchiveFileSystem would then never need to know how to compute a file's size, and the size would always be correct.

I imagine a simple implementation where we persist the size of the bucket in the bucket's name. We can then read the size from the name and strip it off when the bucket is thawed, and this code can probably live in a single class.
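A rough sketch of what that single class might look like, to make the idea concrete. Everything here is an assumption for illustration: the class name `BucketSizeName`, the `_size-` marker, and the method names are hypothetical and not part of the existing Shuttl code. The size passed in is assumed to have been measured on the local disk before archiving.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that keeps a bucket's pre-archive size in its name. */
final class BucketSizeName {

    // Assumed marker; any suffix that cannot be confused with the rest of the name would do.
    private static final String SIZE_MARKER = "_size-";

    // Greedy prefix, so only the LAST "_size-<digits>" suffix is treated as the size.
    private static final Pattern SUFFIX = Pattern.compile("^(.*)_size-(\\d+)$");

    private BucketSizeName() {
    }

    /** Appends the size (in bytes, measured before archiving) to the bucket name. */
    static String withSize(String bucketName, long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("Negative size: " + sizeInBytes);
        }
        return bucketName + SIZE_MARKER + sizeInBytes;
    }

    /** Reads the size back from an archived bucket name. */
    static long sizeOf(String archivedName) {
        return Long.parseLong(match(archivedName).group(2));
    }

    /** Returns the original bucket name, with the size suffix stripped. */
    static String stripSize(String archivedName) {
        return match(archivedName).group(1);
    }

    private static Matcher match(String archivedName) {
        Matcher m = SUFFIX.matcher(archivedName);
        if (!m.matches()) {
            throw new IllegalArgumentException("No size suffix in: " + archivedName);
        }
        return m;
    }

    public static void main(String[] args) {
        // Example bucket name and size are made up for the demo.
        String archived = withSize("db_1336330530_1336330530_0", 4096L);
        System.out.println(archived);
        System.out.println(sizeOf(archived));
        System.out.println(stripSize(archived));
    }
}
```

With something like this, the archiver would call `withSize` once before transfer, and the thaw path would call `sizeOf` to check free disk space and `stripSize` to restore the original name, so the ArchiveFileSystem never has to report a size itself.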