splunk / splunk-shuttl

Splunk app for archive management, including HDFS support.
Apache License 2.0

Splunk's bucket count is not reliable enough #75

Open petterik opened 12 years ago

petterik commented 12 years ago

If every bucket in an index has been Shuttl'd, Splunk's bucket count for that index starts from 0 again. Once that happens, no Shuttl'd bucket with the same bucket index (e.g. 0) can be thawed while the new bucket with bucket index 0 is still in the index.

The same problem applies to buckets that were Shuttl'd with the same bucket index: they cannot be thawed at the same time.

Solution: When thawing, make sure that no bucket in the index has the same bucket index. Figure out how to best achieve this. Right now I'm thinking of hashing index+earliest+latest+random to get a fairly unique number. While the hash collides with another thawed bucket, increment it by 1. It is highly unlikely that an index would ever create a 7-digit number of buckets, so a thaw bucket index with 7-9 digits is probably reasonably reliable.
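A rough sketch of the idea in Python, not Shuttl's actual code. The function name `thaw_bucket_index`, the `used` set of bucket indexes already present in the thaw directory, and the choice of SHA-256 are all assumptions made for illustration; the 7-9 digit range comes from the proposal above.

```python
import hashlib
import secrets

MIN_THAW_ID = 1_000_000      # smallest 7-digit number
MAX_THAW_ID = 999_999_999    # largest 9-digit number


def thaw_bucket_index(index, earliest, latest, used, salt=None):
    """Return a 7-9 digit bucket index that is not in `used`.

    `used` is a (hypothetical) set of bucket indexes already taken by
    thawed buckets. `salt` is the random component; it is only passed
    explicitly to make the result reproducible in tests.
    """
    if salt is None:
        salt = secrets.token_hex(8)
    key = f"{index}:{earliest}:{latest}:{salt}".encode("utf-8")
    digest = hashlib.sha256(key).digest()

    # Map the hash into the 7-9 digit range.
    span = MAX_THAW_ID - MIN_THAW_ID + 1
    offset = int.from_bytes(digest[:8], "big") % span

    # On collision, increment by 1 (wrapping around inside the range).
    for _ in range(span):
        candidate = MIN_THAW_ID + offset
        if candidate not in used:
            return candidate
        offset = (offset + 1) % span
    raise RuntimeError("no free thaw bucket index left")
```

Wrapping around inside the range is an addition to the proposal; it keeps the incremented value from growing past 9 digits.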

Investigate: How big can the bucket index be and what does the number affect? How does Splunk use it internally? Does it matter?

*Note: The "bucket index" is the last number in a bucket name, e.g. bucket db_123456789_123456789_13 has bucket index 13.
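For illustration, the bucket index could be pulled out of a bucket name and checked against the buckets already in an index roughly like this. Both helper names (`bucket_index`, `conflicts_with_index`) are hypothetical, and the sketch only assumes that the name ends in `_<number>`, as in the example above.

```python
import re

# Assumes only that a bucket name ends in "_<number>".
_BUCKET_INDEX_RE = re.compile(r".*_([0-9]+)")


def bucket_index(bucket_name):
    """Return the trailing number of a bucket name as an int."""
    match = _BUCKET_INDEX_RE.fullmatch(bucket_name)
    if match is None:
        raise ValueError(f"no bucket index in {bucket_name!r}")
    return int(match.group(1))


def conflicts_with_index(archived_name, names_in_index):
    """True if a bucket in the index already uses the same bucket index."""
    wanted = bucket_index(archived_name)
    return any(bucket_index(name) == wanted for name in names_in_index)
```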