yahoo / HaloDB

A fast, log structured key-value store.
https://yahoodevelopers.tumblr.com/post/178250134648/introducing-halodb-a-fast-embedded-key-value
Apache License 2.0
508 stars, 100 forks

Using Yahoo's Oak instead of Snazy/OHC? #19

Closed. ahasani closed this issue 5 years ago

ahasani commented 5 years ago

Hi @amannaly

I came across Yahoo's Oak. It looks like if OHC were replaced with Oak, we could even have range scans over the metadata, which would be great, without sacrificing off-heap storage. Any plans to do this?

Cheers

amannaly commented 5 years ago

Hi @ahasani

Oak is an interesting project and I plan to check it out, but not any time soon. I think it would be a major effort, as we would probably have to make a lot of modifications to Oak to use it with HaloDB.

We don't use OHC directly in HaloDB; we have made a lot of modifications to it. OHC was written to be an LRU cache, and for our workloads it was taking too much memory.

Our production boxes handle around 100GB of off-heap data, so a lot of changes had to be made to reduce the index's memory footprint. We also had problems with both internal and external fragmentation, which kept the memory footprint high and growing quickly and required us to restart the service often. To fix this problem we also implemented an index using a memory pool.

In fact, making changes to OHC and implementing the memory pool were among the most difficult and time-consuming parts of writing HaloDB. Therefore, integrating Oak might also be a big task.
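
For readers unfamiliar with the memory-pool approach mentioned above, here is a minimal sketch of the general idea in Java. It is not HaloDB's code: the class name `SlotPool`, its methods, and the choice of an on-heap free list are all assumptions made for illustration. The point it shows is that carving one large off-heap allocation into fixed-size slots avoids external fragmentation, because every freed slot can be reused by any later allocation, and it bounds internal fragmentation to at most one slot's worth of padding per entry.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-size slot pool over a single off-heap buffer.
 * Illustrative only; HaloDB's real index is more involved.
 */
final class SlotPool {
    static final int NO_SLOT = -1;

    private final ByteBuffer memory; // one contiguous off-heap allocation
    private final int slotSize;      // every slot has the same size
    private final int[] nextFree;    // free list (kept on-heap here for simplicity)
    private int freeHead;
    private int used;

    SlotPool(int slotCount, int slotSize) {
        this.memory = ByteBuffer.allocateDirect(slotCount * slotSize);
        this.slotSize = slotSize;
        this.nextFree = new int[slotCount];
        // Chain all slots into the initial free list: 0 -> 1 -> ... -> NO_SLOT.
        for (int i = 0; i < slotCount; i++) {
            nextFree[i] = (i + 1 < slotCount) ? i + 1 : NO_SLOT;
        }
        this.freeHead = slotCount > 0 ? 0 : NO_SLOT;
    }

    /** Returns a slot index, or NO_SLOT if the pool is exhausted. */
    int allocate() {
        if (freeHead == NO_SLOT) {
            return NO_SLOT;
        }
        int slot = freeHead;
        freeHead = nextFree[slot];
        used++;
        return slot;
    }

    /** Returns a slot to the pool. The sketch does not guard against double frees. */
    void free(int slot) {
        nextFree[slot] = freeHead;
        freeHead = slot;
        used--;
    }

    /** Copies data into the given slot. */
    void write(int slot, byte[] data) {
        if (data.length > slotSize) {
            throw new IllegalArgumentException("data exceeds slot size");
        }
        ByteBuffer view = memory.duplicate(); // independent position, shared memory
        view.position(slot * slotSize);
        view.put(data);
    }

    /** Copies the first length bytes out of the given slot. */
    byte[] read(int slot, int length) {
        if (length > slotSize) {
            throw new IllegalArgumentException("length exceeds slot size");
        }
        byte[] out = new byte[length];
        ByteBuffer view = memory.duplicate();
        view.position(slot * slotSize);
        view.get(out);
        return out;
    }

    int used() {
        return used;
    }
}
```

With a pool like this, an index entry stores a slot number instead of a pointer to an individually allocated block, so allocation and release become constant-time free-list operations and the process's off-heap footprint stays fixed at the size chosen up front.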

ahasani commented 5 years ago

Thanks @amannaly. Any plans to externalize the in-memory index so that it is usable as a library on its own, or to contribute it back to Oak/OHC? I ask because your implementation is effective.