pbudzik / bytecask

Key/value database inspired by Bitcask
75 stars 3 forks source link

Patricia Trie for key lookup? #1

Closed acruise closed 11 years ago

acruise commented 12 years ago

Hi there,

A lot of Bytecask looks pretty close to ideal for one of my projects, but the requirement to keep all the keys in memory could be problematic, especially since my keys are massively redundant in their prefixes (filesystem paths). How hard do you think it would be to use a Patricia Trie for the keys?

acruise commented 12 years ago

This looks ideal, but no code has ever been published :(

pbudzik commented 12 years ago

The idea seems to be very interesting. This might be a configurable, non-default index implementation that is space optimized, sacrificing a bit the lookup time. Note that whatever type of map is used w/o persistence (swapping) this is not going to be fully scalable as even a "compressed map" like Radix/Patricia eventually will hit the memory limit. In your case though it should help and significant amount of space would likely be saved.

I think that if we find some Java library with a good license it should not be hard to build it in. The problem is time, so you'd be more than welcome to contribute it back to the project. If you really consider using it I may try to find some time as well :) Thanks for the feedback.

acruise commented 12 years ago

I played around with the most popular Java PATRICIA Trie library and found that it doesn't have the space efficiency property we're after, since it retains references to keys. I filed a ticket to discuss it.

pbudzik commented 12 years ago

That's a pity. It doesn't seem to be hard to implement it for anyone who's into algorithms. I've researched a bit and I must say the choice is not so great.

pbudzik commented 12 years ago

I looked at the response to your ticket. Took a look at the Simple Patricia Trie.... and to me it looks that anyway keys are retained even in the simplest implementation. My understanding of this is that we want it to create more nodes out of it and we want them share common string parts, so the key is dynamic (virtual). I mean some nodes should have no value just a link. Retrieval is gluing all parts (nodes) together until it forms the key and grabbing the value (if found any).

pbudzik commented 12 years ago

Not sure if you are still interested, but I've commited some code to support prefixed keys by a dedicated map. Any feedback is welcome.

acruise commented 12 years ago

Cool, thanks! I had a brief look at the code; it looks like the RadixTrie stuff is not wired into the main Bytecask codebase though righT?

Edit: whoops, just saw 4563fbfa5ef289e. I'll give it a try!

pbudzik commented 12 years ago

Example of usage: https://github.com/pbudzik/bytecask/blob/master/src/test/scala/com/github/bytecask/BasicPrefixedKeysSuite.scala (prefixed = true in the constructor)

Here it looks how the trie works:

https://github.com/pbudzik/bytecask/blob/master/src/test/scala/com/github/bytecask/RadixTreeSuite.scala

Definitely the Trie implementation itself needs improving, but I need your input if it is at all what you are wanting :)