More implicit data structures

Morwenn commented 5 years ago

While looking for resources about the post-order heap I describe in #1, I found papers describing other implicit data structures. Interestingly enough, the references of the different papers hints to a literature more extensive than I would have thought, mostly centered around priority queues. Here are a few links I found:

Fenwick trees and variants
Cache-aligned 4-ary AM-heap, apparently building on AM-heap, apparently building on M-heap
Some implicit priority queues: this paper is especially interesting because it has links to a wide range of other implicit data structures

Now it's worth noting that I didn't take the time to read them all, and while all of those are called "implicit" I wouldn't be surprised if some were actually succint data structures. If you ever find time to read those you will probably find quite a number of implicit data structures to complete your repository.

pfalcon commented 5 years ago

Thanks for the links! I actually found many more implicit data structures even during my initial dive into the area ~1.5 years ago. However, it quickly became clear that in general, implicit data structures can be roughly (and likely very subjectively) categorized as:

Basic "magic" implicit data structures, which are both sufficiently easy to understand and have easy enough algorithms.
Variants of the basic structures above, trying to squeeze out some more features (functional of algorithmic).
"Artisan" or "artificial" implicit data structures, where there's no implicit (sorry for tautology) magic, but instead it's all engineered. And of course, lack of magic comes at a cost.

For example, I'm aware that there's an implicit B-tree. But the basic idea is that pointers to various nodes are actually encoded in the structure, albeit in implicit manner. Whereas baseline-axiomatic efficient search structure is a sorted list, we can encode a bit of info in a pair of consecutive elements by swapping them (or not). But going to practical side, for 32-bit machine, that requires 64 words to encode a single pointer (to entire address space). And for a binary tree node, that would be 128 words, or 0.5K. For me, that's sharp difference with things like heap or beap, which can take as few words as you throw at them, and use them up till the last one, no matter how it all aligns. If you can over-allocate your data structures in chunks of 0.5K, you probably can throw in a couple of explicit pointers to the mix. And that's even without talking of how painfully slow (large constant factor in O(f(n)) ) it would be to decode a single pointer from array of 64 words on conventional hardware.

So, these novel implicit data structures appear to be much less interesting for "mere people who do grocery shopping", and useful only for proverbial "big data". Like, if you paid a few millions for a petabyte array, then it may itch you that even few percent overhead is monetarily tens of thousands or more, and especially given that such memories are usually coupled with parallel processing facilities, algos like above may become interesting.

So, bottom line is that I don't to rush to catalog any implicit structure, but err on the side of the old-natural-magic ones ;-).

(And the above description is based on very superficial understanding of those novel implicit data structures. They're complex and multi-level, so maybe behind the basic 32bits-in-64words idea, there's a lot of deep magic to awe about; I hope to grow to it some day ;-) ).

Morwenn commented 5 years ago

I understand what you're saying there, that's fair :)

pfalcon / awesome-implicit-data-structures

More implicit data structures #2