nim-lang / RFCs

A repository for your Nim proposals.

Add RBT container to stdlib #331

Closed juancarlospaco closed 3 years ago

juancarlospaco commented 3 years ago

Add Red-Black Tree Container to stdlib.

konsumlamm commented 3 years ago

Note that there already is fusion/btreetables, which fulfils a very similar purpose.

juancarlospaco commented 3 years ago

Wikipedia explains why it is not exactly the same. But you can use it as a base to hack into, yeah...

Araq commented 3 years ago

I fail to see the advantage of RBTrees over BTrees after the invention of CPU caches.

timotheecour commented 3 years ago

links:

c-blake commented 3 years ago

Though the common textbook presentation of B-trees has a flat-array node structure, this presents scalability problems beyond roughly 8..48 elements per node (depending upon CPU specifics related to shifting those flat arrays for insert/delete, and upon problem specifics like how big the keys/objects are). It is possible to replace the flat arrays in B-tree nodes with a BST node structure (or a nested B-tree, or skip lists, etc.) that just manages the small allocation arena that each node constitutes.

When doing this, it is usually desirable to have those small (probably 1..2 byte) pointer "node numbers" all point into one cacheable area of, say, 128..4096 bytes. This is an uncommonly advanced implementation that rarely gets mentioned and even more rarely gets compared in benchmarks/links like TC's above. (I am unaware of any public comparison, actually.) Some of these topics arose here, btw.

Also, it is not monotonically superior. In the same way that binary search (or linear search) is often faster than a BST for lookup, this tactic tends to make lookups a bit slower but inserts and deletes much faster. So, as is almost always the case, how interesting an optimization it is depends upon one's operation mix. The relevance to this RFC is that one application of BSTs is inside B-tree nodes themselves. If you are going to have only one, though, a flat B-tree is best.

Another time BSTs can be useful is when you maintain one collection in multiple orders. Because typical BST pointers have broad scope (e.g. virtual-memory pointers, unlike the 1..2 byte in-B-tree-node pointers above), one can find an object via its BST links in one ordering and then get the full node pointer. If the BST has parent links, that node pointer can be used to delete/manipulate the found node in its other structures (a doubly linked list, another BST, etc.) without further reference to the "key". Because B-trees basically double as allocation arenas whose objects have unstable addresses, they cannot provide this.

Multiple-ordering scenarios are typically rare. LRU/other-order caches are one. One tends to need externally chained hash tables for that, with hashing as your associative lookup, and Nim has gotten by without that for a long time - even in the LRUCache package! Anyway, for this application an external package makes the most sense. If people find the need to rely on the external package a lot, then it makes sense to move it to the stdlib, IMO (unless the stdlib's B-tree nodes use it internally anyway).

Although it may surprise you, given the zillion eyeballs on it, I consider that C++ std::map impl almost a strawman BST to compare against. In memory alone I clocked it at something like 80 bytes per minimum node, or similarly ridiculous. That may be optimizable with C++ allocator/other tricks, but I doubt that benchmark does any of that. Anyway, it should be uncontentious that BSTs are slower than B-trees at large scales, when it matters most. Even cache-oblivious B-trees (basically BSTs with a special node-layout strategy) are slower.

All this said, I am happy to add/maintain some BST stuff over at adix if @juancarlospaco or someone wants to ask for BSTs there. Doing that for B-tree node structure, so people can have something like 256..65536-element nodes, is one of my next TODOs. If you are careful you can actually implement about 5 styles of BST (red-black, AVL, splay, weight-balanced, unbalanced) non-recursively (as in no funcall overhead) in something like 250 lines of C code. I've done that in C (but never seen anyone else do it - in any programming language). I just need to port it to Nim. I was not planning on doing a van Emde Boas layout for in-B-tree-node cache obliviousness, but am open to it eventually. That could be helpful for 65536-entry B-tree nodes in spinning-rust/Winchester-disk applications, where the node itself blows out the L1 cache by a lot.