hamstergem / hamster

Efficient, Immutable, Thread-Safe Collection classes for Ruby
Other
1.89k stars 94 forks source link

Lean HAMT #220

Open dsisnero opened 8 years ago

dsisnero commented 8 years ago

A new paper on HAMTs by Michael J. Steindorfer and Jurgen J. Vinju shows how to make HAMT implementations “lean”, meaning they consume less memory, and “efficient”, meaning speedier performance. The paper claims to make iteration and equality checks about 80 – 100% faster.

See http://bendyworks.com/leveling-clojures-hash-maps/ which links the paper.

Design the HAMT using this paper.

alexdowad commented 8 years ago

Thanks for the link!

I just looked at the page, and the "compact" representation which they show is actually very similar to what JVM Clojure uses for nodes which are less than half full. The idea of separate buckets for key/value pairs and subnodes is similar to what Hamster uses. I need to read the paper for details, though.

alexdowad commented 8 years ago

Hmm. One issue with using a packed representation for nodes, is that Ruby doesn't have a fast, built-in popcount (count number of 1 bits in a number) operator. JVM Clojure uses popcount to convert logical indices to physical indices in a packed node, and this new CHAMP representation would need the same.

I have created a gem called bit-twiddle which provides a fast popcount operation for Ruby; we could use it, but it would need to also have a pure-Ruby version for JRuby/RBX.

alexdowad commented 8 years ago

The pseudocode for deletion in Steindorfer and Vinju's paper is faulty; still, the idea is clear. And it is a good idea.

alexdowad commented 8 years ago

An alternative to using bit-twiddle would be for me to code up a C extension for Hamster, which replaces Trie. I was planning to do this for the immutable-ruby gem, but it doesn't seem to have much traction. Perhaps people like "Hamster" better?

dsisnero commented 8 years ago

I didn't know about immutable-ruby. What is the difference between that gem and hamster?

On Wed, Jan 13, 2016 at 5:49 AM, Alex Dowad notifications@github.com wrote:

An alternative to using bit-twiddle would be for me to code up a C extension for Hamster, which replaces Trie. I was planning to do this for the immutable-ruby gem, but it doesn't seem to have much traction. Perhaps people like "Hamster" better?

— Reply to this email directly or view it on GitHub https://github.com/hamstergem/hamster/issues/220#issuecomment-171281221.

alexdowad commented 8 years ago

What is the difference between that gem and hamster?

It's a fork of Hamster which uses a different namespace. Structures are Immutable::Hash, Immutable::Set, and so on. Otherwise everything is the same.

no-reply commented 8 years ago

Just saw this and wanted to put a word in for pure Ruby options.

The Ruby RDF core is moving to Hamster for its in-memory data storage in its 2.0 release (see the release notes), and is strongly invested in remaining pure Ruby.

Obviously, if the advantages promised in the paper materialize, we are :+1: to an optional C extension and/or bit-twiddle with a Ruby-only option.

dreammaker commented 7 years ago

@alexdowad, with regard to naming, if you want to move, please put a big note at the top of the readme. Otherwise, people don't know what the difference is. I tried to figure it out on my own, and the only clue was that Hamster had more recent commits, so I figured this was the one to use.

alexdowad commented 7 years ago

@dreammaker, any discussion of immutable-ruby can be done from its GH page. This location is for discussion of hamster.