Closed foodchaining closed 4 years ago
Hi @foodchaining!
Yep, Clojure and Scala both used to use HAMT for some of their data structures. At least Scala now uses CHAMP structures by default instead, which improves upon HAMT especially in respect to memory footprint and locality.
IMap
, IVector
, ISet
and some other structures in dartz
currently use AVL trees, which are very similar to red-black trees. AVL trees support decently fast and memory efficient lookup, insertion and removal operations. They are also relatively easy (at least for me) to implement and maintain correctly compared to HAMT/CHAMP.
Both HAMT/CHAMP as implemented in clojure/scala and the AVL trees in dartz
are persistent data structures, in the sense that derived instances will share structure/memory with their respective original instances. When inserting or removing a leaf node in an AVL Tree, only the log(n) tree nodes on the path between that leaf and the root will be re-allocated, plus possibly a handful of neighbouring nodes if rebalancing is required.
A good and efficient CHAMP implementation in Dart would likely improve the efficiency of certain operations (union, intersection, etc) in some dartz
structures and might also prove better overall. I remember the vacuum persistent package you mentioned as being quite sluggish for basic operations though, so it might prove tricky to get good performance.
Thanks for the clarification,
The remaining detail is transients (as defined by Clojure). As far as I can see dartz
doesn't have them; would they be useful here if implemented for persistent AVL trees?
Hi again! No, there is nothing resembling transients in dartz
at the moment and I can't say I've missed them much personally. Do you have use cases where you believe transients or scala style Buffers/Builders would be significantly more convenient and/or performant?
The use case is a series of inserts, updates and removals where intermediate states are not needed. I don't know whether transients if added to dartz
could significantly improve performance. There is however an AVL tree based collection library for Clojure (data.avl). It could be used to estimate the performance difference between persistent and transient variants of batch mutations. I may come up with a test at some point.
Sounds great! I would be very interested to know if transients or similar tricks would enable performance improvements significant enough to justify the associated impurities. If you get somewhere in your investigations before I do, please let me know!
Closing this for now – feel free to reopen!
Hi, They say the so-called persistent hash array mapped trie is used to efficiently implement immutable maps (e.g. in Clojure).
How does this compare to dartz' maps?
See also: https://github.com/vacuumlabs/persistent-1 https://github.com/google/built_collection.dart/issues/44