spebbe / dartz

Functional programming in Dart
MIT License
756 stars 59 forks source link

Map mutation efficiency with regard to minimization of copying. #55

Closed foodchaining closed 4 years ago

foodchaining commented 4 years ago

Hi, They say the so-called persistent hash array mapped trie is used to efficiently implement immutable maps (e.g. in Clojure).

How does this compare to dartz' maps?

See also: https://github.com/vacuumlabs/persistent-1 https://github.com/google/built_collection.dart/issues/44

spebbe commented 4 years ago

Hi @foodchaining!

Yep, Clojure and Scala both used to use HAMT for some of their data structures. At least Scala now uses CHAMP structures by default instead, which improves upon HAMT especially in respect to memory footprint and locality.

IMap, IVector, ISet and some other structures in dartz currently use AVL trees, which are very similar to red-black trees. AVL trees support decently fast and memory efficient lookup, insertion and removal operations. They are also relatively easy (at least for me) to implement and maintain correctly compared to HAMT/CHAMP.

Both HAMT/CHAMP as implemented in clojure/scala and the AVL trees in dartz are persistent data structures, in the sense that derived instances will share structure/memory with their respective original instances. When inserting or removing a leaf node in an AVL Tree, only the log(n) tree nodes on the path between that leaf and the root will be re-allocated, plus possibly a handful of neighbouring nodes if rebalancing is required.

A good and efficient CHAMP implementation in Dart would likely improve the efficiency of certain operations (union, intersection, etc) in some dartz structures and might also prove better overall. I remember the vacuum persistent package you mentioned as being quite sluggish for basic operations though, so it might prove tricky to get good performance.

foodchaining commented 4 years ago

Thanks for the clarification, The remaining detail is transients (as defined by Clojure). As far as I can see dartz doesn't have them; would they be useful here if implemented for persistent AVL trees?

spebbe commented 4 years ago

Hi again! No, there is nothing resembling transients in dartz at the moment and I can't say I've missed them much personally. Do you have use cases where you believe transients or scala style Buffers/Builders would be significantly more convenient and/or performant?

foodchaining commented 4 years ago

The use case is a series of inserts, updates and removals where intermediate states are not needed. I don't know whether transients if added to dartz could significantly improve performance. There is however an AVL tree based collection library for Clojure (data.avl). It could be used to estimate the performance difference between persistent and transient variants of batch mutations. I may come up with a test at some point.

spebbe commented 4 years ago

Sounds great! I would be very interested to know if transients or similar tricks would enable performance improvements significant enough to justify the associated impurities. If you get somewhere in your investigations before I do, please let me know!

spebbe commented 4 years ago

Closing this for now – feel free to reopen!