rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.95k stars 1.57k forks source link

Add Persistent Collections to libcollections #825

Open steveklabnik opened 9 years ago

steveklabnik commented 9 years ago

Issue by Gankro Friday Aug 15, 2014 at 15:09 GMT

For earlier discussion, see https://github.com/rust-lang/rust/issues/16521

This issue was labelled with: A-libs in the Rust repository


Many people want (fully) persistent collections in Rust to make it more pure-functional friendly. Rust doesn't have GC, Laziness, or Memoization (and I don't think it should), so we can't do any of the really fancy stuff described in the oft-cited "purely functional data structures". Instead, we should go practical and simple by copying several of the structures described in Scala's Concrete Immutable Collections.

Of particular interest is:

Ideally, we could do something magic by genercizing over Box and Rc to reuse a lot of code between our persistent and ephemeral tree-based collections, but I'm doubtful it will be that easy.

antonkulaga commented 9 years ago

Actually such collections can make rust more user-friendly for scala developers like me.

Storyyeller commented 7 years ago

Is this still open? I've implemented a couple persistent data structures for my own use (it's pretty simple using Rc), but it's surprising to me that noone's officially published a library like this yet. I guess the biggest issue is abstracting over Rc/Arc.

orium commented 6 years ago

I'm currently implementing a rust library for persistent data structures (actually exactly the ones mentinioned (red-black tree is currently in the works)). It is still pretty green, but usable: orium/rpds.

Centril commented 6 years ago

As this is a request / discussion about inclusion into the standard library I'm removing A-community-library.

AbigailBuccaneer commented 5 years ago

immer is a C++ library which implements the Vector and HashTrie cases that may be instructive here. In particular, the library contains a couple of techniques for avoiding paying the performance cost of persistence until it's actually needed.

The first is transients, which is a concept borrowed from Clojure, and is a temporary non-persistent mutable view of a persistent vector. Because of this temporary lack of persistence we can modify the otherwise copy-on-write storage in place as long as our reference to it is unique. This would naturally translate to Rust, and has the advantage that it can expose the same API as a Vec, which generic could would surely appreciate. orium/rpds allows every persistent data structure to be modified in place when persistence isn't needed, which feels less principled and less usable in generic code but potentially more convenient for common cases.

The other technique is leveraging C++'s move semantics to achieve the same thing: If we're pushing to an rvalue we treat it as a transient, doing the same optimization of modifying in place where possible. This allows the code to keep a completely immutable API (other than allowing assignment) while still having the same performance of a mutable one:

immer::vector<int> iota(int begin, int end) {
    immer::vector<int> v;
    for (int i = begin; i != end; ++i) {
        v = std::move(v).push_back(42); // no copy-on-write will ever happen here
    }
    return v;
}

This would also translate naturally to Rust, and is the moral equivalent of fn push(self, value: T) -> Vector<T>. As well as immutability and performance, this API has the advantage that persistence requires an explicit clone(), which should make it more obvious to readers of code where in the code structural sharing is happening.