orium / rpds

Rust persistent data structures
Mozilla Public License 2.0
1.24k stars 58 forks source link

Support for other underlying smart pointers for the data structures links #7

Closed brendanzab closed 5 years ago

brendanzab commented 6 years ago

It would be nice to either use Box or Rc as the underlying smart pointer type. Alas, until we have associated generic types, we can't be polymorphic over this, but perhaps in the interim code generation could be used to create concrete implementations of:

Then once associated generic types comes we could convert these to type aliases that refer to a common List type.

orium commented 6 years ago

We can't do Box since we need to be able to have multiple pointers to the underlying data (due to structural sharing).

But this is something I had in mind for a while, because of performance reasons. Why pay the price of having thread-safe reference counting when you do not need it (which is probably most of the time)? And there is a significant price to pay. For example, the HashTrieMap::insert() and HashTrieMap::remove() were more than twice as fast with Rc than with Arc!

So this is something that has a lot of value, however I really dislike the idea of solving this through code generation, specially since there are plans to support higher kinded types that would solve this nice and neatly. When rust supports them I will implement this.

Marwes commented 6 years ago

If you have a stomach for hacks it is sort of possible to emulate the HKT that are needed for this to work :).

use std::rc::Rc;
use std::sync::Arc;
use std::ops::Deref;

trait Shared<T> {
    type Ptr: Clone + Deref<Target = T>;
}

impl<T> Shared<T> for Rc<()> {
    type Ptr = Rc<T>;
}

impl<T> Shared<T> for Arc<()> {
    type Ptr = Arc<T>;
}

struct List<T, S> where S: Shared<T> {
    field: S::Ptr,
}

struct Map<K, V, S> where S: Shared<K> + Shared<V> {
    key: <S as Shared<K>>::Ptr,
    value: <S as Shared<V>>::Ptr,
}

fn main() {
    Map::<i32, &str, Rc<()>> {
        key: Rc::new(1),
        value: Rc::new(""),
    };
}

https://play.rust-lang.org/?gist=8b67f30a72e00dc3c2e3865bb31ae797&version=stable

Marwes commented 6 years ago

Hacked together https://github.com/orium/rpds/issues/7#issuecomment-362635901 at https://github.com/Marwes/rpds/tree/rc . The speedups aren't very impressive so I am not sure if the added complexity is worth it.

running 8 tests
test vector_drop_last        ... bench:   8,240,820 ns/iter (+/- 1,384,333)
test vector_drop_last_mut    ... bench:     424,192 ns/iter (+/- 35,384)
test vector_get              ... bench:      83,473 ns/iter (+/- 6,715)
test vector_iterate          ... bench:      80,264 ns/iter (+/- 14,550)
test vector_push_back        ... bench:   9,929,261 ns/iter (+/- 197,915)
test vector_push_back_mut    ... bench:     963,687 ns/iter (+/- 176,121)
test vector_push_back_mut_rc ... bench:     847,343 ns/iter (+/- 126,324)
test vector_push_back_rc     ... bench:   7,367,774 ns/iter (+/- 582,476)
orium commented 6 years ago

I measure a better speedup:

 name              bench-arc ns/iter  bench-rc ns/iter  diff ns/iter   diff %  speedup 
 vector_drop_last  81,000,464         43,920,265         -37,080,199  -45.78%   x 1.84 
 vector_push_back  84,319,145         47,405,064         -36,914,081  -43.78%   x 1.78 

but even with this significant speedup I still don't think it is worth it, given the "hacky nature" of the solution.

If everything goes well we should have generic associated types in rust by the end of this year (at least in nightly) and we will be able to have an elegant solution.

Marwes commented 6 years ago

I'd be (pleasantly) surprised if generic associated types appear this year though you may be right that this is a bit to hacky.

That said, I fear that generic associated types will barely simplify anything. The main problem with this implementation is that deriving does not work when storing the associated type of a type parameter (deriving(Debug) gives us ERROR: P::Ptr does not implement Debug). However that will still be a problem with an implementation using generic associated types since that also entails using P::Ptr for the fields (https://github.com/rust-lang/rfcs/blob/master/text/1598-generic_associated_types.md#associated-type-constructors-of-type-arguments).

The only actual benefit of generic associated types is that we don't have to specify bounds explicitly for every type that is behind a pointer https://github.com/Marwes/rpds/blob/e482d5abbaa6c876d7c624e497affe7299bbeece/src/sequence/vector/mod.rs#L153.

brendanzab commented 6 years ago

Would top-level higher kinded types solve this?

Marwes commented 6 years ago

Would top-level higher kinded types solve this?

Maybe? This mostly depends on how the bounds in deriving gets specified. Something like below would be necessary.

#[derive(Debug)]
struct List<T, P<_>> {
}
// Generates
impl<T, P> Debug for List<T, P<_>> where for<U> P<U>: Debug { }

However that may actually be to broad of a bound in some case since we only need "for<U> where P<U> is a field of List" P<U>: Debug (basically the two bounds that are explicitly specified now).

https://github.com/rust-lang/rfcs/pull/2353 might help with this issue though.

Marwes commented 6 years ago

I guess for<U> P<U>: Debug ought to be fine in this case though and it is impossible to generate "perfect" bounds anyway. (Something to do with self-recursive types).

Marwes commented 6 years ago

Semi-related, it would be nice to omit the reference counted pointer for the values themselves and just store values as-is. For small types such as plain integers or floats as well as types which are already reference counted the Arc just incurs additional overhead.

Marwes commented 6 years ago

Actually, if the keys and values are stored unboxed I think a less hacky solution may be used. I looked through a few of the structures (not all), and it seems that only a single type is boxed in an Arc per type. So it would be enough to just parameterize on that type directly, avoiding the whole mess of associated types and manually implementing derive's.

struct HashTrieMap<K, V, Entry> {
    ...
    entry: Entry,
}

pub trait Shared: Deref {
     fn new(t: Self::Target) -> Self;
     fn make_mut(self_: &mut Self) -> &mut Self::Target;
}

impl<K, V, Entry> HashTrieMap<K, V, Entry> where Entry: Shared<Target = Entry<K, V>> { .. }

pub type ArcHashTrieMap<K, V> = HashTrieMap<K, V, Arc<Entry<K, V>>>;
pub type RcHashTrieMap<K, V> = HashTrieMap<K, V, Rc<Entry<K, V>>>;
orium commented 6 years ago

Splited this in two issues, one for to control the container of the elements of the data structure, which can be Rc, Arc, and owned (using clone).

This one is now just for the links inside between the data structures nodes.

orium commented 5 years ago

I've implemented this for List: https://github.com/orium/rpds/blob/458ca73250de360621d9e9287feac97fa883c40f/src/list/mod.rs#L113

This uses Archery, a project I created recently to be able to abstract from Rc/Arc easily and with good ergonomics. Take a look if you are interested.

orium commented 5 years ago

Work in progress:

Marwes commented 5 years ago

Nice! I was wondering when this would show up after seeing Archery :)

Marwes commented 5 years ago

I may end up stealing the ideas in Archery and try to do something similar for RefCell/RwLock/Mutex

orium commented 5 years ago

I may end up stealing the ideas in Archery and try to do something similar for RefCell/RwLock/Mutex

Awesome. Consider adding them to archery if they are general/flexible enough. In my mind I want archery to be more than just abstraction between Rc/Arc. For instance I want to add some sort of container that can be either owning T, or having a SharedPointer<T, Rc/Arc>.

orium commented 5 years ago

Release 0.7.0 where you can choose between Rc and Arc pointer. Note that #36 still stands.