mccolljr / segvec

SegVec data structure for rust. Similar to Vec, but allocates memory in chunks of increasing size.
MIT License
35 stars 5 forks source link

Subslicing and SegVec::from(Slice) #26

Closed cehteh closed 1 year ago

cehteh commented 1 year ago

With this one can create new (Sub-) Slices from existing Slices and turn them into owned SegVecs by SegVec::from(someslice) and SegVec::from(&someslice).

as this was made on top of 'index_improvement' so you can merge only this to include both PR's or I rebase later.

This finalizes my immediate needs. After merging I'll do some clippy/cleanup PR which then may make this suitable for a new release. The other things I noted/planned will be added later.

cehteh commented 1 year ago

I am surprised that SegVec can iterate over its contents faster than std::Vec. I havent investigated the cause further, better inlining perhaps, might be some bug in the benchmark as well. Would be nice if anyone can figure this out.

mccolljr commented 1 year ago

@cehteh Thanks for your work on this. I had one comment, everything else looks quite nice. I'm also surprised that iteration for Vec is slower, but after looking at the benchmark code I don't see anything suspicious so perhaps you're right about inlining, or maybe we've found a way to avoid bounds checks by iterating segments and then flattening?

cehteh commented 1 year ago

On 2023-06-27 03:54, Jacob Ryan McCollum wrote:

@cehteh Thanks for your work on this. I had one comment, everything else looks quite nice. I'm also surprised that iteration for Vec is slower, but after looking at the benchmark code I don't see anything suspicious.

I am (slowly) working on this, after implementing DoubleEndedIterator speeds become mostly equal or slightly slower (because of extra check/branch). At such small things every instruction makes a huge difference.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1609265660 You are receiving this because you were mentioned.

Message ID: @.***>

cehteh commented 1 year ago

On 2023-06-27 03:50, Jacob Ryan McCollum wrote:

@mccolljr commented on this pull request.

@@ -855,6 +856,20 @@ impl<T, C: MemConfig> IntoIterator for SegVec<T, C> {
} }

+/// Creates an new [SegVec][crate::SegVec] from a [Slice][crate::Slice]. +impl<T: Clone, C: MemConfig> From<Slice<'_, T>> for SegVec<T, C> {

  • fn from(slice: Slice<'_, T>) -> Self {
  • slice.iter().cloned().collect()
  • } +}
  • +/// Creates an new [SegVec][crate::SegVec] from a reference to [Slice][crate::Slice]. +impl<T: Clone, C: MemConfig> From<&Slice<'_, T>> for SegVec<T, C> {

  • fn from(slice: &Slice<'_, T>) -> Self {
  • slice.iter().cloned().collect()
  • } +}

Is there any specific reason for implementing From directly for Slice, &Slice? It seems like these both just forward to the

Prelimary answer (details may change): I wanted the 'From' for convenience. Having a CoW like use case here where 'Slices' are the borrowed variants with a lifetime and can be easily converted into a 'owned' object.

let owned = SegVec::from(borrowed);

Is much more straightforward to read than doing the iterator-collect-dance on every occasion.

I was thinking about a segvec::Cow enum that implements this pattern but that might be out of place to be a core part of the segvec crate. You can tell me if you like the idea otherwise I'll likely just do it ad-hoc in my codebase here.

FromIterator implementation via collect, so we could just support FromIterator for iterators over reference types when the target type is Clone, something like:

impl<'a, T: Clone + 'a, C: MemConfig> Extend<&'a T> for SegVec<T, C> {
   fn extend<I: IntoIterator<Item = &'a T>>(&mut self, iter: I) {
       <Self as Extend<T>>::extend(self, iter.into_iter().cloned())
   }
}
impl<'a, T: 'a + Clone, C: MemConfig> FromIterator<&'a T> for
SegVec<T, C> {
   fn from_iter<I: IntoIterator<Item = &'a T>>(iter: I) -> Self {
       let mut v = SegVec::new();
       v.extend(iter);
       v
   }
}

There's already an Extend implementation for &'a T when T: Copy, so we could just change that to be T: Clone, since T: Copy requires T: Clone anyway and the derived Clone implementation for T: Copy should, from what I understand, just be a memcopy.

Might be doable as well.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#pullrequestreview-1500549516 You are receiving this because you authored the thread.

Message ID: @.***>

mccolljr commented 1 year ago

From a library perspective I think it makes sense to use FromIterator for the case of converting from a borrowing slice to a collection. If we enable FromIterator from borrowing iterators over &T where T is Clone, one can just use std::iter::FromIterator and then SegVec::from_iter(some_slice) becomes possible, which is much more ergonomic than some_slice.into_iter().collect::<SegVec<_,_>>(). How do you feel about going that route here?

mccolljr commented 1 year ago

There's another item that would be nice to implement later, which is related but a bit tangential to your copy-on-write suggestion - It is theoretically safe to push / extend / etc. a SegVec (from a single thread) through an immutable reference, since it is a non-destructive operation.

cehteh commented 1 year ago

On 2023-06-29 21:50, Jacob Ryan McCollum wrote:

There's another item that would be nice to implement later, which is related but a bit tangential to your copy-on-write suggestion - It is theoretically safe to push / extend / etc. a SegVec (from a single thread) through an immutable reference, since it is a non-destructive operation.

in CowStr I implemented such with a 'Guard' pattern:

https://docs.rs/cowstr/1.0.0/cowstr/struct.CowStr.html#method.guard

https://docs.rs/cowstr/1.0.0/cowstr/struct.CowStrPushGuard.html

There I have to deal with (forbidding) reallocation. This is not needed in SegVec. Earlier I tried a 'thread' variant which I was not happy with. Such a Guard is way more versatile AND simpler to implement and understand.

Same caveats as in CowStr would happen with SegVec as well. All Shared interior mutable SegVec will see the mutation, access to .len() would need to use atomics. This may yield to surprising results in case one creates multiple iterators which on a segvec which are racey vs the extension on another thread. IMO this is useful but should be well documented and for that reason I tagged the 'try_push*' on the guards as unsafe even when they are technically memory safe.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1614122427 You are receiving this because you were mentioned.

Message ID: @.***>

cehteh commented 1 year ago

On 2023-06-29 21:36, Jacob Ryan McCollum wrote:

From a library perspective I think it makes sense to use FromIterator for the case of converting from a borrowing slice to a collection. If we enable FromIterator from borrowing iterators over &T where T is Clone, one can just use std::iter::FromIterator and then SegVec::from_iter(some_slice) becomes possible, which is much more ergonomic than some_slice.into_iter().collect::<SegVec<_,_>>(). How do you feel about going that route here?

Yep fine. I'll change that (later/next days).

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1614107293 You are receiving this because you were mentioned.

Message ID: @.***>

cehteh commented 1 year ago

I only removed the From impls here. PR #27 will get implement the FromIterator as discussed when rebased on top of this. Makes no sense to fix that here now when the next step is the reworked Iterators impl.

mccolljr commented 1 year ago

Sounds good, will merge. Thank you

cehteh commented 1 year ago

On 2023-06-29 21:36, Jacob Ryan McCollum wrote:

From a library perspective I think it makes sense to use FromIterator for the case of converting from a borrowing slice to a collection. If we enable FromIterator from borrowing iterators over &T where T is Clone, one can just use std::iter::FromIterator and then SegVec::from_iter(some_slice) becomes possible, which is much more ergonomic than some_slice.into_iter().collect::<SegVec<_,_>>(). How do you feel about going that route here?

I tried that now, but I am not very pleased. Having a FromIterator<&T> makes the FromIterator implementations become ambiguous, one has to resolve this with a turbofish. Think about one wants to have a SegVec<&T> as well. Plus handling the case from &Slice to SegVec adds even more complexity. IMO this going to be not very ergonomic. I am tempted to add the From and From<&Slice> back, those just work unambigously. What do you think?

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1614107293 You are receiving this because you were mentioned.

Message ID: @.***>

mccolljr commented 1 year ago

I took a look at what From implementations are available for Vec<T> in the standard library, as I generally don't use these methods. I found: https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#3070-3153

Vec<T> implements From<&[T]>, From<&mut [T]> when T: Clone, and some other Cow and Box variants. Notably, it also does not implement FromIterator for iterators over values of type &T, even if T: Clone. From that perspective, I can see the justification for From<Slice> and From<&Slice> for SegVec. However, the Vec<T> From implementations seem to have optimized constructors when the values are of these types, which will not be the case for SegVec's From<Slice> and From<&Slice> implementations. We are simply falling back to FromIterator.

My worry is that by providing a From<Slice> and From<&Slice> implementation, there is an implicit communication that these operations are somehow optimized in a way that they are not for &[T] values (or other iterators over values of type &T), which is not the case. I personally do not find myself creating Vec (or SegVec, etc) values from unmodified slices very often, if at all. It is much more common for me to do something like blah.into_iter().map(...).collect().

I sympathize with the lack of ergonomics around doing let my_vec_var = SegVec::<MyType>::from_iter(my_slice_var), but I'm still not sure that these From implementations provide value at a library level - these seem like something that is better served by a function within the target codebase to wrap the ugly SegVec::<T>::from_iter(some_slice) call.

If you can find a pathway for Slice to be used to perform an optimized construction of a SegVec value beyond just calling the FromIterator implementation, I support this, otherwise I think maybe it is best to follow the lead that Vec provides here.

What do you think?

cehteh commented 1 year ago

On 2023-07-05 10:55, Jacob Ryan McCollum wrote:

I took a look at what From implementations are available for Vec<T> in the standard library, as I generally don't use these methods. I found: https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#3070-3153

Vec<T> implements From<&[T]>, From<&mut [T]> when T: Clone, and some other Cow and Box variants. Notably, it also does not implement FromIterator for iterators over values of type &T, even if T: Clone. From that perspective, I can see the justification for From<Slice> and From<&Slice> for SegVec. However, the Vec<T> From implementations seem to have optimized constructors when the values are of these types, which will not be the case for SegVec's From<Slice> and From<&Slice> implementations. We are simply falling back to FromIterator.

My worry is that by providing a From<Slice> and From<&Slice> implementation, there is an implicit communication that these operations are somehow optimized in a way that they are not for &[T] values (or other iterators over values of type &T), which is not the case. I personally do not find myself creating Vec (or SegVec, etc) values from unmodified slices very often, if at all. It is much more common for me to do something like blah.into_iter().map(...).collect().

I sympathize with the lack of ergonomics around doing let my_vec_var = SegVec::<MyType>::from_iter(my_slice_var), but I'm still not sure that these From implementations provide value at a library level - these seem like something that is better served by a function within the target codebase to wrap the ugly SegVec::<T>::from_iter(some_slice) call.

Ok then I'll do it that way.

If you can find a pathway for Slice to be used to perform an optimized construction of a SegVec value beyond just calling the FromIterator implementation, I support this, otherwise I think maybe it is best to follow the lead that Vec provides here.

What do you think?

I am pretty sure iterators already optimize (close) to the best possible way. Since T has to be Clone there isn't much that can be improved. If we had specialization and could implement it for Copy types things might be different (but prolly not by much, as this optimizes well already too).

I wont want to work on that now, either way such things should come after the Storage abstraction becomes implemented as this may change some details how to interact with it.

Note alternative to the Storage abstraction: Can we make smallvec default/unconditional, remove the thin-vec stuff completely, maybe experiment with Box<[ManuallyDrop]> for the segments? That'll simplify the codebase somewhat.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1622222130 You are receiving this because you were mentioned.

Message ID: @.***>