Closed cehteh closed 1 year ago
I am surprised that SegVec can iterate over its contents faster than std::Vec. I havent investigated the cause further, better inlining perhaps, might be some bug in the benchmark as well. Would be nice if anyone can figure this out.
@cehteh Thanks for your work on this. I had one comment, everything else looks quite nice. I'm also surprised that iteration for Vec
is slower, but after looking at the benchmark code I don't see anything suspicious so perhaps you're right about inlining, or maybe we've found a way to avoid bounds checks by iterating segments and then flattening?
On 2023-06-27 03:54, Jacob Ryan McCollum wrote:
@cehteh Thanks for your work on this. I had one comment, everything else looks quite nice. I'm also surprised that iteration for
Vec
is slower, but after looking at the benchmark code I don't see anything suspicious.
I am (slowly) working on this, after implementing DoubleEndedIterator speeds become mostly equal or slightly slower (because of extra check/branch). At such small things every instruction makes a huge difference.
-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1609265660 You are receiving this because you were mentioned.
Message ID: @.***>
On 2023-06-27 03:50, Jacob Ryan McCollum wrote:
@mccolljr commented on this pull request.
@@ -855,6 +856,20 @@ impl<T, C: MemConfig> IntoIterator for SegVec<T, C> {
} }+/// Creates an new [
SegVec
][crate::SegVec] from a [Slice
][crate::Slice]. +impl<T: Clone, C: MemConfig> From<Slice<'_, T>> for SegVec<T, C> {
- fn from(slice: Slice<'_, T>) -> Self {
- slice.iter().cloned().collect()
- } +}
+/// Creates an new [
SegVec
][crate::SegVec] from a reference to [Slice
][crate::Slice]. +impl<T: Clone, C: MemConfig> From<&Slice<'_, T>> for SegVec<T, C> {- fn from(slice: &Slice<'_, T>) -> Self {
- slice.iter().cloned().collect()
- } +}
Is there any specific reason for implementing
From
directly forSlice
,&Slice
? It seems like these both just forward to the
Prelimary answer (details may change):
I wanted the 'From
let owned = SegVec::from(borrowed);
Is much more straightforward to read than doing the iterator-collect-dance on every occasion.
I was thinking about a segvec::Cow enum that implements this pattern but that might be out of place to be a core part of the segvec crate. You can tell me if you like the idea otherwise I'll likely just do it ad-hoc in my codebase here.
FromIterator
implementation viacollect
, so we could just supportFromIterator
for iterators over reference types when the target type isClone
, something like:impl<'a, T: Clone + 'a, C: MemConfig> Extend<&'a T> for SegVec<T, C> { fn extend<I: IntoIterator<Item = &'a T>>(&mut self, iter: I) { <Self as Extend<T>>::extend(self, iter.into_iter().cloned()) } } impl<'a, T: 'a + Clone, C: MemConfig> FromIterator<&'a T> for SegVec<T, C> { fn from_iter<I: IntoIterator<Item = &'a T>>(iter: I) -> Self { let mut v = SegVec::new(); v.extend(iter); v } }
There's already an
Extend
implementation for&'a T
whenT: Copy
, so we could just change that to beT: Clone
, sinceT: Copy
requiresT: Clone
anyway and the derivedClone
implementation forT: Copy
should, from what I understand, just be a memcopy.
Might be doable as well.
-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#pullrequestreview-1500549516 You are receiving this because you authored the thread.
Message ID: @.***>
From a library perspective I think it makes sense to use FromIterator
for the case of converting from a borrowing slice to a collection. If we enable FromIterator
from borrowing iterators over &T
where T
is Clone
, one can just use std::iter::FromIterator
and then SegVec::from_iter(some_slice)
becomes possible, which is much more ergonomic than some_slice.into_iter().collect::<SegVec<_,_>>()
. How do you feel about going that route here?
There's another item that would be nice to implement later, which is related but a bit tangential to your copy-on-write suggestion - It is theoretically safe to push
/ extend
/ etc. a SegVec
(from a single thread) through an immutable reference, since it is a non-destructive operation.
On 2023-06-29 21:50, Jacob Ryan McCollum wrote:
There's another item that would be nice to implement later, which is related but a bit tangential to your copy-on-write suggestion - It is theoretically safe to
push
/extend
/ etc. aSegVec
(from a single thread) through an immutable reference, since it is a non-destructive operation.
in CowStr I implemented such with a 'Guard' pattern:
https://docs.rs/cowstr/1.0.0/cowstr/struct.CowStr.html#method.guard
https://docs.rs/cowstr/1.0.0/cowstr/struct.CowStrPushGuard.html
There I have to deal with (forbidding) reallocation. This is not needed in SegVec. Earlier I tried a 'thread' variant which I was not happy with. Such a Guard is way more versatile AND simpler to implement and understand.
Same caveats as in CowStr would happen with SegVec as well. All Shared interior mutable SegVec will see the mutation, access to .len() would need to use atomics. This may yield to surprising results in case one creates multiple iterators which on a segvec which are racey vs the extension on another thread. IMO this is useful but should be well documented and for that reason I tagged the 'try_push*' on the guards as unsafe even when they are technically memory safe.
-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1614122427 You are receiving this because you were mentioned.
Message ID: @.***>
On 2023-06-29 21:36, Jacob Ryan McCollum wrote:
From a library perspective I think it makes sense to use
FromIterator
for the case of converting from a borrowing slice to a collection. If we enableFromIterator
from borrowing iterators over&T
whereT
isClone
, one can justuse std::iter::FromIterator
and thenSegVec::from_iter(some_slice)
becomes possible, which is much more ergonomic thansome_slice.into_iter().collect::<SegVec<_,_>>()
. How do you feel about going that route here?
Yep fine. I'll change that (later/next days).
-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1614107293 You are receiving this because you were mentioned.
Message ID: @.***>
I only removed the From impls here. PR #27 will get implement the FromIterator as discussed when rebased on top of this. Makes no sense to fix that here now when the next step is the reworked Iterators impl.
Sounds good, will merge. Thank you
On 2023-06-29 21:36, Jacob Ryan McCollum wrote:
From a library perspective I think it makes sense to use
FromIterator
for the case of converting from a borrowing slice to a collection. If we enableFromIterator
from borrowing iterators over&T
whereT
isClone
, one can justuse std::iter::FromIterator
and thenSegVec::from_iter(some_slice)
becomes possible, which is much more ergonomic thansome_slice.into_iter().collect::<SegVec<_,_>>()
. How do you feel about going that route here?
I tried that now, but I am not very pleased. Having a FromIterator<&T>
makes the FromIterator implementations become ambiguous, one has to
resolve this with a turbofish. Think about one wants to have a
SegVec<&T> as well. Plus handling the case from &Slice to SegVec adds
even more complexity. IMO this going to be not very ergonomic. I am
tempted to add the From
-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1614107293 You are receiving this because you were mentioned.
Message ID: @.***>
I took a look at what From
implementations are available for Vec<T>
in the standard library, as I generally don't use these methods. I found: https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#3070-3153
Vec<T>
implements From<&[T]>
, From<&mut [T]>
when T: Clone
, and some other Cow
and Box
variants. Notably, it also does not implement FromIterator
for iterators over values of type &T
, even if T: Clone
. From that perspective, I can see the justification for From<Slice>
and From<&Slice>
for SegVec
. However, the Vec<T>
From
implementations seem to have optimized constructors when the values are of these types, which will not be the case for SegVec
's From<Slice>
and From<&Slice>
implementations. We are simply falling back to FromIterator
.
My worry is that by providing a From<Slice>
and From<&Slice>
implementation, there is an implicit communication that these operations are somehow optimized in a way that they are not for &[T]
values (or other iterators over values of type &T
), which is not the case. I personally do not find myself creating Vec
(or SegVec
, etc) values from unmodified slices very often, if at all. It is much more common for me to do something like blah.into_iter().map(...).collect()
.
I sympathize with the lack of ergonomics around doing let my_vec_var = SegVec::<MyType>::from_iter(my_slice_var)
, but I'm still not sure that these From
implementations provide value at a library level - these seem like something that is better served by a function within the target codebase to wrap the ugly SegVec::<T>::from_iter(some_slice)
call.
If you can find a pathway for Slice
to be used to perform an optimized construction of a SegVec
value beyond just calling the FromIterator
implementation, I support this, otherwise I think maybe it is best to follow the lead that Vec
provides here.
What do you think?
On 2023-07-05 10:55, Jacob Ryan McCollum wrote:
I took a look at what
From
implementations are available forVec<T>
in the standard library, as I generally don't use these methods. I found: https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#3070-3153
Vec<T>
implementsFrom<&[T]>
,From<&mut [T]>
whenT: Clone
, and some otherCow
andBox
variants. Notably, it also does not implementFromIterator
for iterators over values of type&T
, even ifT: Clone
. From that perspective, I can see the justification forFrom<Slice>
andFrom<&Slice>
forSegVec
. However, theVec<T>
From
implementations seem to have optimized constructors when the values are of these types, which will not be the case forSegVec
'sFrom<Slice>
andFrom<&Slice>
implementations. We are simply falling back toFromIterator
.My worry is that by providing a
From<Slice>
andFrom<&Slice>
implementation, there is an implicit communication that these operations are somehow optimized in a way that they are not for&[T]
values (or other iterators over values of type&T
), which is not the case. I personally do not find myself creatingVec
(orSegVec
, etc) values from unmodified slices very often, if at all. It is much more common for me to do something likeblah.into_iter().map(...).collect()
.I sympathize with the lack of ergonomics around doing
let my_vec_var = SegVec::<MyType>::from_iter(my_slice_var)
, but I'm still not sure that theseFrom
implementations provide value at a library level - these seem like something that is better served by a function within the target codebase to wrap the uglySegVec::<T>::from_iter(some_slice)
call.
Ok then I'll do it that way.
If you can find a pathway for
Slice
to be used to perform an optimized construction of aSegVec
value beyond just calling theFromIterator
implementation, I support this, otherwise I think maybe it is best to follow the lead thatVec
provides here.What do you think?
I am pretty sure iterators already optimize (close) to the best possible way. Since T has to be Clone there isn't much that can be improved. If we had specialization and could implement it for Copy types things might be different (but prolly not by much, as this optimizes well already too).
I wont want to work on that now, either way such things should come after the Storage abstraction becomes implemented as this may change some details how to interact with it.
Note alternative to the Storage abstraction: Can we make smallvec
default/unconditional, remove the thin-vec stuff completely, maybe
experiment with Box<[ManuallyDrop
-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/26#issuecomment-1622222130 You are receiving this because you were mentioned.
Message ID: @.***>
With this one can create new (Sub-) Slices from existing Slices and turn them into owned SegVecs by SegVec::from(someslice) and SegVec::from(&someslice).
as this was made on top of 'index_improvement' so you can merge only this to include both PR's or I rebase later.
This finalizes my immediate needs. After merging I'll do some clippy/cleanup PR which then may make this suitable for a new release. The other things I noted/planned will be added later.