mccolljr / segvec

SegVec data structure for rust. Similar to Vec, but allocates memory in chunks of increasing size.
MIT License
35 stars 5 forks source link

Question about future directions/some thoughts #31

Open cehteh opened 1 year ago

cehteh commented 1 year ago

I've now improved the bits I need. In future you'll see less contributions from me. I will still look and improve things eventually (esp running the tests under miri see #29). Here I list some thoughts you may comment:

mccolljr commented 1 year ago
* [ ]  ThinVec proved bad, could/should be removed.

Agreed.

* [ ]  SmallVec (to my own surprise) won't improve anything either, I guess the branch cost is higher than the benefit of not double dereferencing/locality. Even with bigger FACTOR, increasing the inlined table size, cachline align it, the differences are small/inconsistent, although results here are a bit more mixed than thin-vec. Might be removed as well.

I like the concept of having the ability to inline the segment table on the stack, but I think having it as a cargo feature is the wrong implementation. Ultimately, I want to create traits for SegmentTableStorage and SegmentStorage (maybe unsafe traits, since they will have to support some xxx_unchecked methods), which can then be implemented within this crate for Vec and which users can provide their own implementations of if they need different behavior.

* [ ]  Drain could be rewritten returning the drained elements while _not_ calling 'remove()'. Shifting elements and adjusting the len only when done. `remove()` may be a 1-element Drain instead.

Yes, drain is a pain point right now. I agree that there is much room for improvement.

* [ ]  While Drain gets rewritten the segments could be stored as `Vec<Box<[MaybeUninit<T>]>>`, SegVec then needs its own Drop (using the Drain machinery) for freeing elements. I wont expect  much/any performance benefit from this though. This would be just the 'best we can do' storage scheme without going into lowlevel memory allocation bits while also addressing the miri issues/remove() and shifting elements around.

Box<[MaybeUninit<T>]> is a good alternative for Vec for a default implementation of the SegmentStorage trait, but it requires this crate to maintain more unsafe code within itself. I think starting with Vec<Vec<Element>> is the simplest way to capitalize on already-audited unsafe code from the standard library. I do want to solve the miri issues though.

Ultimately, I think these are all good suggestions and when I have some time to spend on this project I want to go through and do a lot of cleanup. I think that all the work you've done is excellent and I greatly appreciate the contributions you've made.

I think I want to add the SegmentTableStorage and SegmentStorage traits as a final addition before cutting a new release. In the process, I can probably improve the Drain implementation. Unfortunately, I don't think I will have time to do this for another couple of weeks, but I intend to do it as soon as possible.

cehteh commented 1 year ago

On 2023-07-15 15:42, Jacob Ryan McCollum wrote:

* [ ]  ThinVec proved bad, could/should be removed.  

Agreed.

I can do that, but will take some time. Being busy here with other things until mid of August.

* [ ]  SmallVec (to my own surprise) won't improve anything

either, I guess the branch cost is higher than the benefit of not double dereferencing/locality. Even with bigger FACTOR, increasing the inlined table size, cachline align it, the differences are small/inconsistent, although results here are a bit more mixed than thin-vec. Might be removed as well.

I like the concept of having the ability to inline the segment table on the stack, but I think having it as a cargo feature is the wrong

The idea is nice indeed, yet there is no benefit, but its disadvantages are slim as well. This may be slightly improve when using Box<[MaybeUninit]> segments as its more memory efficient.

It would be interesting to know if there is a SmallVec like crate that puts the first N elements unconditionally on the stack and then spills additional elements into a dynamic allocation. I haven't looked yet. Prolly this would have no/marginal gains as SmallVec itself.

implementation. Ultimately, I want to create traits for SegmentTableStorage and SegmentStorage (maybe unsafe traits, since they will have to support some xxx_unchecked methods), which can then be implemented within this crate for Vec and which users can provide their own implementations of if they need different behavior.

* [ ]  Drain could be rewritten returning the drained elements

while not calling 'remove()'. Shifting elements and adjusting the len only when done. remove() may be a 1-element Drain instead.

Yes, drain is a pain point right now. I agree that there is much room for improvement.

As for Drain, it should be improved, but it is no block on a next release as the current implementation works. Before addressing Drain the storage and miri stuff needs to be fixed anyway.

* [ ]  While Drain gets rewritten the segments could be stored

as Vec<Box<[MaybeUninit<T>]>>, SegVec then needs its own Drop (using the Drain machinery) for freeing elements. I wont expect much/any performance benefit from this though. This would be just the 'best we can do' storage scheme without going into lowlevel memory allocation bits while also addressing the miri issues/remove() and shifting elements around.

Box<[MaybeUninit<T>]> is a good alternative for Vec for a default implementation of the SegmentStorage trait, but it requires this crate to maintain more unsafe code within itself. I think starting with Vec<Vec<Element>> is the simplest way to capitalize on already-audited unsafe code from the standard library. I do want to solve the miri issues though.

Miri's stacked borrows makes it sometimes unnecessary hard to work around even when the code is already sound (for certain use cases). It should be fixed esp since I added few more unsafe bits and some more may come it would be much helpful when the testsuite passes cleanly under miri w/o turning the stacked borrows off. That should be addressed as well after the storage implementation become improved.

Ultimately, I think these are all good suggestions and when I have some time to spend on this project I want to go through and do a lot of cleanup. I think that all the work you've done is excellent and I greatly appreciate the contributions you've made.

Thanks

I think I want to add the SegmentTableStorage and SegmentStorage traits as a final addition before cutting a new release. In the process, I can probably improve the Drain implementation. Unfortunately, I don't think I will have time to do this for another couple of weeks, but I intend to do it as soon as possible.

Your Crate, Your decision. I think it could be significantly simpler and easier to maintain by simplifying things. I don't see much use cases for having these traits. If one requires something rather special it might be rather justified to factor that out into a new crate (imagine a SegVec holding the segments in mmaped files or in a database).

What could/should be done is to improve the current implementation into something as good as possible as rust permits. Eventually it may add the (yet unstable) Global allocator things like Box/Vec etc doing already becoming

SegVec<
    T,
    MemConfig,
    Allocator = std::alloc::Global
>

or even

SegVec<
    T,
    MemConfig,
    SegmentAllocator = std::alloc::Global,
    SegmentsAllocator = std::alloc::Global
>

Giving some fancy choices to change the allocators, defaulting to std.

I'd appreciate a new release soon'ish because I want to use the things I recently added in my own crate.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/issues/31#issuecomment-1636905041 You are receiving this because you authored the thread.

Message ID: @.***>

mccolljr commented 1 year ago

I'd appreciate a new release soon'ish because I want to use the things I recently added in my own crate.

@cehteh done, I will do another follow-up release to clean up and add some new features, but v0.2.0 is published, with all recent changes.