tokio-rs / slab

Slab allocator for Rust
MIT License

Feature Request: key_watermark() API #113

Open rib opened 2 years ago

rib commented 2 years ago

I'm currently using Slabs as part of a mesh data structure for vertices, edges, loops and faces (based on the design of Blender's BMesh API), and as part of that I'm looking to have secondary vectors of data that are indexed with keys from the slab.

I'm looking to store ancillary attributes (these may be per-vertex, per-edge, etc.) within plain Vec vectors, where I'd really like to avoid any memory overhead per value (meshes can be pretty huge). The corresponding Slab (e.g. the vertex slab for a per-vertex attribute) would be the authority on which keys are valid for indexing into those vectors.

To make this work, I'd like to be able to keep each attribute vector's length in step with the length of the Slab's entries vector, as a simple way of ensuring that every possible slab key is a valid index into these attribute vectors.

To avoid confusion with the slab's len(), which reports the number of current values, 'key_watermark' seems like a name that gets across what I'd like to query here; something like this...

    /// Return the current length of entry storage
    ///
    /// This is always larger than all currently valid keys and less than or
    /// equal to the current capacity.
    ///
    /// This may be useful for maintaining secondary data within vectors that
    /// use the same keys as a primary Slab.
    ///
    /// # Examples
    ///
    /// ```
    /// # use slab::*;
    /// let mut slab = Slab::with_capacity(10);
    ///
    /// for i in 0..4 {
    ///     slab.insert(i);
    /// }
    ///
    /// slab.remove(0);
    /// slab.remove(3);
    /// assert_eq!(slab.key_watermark(), 4);
    /// ```
    pub fn key_watermark(&self) -> usize {
        self.entries.len()
    }
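For context, here's a rough sketch of the side-table pattern this would enable (assuming the method above lands; `Vertex` and `colors` are made-up placeholders for my real mesh types):

    use slab::Slab;

    // Hypothetical per-vertex data; the real mesh types are more involved.
    #[derive(Default)]
    struct Vertex {
        position: [f32; 3],
    }

    fn main() {
        let mut vertices: Slab<Vertex> = Slab::new();
        // Secondary per-vertex attribute with no per-value overhead.
        let mut colors: Vec<[f32; 3]> = Vec::new();

        let key = vertices.insert(Vertex::default());

        // Grow the attribute vector to match the entry storage so that
        // every possible slab key is a valid index into it.
        colors.resize(vertices.key_watermark(), [0.0; 3]);
        colors[key] = [1.0, 0.0, 0.0];
    }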

I can look at creating a pull request if something like this sounds reasonable?

Ralith commented 2 years ago

Side tables are definitely an important use case. Is capacity insufficient for this?

Darksonn commented 2 years ago

Note that by using the compact method, it is possible to have values with keys greater than that length.

rib commented 2 years ago

> Side tables are definitely an important use case. Is capacity insufficient for this?

I haven't double-checked Vec's memory growth strategy, but I'd expect some variation of exponential growth, which could lead to really large overestimates of the required storage when using capacity.

Although capacity can be used for this, any overestimate of the required storage then causes a significant number of secondary values to be initialised needlessly.

The entries length may also be an overestimate, but to a much lesser extent on average, and since the value is just as readily available (apart from the current lack of an API), it seems like the ideal choice here.

Considering meshes with hundreds of thousands of vertices, edges and faces, this level of redundant work wouldn't be ideal.
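As a rough illustration (Vec's exact growth strategy is unspecified, so the numbers are only indicative):

    fn main() {
        // Vec grows geometrically in practice, so capacity() can
        // overshoot the number of pushed elements considerably.
        let mut v: Vec<u32> = Vec::new();
        for i in 0..1_000 {
            v.push(i);
        }
        assert!(v.capacity() >= 1_000);
        println!("len = {}, capacity = {}", v.len(), v.capacity());
        // Sizing a side table from capacity() would initialise the extra
        // (capacity() - len()) values for nothing; entries.len() avoids
        // most of that waste.
    }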

rib commented 2 years ago

> Note that by using the compact method, it is possible to have values with keys greater than that length.

Although I'd expect the length may be an overestimate (which is OK here; the hope is just to minimise any overestimate that leads to redundant initialisation of secondary values), surely even after (or during) a compact no (valid) key can be greater than this length? Keys index into the entries vector, so keys greater than (or equal to) this length would implicitly lead to an out-of-bounds panic, no?

From looking at the implementation of compact, it looks like it iteratively pops key/value pairs and swaps them into holes until the slab length matches the entries length, and I guess out-of-bounds keys should only remain if the user doesn't uphold their side of the remapping?
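For what it's worth, the remapping closure that compact takes looks like it would also let a side table be kept consistent during compaction, something like this untested sketch (`colors` is again a hypothetical attribute vector):

    use slab::Slab;

    fn main() {
        let mut vertices: Slab<&str> = Slab::with_capacity(8);
        let a = vertices.insert("a");
        let b = vertices.insert("b");
        let c = vertices.insert("c");

        // Hypothetical side table kept in lockstep with the slab keys.
        let mut colors = vec![[0.0f32; 3]; 3];
        colors[b] = [0.0, 1.0, 0.0];
        colors[c] = [0.0, 0.0, 1.0];

        vertices.remove(a);

        // compact() calls the closure with (value, old_key, new_key) for
        // each moved entry; mirroring the move in the side table is "our
        // side of the remapping".
        vertices.compact(|_value, from, to| {
            colors[to] = colors[from];
            true // returning false would drop the entry instead
        });

        // After compaction every valid key is below len(), so the side
        // table can shrink to match.
        colors.truncate(vertices.len());
    }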

Darksonn commented 2 years ago

Ah, sorry, you are right, that doesn't happen. I had confused it with the DelayQueue.

LegionMammal978 commented 8 months ago

I've been looking for this feature for my own project. The context is that I have an unordered collection of elements that I want to name by their index, and I occasionally want to remove some of the elements without having to re-index any of the others. However, element removals occur within batches of processing, where I scan over each element in the collection, mutate it a bit, and possibly remove it, transferring some of its data into another element.

To continue scanning over the collection while removing some elements, I would ordinarily use Slab::retain(), but that is not usable in this case, since whenever I remove an element, I need to access another element. So the remaining option is to iterate over every key and do the removals and accesses within the loop, as Slab::retain() does internally. But iterating up to the full capacity() would add a lot of pointless work, probing slots that can never hold an element. So exposing entries.len() would make this loop more viable. (My current workaround is to store the greatest returned key plus 1 alongside the Slab.)
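Concretely, the loop looks something like this simplified sketch (process_batch and the even-value transfer are made-up stand-ins; watermark is the value I currently track by hand, which key_watermark() would replace):

    use slab::Slab;

    // Hypothetical batch: remove even values and fold them into the
    // element at `sink_key`, accessing one element while removing another.
    fn process_batch(slab: &mut Slab<u64>, watermark: usize, sink_key: usize) {
        // Iterate only over keys that could ever have been occupied;
        // iterating to capacity() would probe slots that never held a value.
        for key in 0..watermark {
            if key == sink_key {
                continue;
            }
            // get() returns None for vacant slots.
            let Some(&value) = slab.get(key) else { continue };
            if value % 2 == 0 {
                // This is what retain() can't express: removing one
                // element while mutating another.
                let removed = slab.remove(key);
                slab[sink_key] += removed;
            }
        }
    }

    fn main() {
        let mut slab = Slab::new();
        let sink = slab.insert(0u64);
        for v in 1..=6 {
            slab.insert(v);
        }
        // Every key handed out so far is below len() here.
        let watermark = slab.len();
        process_batch(&mut slab, watermark, sink);
        assert_eq!(slab[sink], 2 + 4 + 6);
    }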