servo / rust-smallvec

"Small vector" optimization for Rust: store up to a small number of items on the stack
Apache License 2.0
1.35k stars 145 forks source link

implement `get_size::GetSize` for `SmallVec`, so we can know how much memory we're using? #331

Closed amandasaurus closed 9 months ago

amandasaurus commented 9 months ago

I have been using smallvec successfully, and it's been a great memory saver. I also use the get_size library to report how much (heap) memory my data structures use. This doesn't work for SmallVec's, or data stucutres which include them.

Is it possible to add support to GetSize to smallvec? Maybe behind a feature flag? get_size allows one to implement it for external types, do you have any tips on how I could implement GetSize for a SmallVec?

mbrubeck commented 9 months ago

Is it possible to add support to GetSize to smallvec? Maybe behind a feature flag?

Sure! The implementation should be pretty straightforward, similar to this template that is used for collections like Vec.

For SmallVec, the get_heap_size method should start with 0 when the buffer is “inline,” or the buffer size (capacity × stack size of each item) when it is “spilled.” Then it should iterate through the vector and add the heap size of each element.

amandasaurus commented 9 months ago

Oh that's for your tips.

Alas in my current code, I have SmallVec as a generic in a struct. I can't figure out how to do small_vec's “custom overriding” in this case.

I don't know a lot about this type of Rust. Is it possible to add an impl for GetSize to small_vec directly, hidden behind a feature flag?

On So, 14 Jan 2024 0:54 +01:00, Matt Brubeck @.***> wrote:

Is it possible to add support to GetSize to smallvec? Maybe behind a feature flag?

Sure! The implementation should be pretty straightforward, similar to this template https://github.com/DKerp/get-size/blob/v0.1.4/src/lib.rs#L165-L179 that is used for collections like Vec.

For SmallVec, the get_heap_size method should start with 0 when the buffer is “inline,” or the buffer size (capacity × stack size of each item) when it is “spilled.” Then it should iterate through the vector and add the heap size of each element.

— Reply to this email directly, view it on GitHub https://github.com/servo/rust-smallvec/issues/331#issuecomment-1890798212, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAMCZNMMDQNQ5HLN2SFITYOMNBVAVCNFSM6AAAAABBZHIYQOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJQG44TQMRRGI. You are receiving this because you authored the thread.Message ID: @.***>

-- Amanda

mbrubeck commented 9 months ago

Is it possible to add an impl for GetSize to small_vec directly, hidden behind a feature flag?

Yes, please feel free to submit a pull request. You may want to do this on the v1 branch if you want to use it right away.

mbrubeck commented 9 months ago

Some more details:

  1. Start by forking this repository and checking out the v1 branch.
  2. Add get-size as a dependency in Cargo.toml with optional = true.
  3. Add an impl<A: Array> get_size::GetSize for SmallVec<A> { ... } behind a feature = "get_size" gate, similar to this one for feature = "serde".

Let me know if you have any questions along the way!

amandasaurus commented 9 months ago

Thanks, I've attemped it in my get-size branch (based on v1).

On So, 14 Jan 2024 0:54 +01:00, Matt Brubeck @.***> wrote:

For SmallVec, the get_heap_size method should start with 0 when the buffer is “inline,” or the buffer size (capacity × stack size of each item) when it is “spilled.” Then it should iterate through the vector and add the heap size of each element.

I've done that, however I'm uncertain. The impl is here (tests here).

    assert_eq!(SmallVec::<[i32; 1]>::get_stack_size(), 24);
    assert_eq!(SmallVec::<[i32; 2]>::get_stack_size(), 24);
    assert_eq!(SmallVec::<[i32; 10]>::get_stack_size(), 56)

This is needed for the test, but I thought empty SmallVecs store the array on the stack. This is just the standard get_stack_size() from the get_size crate. How come changing the size of the array doesn't linearly affect stack size

On So, 14 Jan 2024 0:54 +01:00, Matt Brubeck @.***> wrote:

Then it should iterate through the vector and add the heap size of each element.

    let mut a: SmallVec<[i32; 2]> = smallvec![];
    a.push(0);
    assert!(!a.spilled());
    assert_eq!(a.get_size(), 28);
    assert_eq!(a.get_heap_size(), 4);

Following your advice, I get this unit test But if a SmallVec is inline, then the heap size should be zero, right? 🤔 Have I done something wrong?

Or have I completely misunderstood something about how heap memory works?? (🤣 don't worry it's quite possible, please don't be afraid of offending me)

-- Amanda

mbrubeck commented 9 months ago

How come changing the size of the array doesn't linearly affect stack size

This is expected. SmallVec has a minimum stack size because it must be able store a length, capacity, and pointer, for the “spilled” case where it is basically a (usize, usize, *const T). On a 64-bit platform, this takes up (3 64 bits) = (3 8 bytes) = 24 bytes.

And because of alignment, the size can only grow in multiples of 64 bits (8 bytes).

But if a SmallVec is inline, then the heap size should be zero, right? 🤔 Have I done something wrong?

This code:

        for v in self.iter() {
            total += GetSize::get_size(v);
        }

should call get_heap_size(v) instead of get_size(v).

amandasaurus commented 9 months ago

oh wow! I've made the changes you recommended, and from the tests this is exactly the sort of numbers I expect:

    let mut a: SmallVec<[i32; 2]> = smallvec![];
    assert!(!a.spilled());
    assert_eq!(a.get_size(), 24);
    assert_eq!(a.get_heap_size(), 0);

    a.push(0);
    assert_eq!(a.len(), 1);
    assert!(!a.spilled());
    assert_eq!(a.get_size(), 24);
    assert_eq!(a.get_heap_size(), 0);

    a.push(1);
    assert_eq!(a.len(), 2);
    assert!(!a.spilled());
    assert_eq!(a.get_size(), 24);
    assert_eq!(a.get_heap_size(), 0);

    a.push(2);
    assert_eq!(a.len(), 3);
    assert!(a.spilled());
    assert_eq!(a.get_size(), 40);
    assert_eq!(a.get_heap_size(), 16);

    a.push(3);
    assert_eq!(a.len(), 4);
    assert!(a.spilled());
    assert_eq!(a.get_size(), 40);
    assert_eq!(a.get_heap_size(), 16);

    a.push(4);
    assert_eq!(a.len(), 5);
    assert!(a.spilled());
    assert_eq!(a.get_size(), 56);
    assert_eq!(a.get_heap_size(), 32);

The total size grows a little bit, and the heap size stays 0 while it's not spilled. 🙂

Would you be willing to merge this feature into the main codebase?

mbrubeck commented 9 months ago

Would you be willing to merge this feature into the main codebase?

Yes, absolutely!

mbrubeck commented 9 months ago

Implemented in #335, #336.