Ideal data block size from allocator

mzabaluev commented 6 years ago

When programming dynamically allocated data structures, it's often desirable to pick the allocation block size that provides best performance and/or causes least memory overhead or fragmentation waste per unit of data. This, however, may depend on the allocator and even on the data layout. I'm proposing to add a method or methods to the Alloc trait that would return an estimation of the "best" allocation block size for the given Layout. The fuzzy value function for "best" may mean "fastest" or "most space-efficient", which two might conceivably be different. The upper bound for the resulting data block size should be the size of the OS memory page (unless a larger size is significantly more efficient with the allocator), and the default implementation could divide the page size to the array item size.

pub trait Alloc {
    // ...
    fn fastest_array_size(&self, layout: &Layout) -> usize;
    fn densest_array_size(&self, layout: &Layout) -> usize;
}

I can write up an RFC if the idea does not merit disapproval.

sfackler commented 6 years ago

Are there similar APIs in other languages? What would the implementations of these methods look like for alloc_system and alloc_jemalloc?

mzabaluev commented 6 years ago

Are there similar APIs in other languages?

None that I know of. Layout-specific allocation as designed for Rust is a novel feature to me.

What would the implementations of these methods look like for alloc_system and alloc_jemalloc?

For alloc_system, tuning for sweet spots would require some experimentation on most platforms, if the particular allocator implementation can be reliably assumed at all. This is probably stable on MacOS, may depend on the libc choice on Windows, and is likely, but not necessarily, glibc malloc on Linux. Absent detailed knowledge, I think the memory page size would serve as a workable guess, unless it's known to have performance disadvantages on the target os/arch/whatever.

For alloc_jemalloc, I assume the implementation is fully controlled by the Rust developers and tuning can be done using knowledge of the internals.

Where this could be most useful, though, is custom allocators.

hanna-kruppe commented 6 years ago

I can see how the allocation size that's best for minimizing fragmentation depending on how the allocator is organized internally. But by "fastest" I assume you mean "code using the allocation is faster" and I don't see how this can depend on the allocator rather than unrelated system parameters. Can you go into more detail on how an allocator might implement it differently from default allocation?

mzabaluev commented 6 years ago

I don't mean a different allocation strategy, just the recommended block size for a layout that e.g. causes the least amount of reshuffling of allocator data structures (amortized per the amount of data) on repeated allocations of this size. So for an allocator that takes memory pages from the OS, that might be as close as possible to the whole page, minus possible in-page overhead of the allocator.

A different case is an application that allocates a large amount of homogeneous data blocks, and so it wants the "least fragmentation" size even though it might be not ideal for allocation performance as such.

hanna-kruppe commented 6 years ago

Oh, so you're talking about allocation performance, not runtime performance?

mzabaluev commented 6 years ago

Oh, so you're talking about allocation performance, not runtime performance?

Allocation performance may or may not figure heavily in the overall performance, depending on the usage pattern. Hence the perceived need for two methods: fastest_array_size for applications that allocate and deallocate often, and densest_array_size for applications that allocate a lot.

rust-lang / rfcs

Ideal data block size from allocator #2223