Support calloc by implementing `alloc_zeroed` on the global allocator (`ValkeyAlloc`) of valkeymodule-rs.
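Rust libraries use `alloc_zeroed` as an optimization (and for some specific use cases). It is particularly useful when creating large zero-initialized vectors (for example `vec![0u8; n]`): the allocator can return memory that is already zeroed instead of allocating it and then writing zeros over it at creation time. For large requests, `calloc` typically obtains fresh pages from the operating system that are already zero and are only backed by physical memory when first written, so the cost is paid lazily rather than when the object is created.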
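To observe that a large zero-filled vector goes through the zeroed path, the sketch below wraps the standard library's `System` allocator and counts calls. `CountingAlloc` and `zeroed_allocs_for_large_vec` are hypothetical names used only for illustration, and the routing of `vec![0u8; n]` to `alloc_zeroed` reflects current standard-library behavior rather than a documented guarantee.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counters recording which allocation path each request takes.
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static ZEROED_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper around the system allocator that counts
/// plain and zeroed allocations separately.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        ZEROED_CALLS.fetch_add(1, Ordering::Relaxed);
        // On Unix, System forwards this to calloc for ordinary alignments.
        unsafe { System.alloc_zeroed(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Builds a zero-filled Vec<u8> of `len` bytes and returns how many
/// zeroed allocations were recorded while doing so.
fn zeroed_allocs_for_large_vec(len: usize) -> usize {
    let before = ZEROED_CALLS.load(Ordering::Relaxed);
    let v = vec![0u8; len];
    black_box(&v); // keep the optimizer from eliding the allocation
    ZEROED_CALLS.load(Ordering::Relaxed) - before
}
```

Calling `zeroed_allocs_for_large_vec(1 << 20)` should report at least one zeroed allocation, showing that the request reaches the global allocator's `alloc_zeroed` rather than `alloc`.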
Without this change, both `alloc_zeroed` and `alloc` were served by `alloc` (which calls `ValkeyModule_Alloc`), because `ValkeyAlloc` implemented only `alloc` and `dealloc`. The default `GlobalAlloc::alloc_zeroed` then writes zeros over the entire block, touching every page at creation time.
With this change, `alloc_zeroed` is handled by `ValkeyModule_Calloc`, while `alloc` continues to use `ValkeyModule_Alloc`.
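A minimal sketch of an allocator shaped like the updated `ValkeyAlloc` follows. It is not the SDK's code: `SketchAlloc`, the counters, and `MAX_MALLOC_ALIGN` are hypothetical, and the C library's `malloc`/`calloc`/`free` stand in for `ValkeyModule_Alloc`/`ValkeyModule_Calloc`/`ValkeyModule_Free` so the example can run outside a Valkey server. Over-aligned layouts are simply rejected here.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumption: libc's malloc/calloc/free stand in for ValkeyModule_Alloc,
// ValkeyModule_Calloc and ValkeyModule_Free, which exist only inside a
// running Valkey server.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn calloc(nmemb: usize, size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

// Conservative assumption about the alignment malloc/calloc guarantee.
const MAX_MALLOC_ALIGN: usize = 8;

// Counters so the chosen path can be observed.
static MALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static CALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator mirroring the change: alloc -> malloc,
/// alloc_zeroed -> calloc, dealloc -> free.
pub struct SketchAlloc;

unsafe impl GlobalAlloc for SketchAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.align() > MAX_MALLOC_ALIGN {
            return std::ptr::null_mut(); // over-aligned: not handled in this sketch
        }
        MALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { malloc(layout.size()).cast() }
    }

    // Without this override, the default alloc_zeroed calls self.alloc
    // and then zero-fills every byte eagerly.
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        if layout.align() > MAX_MALLOC_ALIGN {
            return std::ptr::null_mut();
        }
        CALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { calloc(1, layout.size()).cast() }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        unsafe { free(ptr.cast()) }
    }
}
```

In a module, such an allocator would be registered with `#[global_allocator]`, after which zeroed requests such as `vec![0u8; n]` reach `calloc` instead of `malloc` followed by an explicit zero fill.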
In our testing (valkey-bloom), this optimization reduced the Valkey server-side latency (as reported by `INFO commandstats`) for creating large Bloom filter objects with 1M capacity from ~500 usec to ~20 usec, and for objects with 10M capacity from 5 milliseconds to 40 usec. Similar gains can be expected when creating large objects of other data types that use `alloc_zeroed`.
For a good summary, see: https://blogs.fau.de/hager/archives/825