vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

chore(vrl): explore arena allocation for VRL programs #82

Open tobz opened 2 years ago

tobz commented 2 years ago

As a thought experiment: could we use an arena allocator for VRL programs to authoritatively constrain their runtime memory usage as well as speed up allocations?

There are some suboptimal allocation patterns that we'd like to remove from VRL execution in general. But when dealing with non-scalar data types (maps, lists, etc.), or with externally imported code (such as codec parsers), heap allocations are unavoidable.

This introduces unwelcome allocator pressure and potentially hard-to-defragment allocations. These allocations typically don't need to outlive the execution of the program, since we frequently clone the value on the way out of VRL.

It could be advantageous to use an arena allocator for all heap allocations in a VRL program. Arena allocators are typically used where allocations are "phase-oriented": they happen as part of a logical operation and can all be cleaned up at once when that operation is over. VRL program execution, being stateless between program runs, fits this model well.
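To make the phase-oriented idea concrete, here is a minimal sketch in plain Rust. It is not VRL code: `Arena`, `run_once`, and the byte-range handles are hypothetical names invented for illustration, and a real integration would more likely use an existing arena crate. Each "program run" allocates its intermediates from one reusable buffer, copies the result out, and then frees everything with a single reset.

```rust
use std::ops::Range;

/// Hypothetical bump arena: every allocation is a slice of one reusable buffer.
pub struct Arena {
    buf: Vec<u8>,
}

impl Arena {
    pub fn with_capacity(bytes: usize) -> Self {
        Arena { buf: Vec::with_capacity(bytes) }
    }

    /// Copy `data` into the arena and return a handle (a byte range) to it.
    pub fn alloc(&mut self, data: &[u8]) -> Range<usize> {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        start..self.buf.len()
    }

    /// Resolve a handle back into the bytes it refers to.
    pub fn get(&self, handle: &Range<usize>) -> &[u8] {
        &self.buf[handle.clone()]
    }

    /// End of the phase: release every allocation at once, keep the backing memory.
    pub fn reset(&mut self) {
        self.buf.clear();
    }

    pub fn used(&self) -> usize {
        self.buf.len()
    }

    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}

/// Stand-in for one program run (an assumption, not the VRL runtime):
/// intermediates live in the arena, and only the final value is copied out.
pub fn run_once(arena: &mut Arena, input: &str) -> String {
    let mut last = 0..0;
    for field in input.split(',') {
        last = arena.alloc(field.trim().as_bytes());
    }
    // Copy the result out, mirroring the clone on the way out of the program.
    let out = String::from_utf8_lossy(arena.get(&last)).into_owned();
    // Nothing allocated during the run needs to survive it.
    arena.reset();
    out
}
```

As long as a run stays within the initial capacity, the buffer never grows, so repeated runs reuse the same memory instead of going back to the global allocator for each intermediate value.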

Separately from the potential speedups of phase-oriented allocation/deallocation, a dedicated allocator would also let us constrain memory usage precisely, since most arena allocators are fixed-size or can operate in a fixed-size mode. This would finally let us state concretely how much memory a VRL program can consume: the sum of the thread stack size and the arena allocator size.
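The memory bound can be sketched the same way. The `FixedArena`, `OutOfBudget`, and `max_program_memory` names below are hypothetical, and the sketch assumes the arena is the only source of heap allocations during a run. The arena refuses any allocation past its limit, so the worst case is the stack size plus the arena limit.

```rust
use std::ops::Range;

/// Hypothetical error: the program asked for more than its budget allows.
#[derive(Debug, PartialEq)]
pub struct OutOfBudget {
    pub requested: usize,
    pub remaining: usize,
}

/// Hypothetical fixed-size arena: it never hands out more than `limit` bytes.
pub struct FixedArena {
    buf: Vec<u8>,
    limit: usize,
}

impl FixedArena {
    pub fn new(limit: usize) -> Self {
        FixedArena { buf: Vec::with_capacity(limit), limit }
    }

    /// Allocate by copying `data` in, or fail without growing past the limit.
    pub fn alloc(&mut self, data: &[u8]) -> Result<Range<usize>, OutOfBudget> {
        let remaining = self.limit - self.buf.len();
        if data.len() > remaining {
            return Err(OutOfBudget { requested: data.len(), remaining });
        }
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        Ok(start..self.buf.len())
    }

    /// Release all allocations so the next run starts with the full budget.
    pub fn reset(&mut self) {
        self.buf.clear();
    }
}

/// Upper bound described above: thread stack size plus arena size.
/// Assumes no heap allocation happens outside the arena.
pub fn max_program_memory(stack_bytes: usize, arena: &FixedArena) -> usize {
    stack_bytes + arena.limit
}
```

Returning an error rather than growing the buffer is the design choice that makes the bound hold. How a program should react to running out of budget (abort the event, drop it, fall back) is left open here.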

blt commented 2 years ago

Some support for this notion here. We spend a decent chunk of off-CPU time in http_pipelines_blackhole just cloning BTrees, then destroying them.