mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
379 stars 69 forks source link

Refactor iterate_meta_bits #1181

Closed wks closed 3 months ago

wks commented 3 months ago

This PR refactors the mechanism for visiting (reading and/or updating) side metadata in bulk, specifically the function SideMetadataSpec::iterate_meta_bits.

The function now uses a single FnMut callback instead of two Fn callbacks. It uses the enum type BitByteRange to distinguish whole byte ranges from bit ranges in a byte. This allows the user to capture variables mutably in the callback. This also removes the Cell used in find_prev_non_zero_value_fast.

The function is made non-recursive to improve the performance. Some test cases are added to test for corner cases.

The function is moved to a dedicated ranges module and renamed to break_bit_range, for several reasons:

wks commented 3 months ago

Here are some benchmark results.

size data bytes meta bytes master master no recursion this PR memset
line 256 4 6.1574 ns 4.1684 ns 4.0134 ns 1.1416 ns
block 32768 512 6.5757 ns 4.5108 ns 4.5009 ns 1.7800 ns

The benchmarks can be found in benches/bulk_meta/bzero_bset.rs, and can be invoked by

taskset -c 0 cargo bench bzero_bset --features bench

Note: You need taskset only if your CPU has heterogeneous cores (so-called "performance-cores" and "efficient-cores").

This PR makes iterate_meta_bits faster than the master branch. That's partly because the existing iterate_meta_bits function is recursive, although it is guaranteed to terminate at the second level of recursion. This PR replaced the recursion with base cases and reduced the cost. I modified the master branch (see https://github.com/mmtk/mmtk-core/commit/24326a22c4441aae5357208c70b3af7dbe24a90e) to make iterate_meta_bits non-recursive using the same algorithm as this PR, and it's almost as fast as this PR (as seen in the "master no recursion" column in the table above). There may be some subtle difference in the generated code after changing two closures into one closure, but the difference between "master no recusrion" and "this PR" is small.

But neither master nor this PR is as fast as calling memset directly to set the meta bits, and the overhead is huge.

I'll try to rebase https://github.com/mmtk/mmtk-core/pull/1174 on top of this PR and mark both PRs for review if the refactored code is pleasant for implementing heap traversal with.