Closed wks closed 3 months ago
Here are some benchmark results.
size | data bytes | meta bytes | master | master no recursion | this PR | memset |
---|---|---|---|---|---|---|
line | 256 | 4 | 6.1574 ns | 4.1684 ns | 4.0134 ns | 1.1416 ns |
block | 32768 | 512 | 6.5757 ns | 4.5108 ns | 4.5009 ns | 1.7800 ns |
The benchmarks can be found in benches/bulk_meta/bzero_bset.rs
, and can be invoked by
taskset -c 0 cargo bench bzero_bset --features bench
Note: You need taskset
only if your CPU has heterogeneous cores (so-called "performance-cores" and "efficient-cores").
This PR makes iterate_meta_bits
faster than the master branch. That's partly because the existing iterate_meta_bits
function is recursive, although it is guaranteed to terminate at the second level of recursion. This PR replaced the recursion with base cases and reduced the cost. I modified the master branch (see https://github.com/mmtk/mmtk-core/commit/24326a22c4441aae5357208c70b3af7dbe24a90e) to make iterate_meta_bits
non-recursive using the same algorithm as this PR, and it's almost as fast as this PR (as seen in the "master no recursion" column in the table above). There may be some subtle difference in the generated code after changing two closures into one closure, but the difference between "master no recusrion" and "this PR" is small.
But neither master nor this PR is as fast as calling memset
directly to set the meta bits, and the overhead is huge.
I'll try to rebase https://github.com/mmtk/mmtk-core/pull/1174 on top of this PR and mark both PRs for review if the refactored code is pleasant for implementing heap traversal with.
This PR refactors the mechanism for visiting (reading and/or updating) side metadata in bulk, specifically the function
SideMetadataSpec::iterate_meta_bits
.The function now uses a single
FnMut
callback instead of twoFn
callbacks. It uses the enum typeBitByteRange
to distinguish whole byte ranges from bit ranges in a byte. This allows the user to capture variables mutably in the callback. This also removes theCell
used infind_prev_non_zero_value_fast
.The function is made non-recursive to improve the performance. Some test cases are added to test for corner cases.
The function is moved to a dedicated
ranges
module and renamed tobreak_bit_range
, for several reasons:SideMetadataSpec
, but it does not access any member ofSideMetadataSpec
.ranges
module in the future.bulk_update_metadata
.