In performing test for PIBD work, it's become evident that header validation is not as efficient as it could be. There are a few reasons for this (with more to be addressed in coming PRs), but much of it comes down to the number of calls to the difficulty iterator in store::DifficultyIter, which deserialises several entire headers from the DB each time it's called. This iterator next method is called 60 times on each header validation to get the block's expected difficulty, which uncovered a large performance issue around the inefficient 'BitVec' struct.
A test (ignored in CI, meant to be run manually against live chain data) is included that copies and validates 100k headers from one chain to another. Before this change, this test in release mode (Mac, 2.3 Ghz 8-Core Intel i9) took about 76 seconds. With this change, the test takes 35 seconds. Flamegraphs are attached demonstrating the before and after (note the huge amount of time spent in DifficultyIter::next())
Before:
After:
Note there's still more to do here performance wise, but was an obvious first candidate for optimisation. Also investigated the use of a bitpacking lib https://github.com/quickwit-inc/bitpacking that actually uses SSE and AVX2 instructions to optimise further if available, but it unfortunately only supports packing up to u32.
Exact Changes:
Remove BitVec
Add pack_bits function, which packs an array of u64s to a specified bit length more efficiently than the previous bitvec implementation
Add Proof::pack_nonces function, which packs an array of u64s to a specified bit length more efficiently
Add Proof::pack_len helper function and use instead of previous bitvec
In performing test for PIBD work, it's become evident that header validation is not as efficient as it could be. There are a few reasons for this (with more to be addressed in coming PRs), but much of it comes down to the number of calls to the difficulty iterator in
store::DifficultyIter
, which deserialises several entire headers from the DB each time it's called. This iterator next method is called 60 times on each header validation to get the block's expected difficulty, which uncovered a large performance issue around the inefficient 'BitVec' struct.A test (ignored in CI, meant to be run manually against live chain data) is included that copies and validates 100k headers from one chain to another. Before this change, this test in release mode (Mac, 2.3 Ghz 8-Core Intel i9) took about 76 seconds. With this change, the test takes 35 seconds. Flamegraphs are attached demonstrating the before and after (note the huge amount of time spent in
DifficultyIter::next()
)Before:
After:
Note there's still more to do here performance wise, but was an obvious first candidate for optimisation. Also investigated the use of a bitpacking lib https://github.com/quickwit-inc/bitpacking that actually uses SSE and AVX2 instructions to optimise further if available, but it unfortunately only supports packing up to u32.
Exact Changes:
pack_bits
function, which packs an array of u64s to a specified bit length more efficiently than the previous bitvec implementationProof::pack_nonces
function, which packs an array of u64s to a specified bit length more efficientlyProof::pack_len
helper function and use instead of previous bitvec