Closed yancyribbens closed 9 months ago
While there is a lot of potential refactors, I think this is ready to move out of draft state.
I found that using primitive types instead of types like Amount
in the hot path leads to a 2x perf improvement which is crazy. The benchmark on this branch https://github.com/p2pderivatives/rust-bitcoin-coin-selection/tree/experimental/use-primitive-types runs at ~311,000 ns/iter as opposed to 830,000 ns/iter. While this version does not yet add a check for high vs low fee like the C++ implementation, even after adding that in, I think this version is much faster than any implementation.
That could be because of overflow protection in Amount
or messed up benchmark. For instance I don't see black_box
being used in the benchmark so it could be why the results are weird.
I did remove a checked_sum()
in the hotpath (and one that wasn't) so maybe that's in part why. Also, is there really no checked_sum()
on Iter? Could just manually check each one while summing to see how that stacks up. Amount just wraps a u64
so I really didn't expect accessing the Amount
type to be that much worse. Will look into black_box
. It would be interesting to take a look at the ASM produced also to really understand the performance more - I'm not sure the benchmark is necessarily "weird"..
edit: come to think of it, using checked_sum()
in the hotpath shouldn't be needed anyway, since each iteration, the sum is guaranteed to be less than or equal to the first calculation of available_value, so there should not be a reason to check for overflow.
For instance I don't see black_box being used in the benchmark so it could be why the results are weird.
This article was pretty helpful in understanding blackbox: https://gendignoux.com/blog/2022/01/31/rust-benchmarks.html. I think the benchmarks are correct however currently without blackbox because here: https://github.com/p2pderivatives/rust-bitcoin-coin-selection/pull/28/files#diff-9098d62be93e83524a8371395c973d761a95000d1c295f600a8c808e917c16d9R577 I use the return value which forces the compiler to not optimize away the call to bnb()
. If the compiler was optimizing away the call to bnb()
it would not be measurements of 300,000+ ns per iter and vary based on changes to bnb
. The reason for blackbox is to make sure that in the benchmark test itself, the compiler does not optimize away the call to the code you want to benchmark by realizing that since the results aren't being used then don't make the call at all.
regarding bench-marking, it looks like criteron would be a better choice.
Replaced the checked_sum()
with sum()
for the available_value
calculation in the hotpath since it's not possible to overflow. This brings the time down to 772 ns/iter.
test branch_and_bound::benches::bench_select_coins_bnb ... bench: 772,163 ns/iter (+/- 393)
Re-opening this as a PR using my remote so people don't get so many notifications on CI failure.
New PR open here: https://github.com/p2pderivatives/rust-bitcoin-coin-selection/pull/30
Implement Branch and Bound coin selection.