I initially wanted to use a lookup table to hardcode the mantissa factor and rely on field arithmetics to not have to make a separate code for divison, unfortunately because the result needs to be truncated, this trick doesn't work. Instead I created a big match to hardcode the operation to apply to the mantissa, this allowed me to remove the shl function. shr was already optimized but we still get a 7x performance gain on some tests.
I also took this as an opportunity to optimize the u256 multiplication (when I could skip the divrem and hardcode the low and high of u256). I justified all the math simplifications in comments.
Performance improvements:
Previous implementation:
test consensus::validation::difficulty::tests::test_bits_to_target_02008000 ... ok (gas usage est.: 51610)
test consensus::validation::difficulty::tests::test_bits_to_target_05009234 ... ok (gas usage est.: 220960)
test consensus::validation::difficulty::tests::test_bits_to_target_181bc330 ... ok (gas usage est.: 322570)
test consensus::validation::difficulty::tests::test_bits_to_target_04123456 ... ok (gas usage est.: 187090)
test consensus::validation::difficulty::tests::test_bits_to_target_01123456 ... ok (gas usage est.: 51610)
test consensus::validation::difficulty::tests::test_bits_to_target_1d00ffff ... ok (gas usage est.: 322570)
test consensus::validation::difficulty::tests::test_bits_to_target_1c0d3142 ... ok (gas usage est.: 322570)
test consensus::validation::difficulty::tests::test_bits_to_target_01003456 ... ok (gas usage est.: 51610)
test consensus::validation::difficulty::tests::test_bits_to_target_1707a429 ... ok (gas usage est.: 322570)
test consensus::validation::difficulty::tests::test_bits_to_target_bounds ... ok (gas usage est.: 680290)
test result: ok. 10 passed; 0 failed; 0 ignored; 89 filtered out;
New implementation:
test consensus::validation::difficulty::tests::test_bits_to_target_1707a429 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_181bc330 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_05009234 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_01003456 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_04123456 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_1c0d3142 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_02008000 ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_1d00ffff ... ok (gas usage est.: 42520)
test consensus::validation::difficulty::tests::test_bits_to_target_bounds ... ok (gas usage est.: 382060)
test consensus::validation::difficulty::tests::test_bits_to_target_01123456 ... ok (gas usage est.: 42520)
test result: ok. 10 passed; 0 failed; 0 ignored; 89 filtered out;
Changes implemented:
I initially wanted to use a lookup table to hardcode the mantissa factor and rely on field arithmetics to not have to make a separate code for divison, unfortunately because the result needs to be truncated, this trick doesn't work. Instead I created a big match to hardcode the operation to apply to the mantissa, this allowed me to remove the
shl
function.shr
was already optimized but we still get a 7x performance gain on some tests.I also took this as an opportunity to optimize the u256 multiplication (when I could skip the divrem and hardcode the low and high of u256). I justified all the math simplifications in comments.
Performance improvements:
Previous implementation:
New implementation: