Open TheBlueMatt opened 3 weeks ago
:warning: Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.
Attention: Patch coverage is 94.08100% with 19 lines in your changes missing coverage. Please review.
Project coverage is 91.43%. Comparing base (b8cdde8) to head (3654954). Report is 90 commits behind head on main.
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
:umbrella: View full report in Codecov by Sentry.
IMO, feel free to squash, if that's fine with @valentinewallace?
Yup, feel free to squash and rebase!
Rebased and squashed fixups.
Btw, some warnings were introduced in CI
Squashed fixups.
Very unscientific benchmarks on my laptop (comparing current git with this PR rebased on git) show mean improvements between ~0% and 14%. Note, of course, that if we further optimize the scorer (which we can do quite a bit) these numbers would look better.
Benchmarking generate_routes_with_zero_penalty_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, or reduce sample count to 80.
generate_routes_with_zero_penalty_scorer
time: [55.518 ms 62.282 ms 70.580 ms]
change: [-9.9271% +3.7391% +20.115%] (p = 0.64 > 0.05)
No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
13 (13.00%) low mild
4 (4.00%) high severe
Benchmarking generate_mpp_routes_with_zero_penalty_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, or reduce sample count to 80.
generate_mpp_routes_with_zero_penalty_scorer
time: [58.569 ms 62.549 ms 66.635 ms]
change: [-23.334% -13.029% -1.7878%] (p = 0.03 < 0.05)
Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
11 (11.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
Benchmarking generate_routes_with_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.9s, or reduce sample count to 20.
generate_routes_with_probabilistic_scorer
time: [158.19 ms 163.96 ms 169.43 ms]
change: [-16.324% -12.044% -7.7884%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
4 (4.00%) low mild
2 (2.00%) high mild
Benchmarking generate_mpp_routes_with_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.4s, or reduce sample count to 30.
generate_mpp_routes_with_probabilistic_scorer
time: [153.42 ms 162.01 ms 170.47 ms]
change: [-20.280% -14.764% -9.0255%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild
Benchmarking generate_large_mpp_routes_with_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 80.6s, or reduce sample count to 10.
generate_large_mpp_routes_with_probabilistic_scorer
time: [649.67 ms 708.47 ms 766.42 ms]
change: [-16.700% -5.7015% +7.0523%] (p = 0.37 > 0.05)
No change in performance detected.
Benchmarking generate_routes_with_nonlinear_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.9s, or reduce sample count to 20.
generate_routes_with_nonlinear_probabilistic_scorer
time: [160.20 ms 166.87 ms 173.08 ms]
change: [-10.685% -5.8941% -1.0357%] (p = 0.03 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) low mild
Benchmarking generate_mpp_routes_with_nonlinear_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.1s, or reduce sample count to 20.
Benchmarking generate_mpp_routes_with_nonlinear_probabilistic_scorer: Collecting 100 samples in estimated 17.120 s (100 iterations)
generate_mpp_routes_with_nonlinear_probabilistic_scorer
time: [166.54 ms 173.06 ms 179.44 ms]
change: [-5.5326% -0.2423% +5.7249%] (p = 0.93 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
4 (4.00%) high mild
Benchmarking generate_large_mpp_routes_with_nonlinear_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 79.9s, or reduce sample count to 10.
Benchmarking generate_large_mpp_routes_with_nonlinear_probabilistic_scorer: Collecting 100 samples in estimated 79.915 s (100 iterations)
generate_large_mpp_routes_with_nonlinear_probabilistic_scorer
time: [649.03 ms 720.50 ms 791.29 ms]
change: [-21.702% -9.3020% +4.8850%] (p = 0.20 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
In general, I'm still not the biggest fan of embedding these internal counters into publicly exposed datatypes that users might use differently than expected by us. If we could find a way to further encapsulate the node counter logic and leave the public types untouched, it would be preferable IMO.
I believe none of the new counters are public, otherwise I'd definitely agree with you :).
I think I'll rerun these over the weekend as they almost seem too good to be true.
50% from this alone does seem a bit high, but I wouldn't have been surprised to see that on the zero-penalty-scorer cases. The router does a ton of lookups in dist
(one per heap pop, IIRC) and this change basically entirely removes those lookups. Now, the regressions that you show (and that were in my results) are fairly surprising, but I'm not all that worried given no one actually uses that anyway.
During routing, we spend most of our time doing hashmap lookups. It turns out we can drop two of them. The first requires a good bit of work: by assigning each node in memory a random u32 "node counter", we can drop the main per-node routefinding state map and replace it with a vec. Once we do that, we can also drop the first-hop hashmap lookup that we do on a per-node basis as we walk the network graph, replacing it with a check in the same vec.
This is just the minimal node_id -> counter change from #2803
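To illustrate the idea, here is a minimal, hypothetical sketch (the names and types are illustrative, not rust-lightning's actual API, and the counters here are assigned sequentially rather than randomly for simplicity): each node gets a dense u32 counter when it is added to the graph, so per-route-find state can live in a flat Vec indexed by counter instead of a HashMap keyed by node id, turning the hot-path lookup (e.g. once per heap pop) into a plain array index.

```rust
use std::collections::HashMap;

/// Per-node routefinding state; lives in a Vec indexed by node counter.
#[derive(Clone, Default)]
struct NodeState {
    best_fee_msat: u64,
    visited: bool,
}

/// Toy network graph that hands out a dense u32 counter per node.
struct Graph {
    /// Maps a (pubkey-derived) node id to its dense counter. This lookup
    /// happens once, at graph-build time, not on the routing hot path.
    node_counters: HashMap<u64, u32>,
    node_count: u32,
}

impl Graph {
    fn new() -> Self {
        Graph { node_counters: HashMap::new(), node_count: 0 }
    }

    /// Assign the next free counter to a newly-seen node.
    fn add_node(&mut self, node_id: u64) -> u32 {
        let counter = self.node_count;
        self.node_counters.insert(node_id, counter);
        self.node_count += 1;
        counter
    }
}

fn main() {
    let mut graph = Graph::new();
    let a = graph.add_node(0xdead);
    let b = graph.add_node(0xbeef);

    // One allocation per route-find; afterwards every per-node state access
    // is a plain index instead of a hash-and-probe on the hot path.
    let mut dist: Vec<NodeState> =
        vec![NodeState::default(); graph.node_count as usize];
    dist[a as usize].best_fee_msat = 1_000;
    dist[a as usize].visited = true;

    assert!(dist[a as usize].visited);
    assert!(!dist[b as usize].visited);
    println!(
        "node a (counter {}): best fee {} msat; {} nodes tracked in a flat Vec",
        a,
        dist[a as usize].best_fee_msat,
        graph.node_count
    );
}
```

The same Vec can also carry a per-node "is this a first hop" flag, which is how the second hashmap lookup mentioned above gets folded into the same index operation.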