Problem

Modify the algorithm used to derive the shared secrets and ephemeral public keys during sphinx packet construction to achieve complexity that is linear in the number of hops. The current version uses a quadratic number of scalar multiplications during the derivation, which is potentially noticeable when sending payments, particularly on resource-constrained devices.

Solution

The primary optimization offered by this PR is to cache the intermediate product of the blinding factors, such that the i-th blinded pubkey can be generated via a single scalar multiplication, instead of iteratively applying the previous i-1 blinding factors to a given field element. This alone reduces the number of scalar multiplications to be linear in the number of hops. Note that the product is taken modulo the curve order |F(G)|, not the prime P.

We also include a second optimization, that exploits the similarities in deriving both the ephermeral and blinded pubkeys.

From Section 3.2 of the Sphinx paper, the equations for deriving the i-th ephemeral pubkey is given:

    a_0 = g^x
    a_i = g^(x * b_0 * ... * b_{i-1})

and the i-th blinded pubkey as:

    s_0 = Y_0^x
    s_i = Y_i^(x * b_0 * ... * b_{i-1}),

where x is the session private key.

Now, instead of only memoizing the product of the blinding factors, we modify the value to also include a multiplicative factor of x which can be shared across both exponentiations. This does not result in any improvement asymptotically. However, it allows us to replace the usage of scalar multiplication with more-efficient scalar base multiplication when computing the ephemeral pubkeys (the a_i's), in addition to shaving (N-1) scalar multiplications in raising each Y_i to x before applying the blinding factors separately.

Result

With the above optimizations, our final cost of computing the shared secrets becomes N*BM + (N-1)*M, where BM represents one scalar base multiplication and M represents one scalar multiplication. For N=20, this amounts to 20*BM + 19*M, compared to the existing construction which uses 209*M + BM for N=20, where the singular BM originates from computing the session pubkey. This is further summarized in the table below.

Version	Scalar Mults	Scalar Base Mults	Total (`N=20`)
Original	`(N-1)(N-2)/2 + 2*(N-1)`	`1`	`209*M + BM`
Optimized	`N-1`	`N`	`19M + 20BM`

Consulting our trusty benchmarks, we see an over 8x speedup in the time to construct an onion packet. However, constructing an onion packet includes operations beyond simply generating the shared secrets, indicating that the speedup to the shared secret derivation is greater than 8x. The timing results show the time decreasing from 36.7ms to 4.5ms. As a bonus, the newer version uses about 65% less memory and incurs 77% fewer memory allocations.

New: BenchmarkPathPacketConstruction-8            300           4545746 ns/op          157468 B/op       1703 allocs/op
Old: BenchmarkPathPacketConstruction-8             50          36726800 ns/op          444977 B/op       7350 allocs/op

lightningnetwork / lightning-onion

Optimization: Linear Speedup in Sphinx Shared-Secret Construction #18

Problem

Solution

Result