Modify the algorithm used to derive the shared secrets and ephemeral public keys during sphinx packet construction to achieve complexity that is linear in the number of hops. The current version uses a quadratic number of scalar multiplications during the derivation, which is potentially noticeable when sending payments, particularly on resource-constrained devices.
Solution
The primary optimization offered by this PR is to cache the intermediate product of the blinding factors, such that the i-th blinded pubkey can be generated via a single scalar multiplication, instead of iteratively applying the previous i-1 blinding factors to a given field element. This alone reduces the number of scalar multiplications to be linear in the number of hops. Note that the product is taken modulo the curve order |F(G)|, not the prime P.
We also include a second optimization, that exploits the similarities in deriving both the ephermeral and blinded pubkeys.
From Section 3.2 of the Sphinx paper, the equations for deriving the i-th ephemeral pubkey is given:
a_0 = g^x
a_i = g^(x * b_0 * ... * b_{i-1})
and the i-th blinded pubkey as:
s_0 = Y_0^x
s_i = Y_i^(x * b_0 * ... * b_{i-1}),
where x is the session private key.
Now, instead of only memoizing the product of the blinding factors, we modify the value to also include a multiplicative factor of x which can be shared across both exponentiations. This does not result in any improvement asymptotically. However, it allows us to replace the usage of scalar multiplication with more-efficient scalar base multiplication when computing the ephemeral pubkeys (the a_i's), in addition to shaving (N-1) scalar multiplications in raising each Y_i to x before applying the blinding factors separately.
Result
With the above optimizations, our final cost of computing the shared secrets becomes N*BM + (N-1)*M, where BM represents one scalar base multiplication and M represents one scalar multiplication. For N=20, this amounts to 20*BM + 19*M, compared to the existing construction which uses 209*M + BM for N=20, where the singular BM originates from computing the session pubkey. This is further summarized in the table below.
Version
Scalar Mults
Scalar Base Mults
Total (N=20)
Original
(N-1)(N-2)/2 + 2*(N-1)
1
209*M + BM
Optimized
N-1
N
19*M + 20*BM
Consulting our trusty benchmarks, we see an over 8x speedup in the time to construct an onion packet. However, constructing an onion packet includes operations beyond simply generating the shared secrets, indicating that the speedup to the shared secret derivation is greater than 8x. The timing results show the time decreasing from 36.7ms to 4.5ms. As a bonus, the newer version uses about 65% less memory and incurs 77% fewer memory allocations.
Problem
Modify the algorithm used to derive the shared secrets and ephemeral public keys during sphinx packet construction to achieve complexity that is linear in the number of hops. The current version uses a quadratic number of scalar multiplications during the derivation, which is potentially noticeable when sending payments, particularly on resource-constrained devices.
Solution
The primary optimization offered by this PR is to cache the intermediate product of the blinding factors, such that the i-th blinded pubkey can be generated via a single scalar multiplication, instead of iteratively applying the previous i-1 blinding factors to a given field element. This alone reduces the number of scalar multiplications to be linear in the number of hops. Note that the product is taken modulo the curve order
|F(G)|
, not the primeP
.We also include a second optimization, that exploits the similarities in deriving both the ephermeral and blinded pubkeys.
From Section 3.2 of the Sphinx paper, the equations for deriving the i-th ephemeral pubkey is given:
and the i-th blinded pubkey as:
where
x
is the session private key.Now, instead of only memoizing the product of the blinding factors, we modify the value to also include a multiplicative factor of
x
which can be shared across both exponentiations. This does not result in any improvement asymptotically. However, it allows us to replace the usage of scalar multiplication with more-efficient scalar base multiplication when computing the ephemeral pubkeys (thea_i
's), in addition to shaving(N-1)
scalar multiplications in raising eachY_i
tox
before applying the blinding factors separately.Result
With the above optimizations, our final cost of computing the shared secrets becomes
N*BM + (N-1)*M
, whereBM
represents one scalar base multiplication andM
represents one scalar multiplication. ForN=20
, this amounts to20*BM + 19*M
, compared to the existing construction which uses209*M + BM
forN=20
, where the singularBM
originates from computing the session pubkey. This is further summarized in the table below.N=20
)(N-1)(N-2)/2 + 2*(N-1)
1
209*M + BM
N-1
N
19*M + 20*BM
Consulting our trusty benchmarks, we see an over 8x speedup in the time to construct an onion packet. However, constructing an onion packet includes operations beyond simply generating the shared secrets, indicating that the speedup to the shared secret derivation is greater than 8x. The timing results show the time decreasing from 36.7ms to 4.5ms. As a bonus, the newer version uses about 65% less memory and incurs 77% fewer memory allocations.