poanetwork / threshold_crypto

A pairing-based threshold cryptosystem for collaborative decryption and signatures used in HoneybadgerBFT implementation

Consider using Fast Fourier Transform for polynomial multiplication. #13

Open afck opened 6 years ago

afck commented 6 years ago

We're currently doing polynomial multiplication and interpolation the naive way, which is O(n²) (and interpolation even when we only need a single value). With Fast Fourier Transform (FFT), multiplication (but probably not interpolation in general?) should be possible in O(n log(n)). Let's try to estimate the actual count of arithmetic operations first, to make sure it's worth doing this in practice, and for all network sizes.
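For reference, this is the shape of the naive multiplication in question: a minimal sketch with plain `u64` coefficients standing in for the actual field elements, just to show where the O(n²) comes from.

```rust
// Schoolbook polynomial multiplication: every coefficient of `a` is multiplied
// with every coefficient of `b`, hence O(deg a · deg b) field multiplications.
// Plain u64 coefficients stand in for the field elements used in the real code.
fn naive_mul(a: &[u64], b: &[u64]) -> Vec<u64> {
    if a.is_empty() || b.is_empty() {
        return Vec::new();
    }
    let mut c = vec![0u64; a.len() + b.len() - 1];
    for (i, &ai) in a.iter().enumerate() {
        for (j, &bj) in b.iter().enumerate() {
            // c[i + j] accumulates the contribution a_i * b_j to x^(i + j).
            c[i + j] += ai * bj;
        }
    }
    c
}

fn main() {
    // (1 + 2x) * (3 + x + x²) = 3 + 7x + 3x² + 2x³
    assert_eq!(naive_mul(&[1, 2], &[3, 1, 1]), vec![3, 7, 3, 2]);
}
```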

vkomenda commented 6 years ago

I'll have a look at whether this can be done using rustfft. Alternatively, we can use an external FFT library via bindings.

afck commented 6 years ago

That would be great, but I doubt that it's possible: We need to do FFT in a finite field and most implementations seem to be written for floating point numbers only.

vkomenda commented 6 years ago

Yes. In the short term we could bind libfqfft. In the long term we could write the FFT functionality in Rust.

Here is one more paper on FFT in finite fields.

afck commented 6 years ago

libfqfft doesn't seem to expose a pure C API. Creating a wrapper for a C++-only library could be tricky and messy. Might be easier to just Rewrite It In Rust™.

c0gent commented 6 years ago

Let me know when the time comes to implement this on the GPU :)

afck commented 6 years ago

…and for the instance in src/lib.rs it's not even in the field itself but in the group G2. The Fourier transform on finite groups Wikipedia article still mentions "Fast Fourier Transform", so maybe this can still be done in O(n log(n))? I have no idea whether a finite-field FFT library would help us with that, though.

Edit: Actually, G2 is a vector space over the field. Maybe the FFT algorithms for the field can be applied to it, too? (It's actually isomorphic to a finite field, of course, but multiplication is not effectively computable, I think.)

afck commented 6 years ago

I read up on it a bit. Here's my understanding so far (most of this is just from various Wikipedia articles):

Fourier Transform

Let 𝔽 be a field and 𝕍 a vector space over it, N a natural number and α ∈ 𝔽 a primitive N-th root of unity, i.e. α^k = 1 for k = N, but not for 0 < k < N. The Fourier transform 𝕍^N → 𝕍^N is the linear map defined by the matrix W with entries W_{i,j} = α^{ij}. Its inverse is given by (W⁻¹)_{i,j} = α^{-ij} / N. (Note that I generalize matrix multiplication a bit here: AB can be defined in exactly the same way if the entries of A are in 𝔽 and the entries of B are in 𝕍.)

We can interpret each a ∈ 𝕍^N as the coefficients of a polynomial f_a(x) = a_0 + a_1 x + a_2 x² + … + a_{N-1} x^{N-1}. Then the Fourier transform can be expressed as W a = (f_a(α^0), f_a(α^1), f_a(α^2), …, f_a(α^{N-1}))^T, and its inverse as W⁻¹ a = (f_a(α^0), f_a(α^{-1}), f_a(α^{-2}), …, f_a(α^{-(N-1)}))^T / N.

So the Fourier transform of a is the vector of values of f_a at the N-th roots of unity. And the inverse Fourier transform of a gives the coefficients of the polynomial whose values at the N-th roots of unity are given by a: in that sense, W is evaluation and W⁻¹ is interpolation.
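To make the evaluation/interpolation view concrete, here is a toy check over the small prime field F_257 as a stand-in for Fr (257 is prime and 256 = 2⁸, so it has 2^k-th roots of unity, and 3 is a primitive root mod 257). Both transforms are written in the naive O(N²) form, not yet the fast algorithm.

```rust
// Toy illustration of "W is evaluation, W⁻¹ is interpolation" over F_257
// (a stand-in for Fr): the forward transform evaluates f_a at the powers of a
// primitive N-th root of unity α, the inverse evaluates at α^{-i} and divides
// by N. Both are done naively here, i.e. in O(N²).
const P: u64 = 257;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Forward transform: (W a)_i = f_a(α^i).
fn dft(a: &[u64], alpha: u64) -> Vec<u64> {
    (0..a.len() as u64)
        .map(|i| {
            let x = pow_mod(alpha, i);
            // Horner evaluation of f_a at x.
            a.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
        })
        .collect()
}

/// Inverse transform: evaluate at α^{-i}, then divide by N.
fn idft(v: &[u64], alpha: u64) -> Vec<u64> {
    let n_inv = pow_mod(v.len() as u64, P - 2); // 1/N by Fermat's little theorem
    let alpha_inv = pow_mod(alpha, P - 2); // α^{-1}
    dft(v, alpha_inv).into_iter().map(|x| x * n_inv % P).collect()
}

fn main() {
    // A primitive 4th root of unity in F_257: 3^(256/4).
    let alpha = pow_mod(3, 64);
    let a = vec![5, 1, 0, 2]; // f_a(x) = 5 + x + 2x³
    let values = dft(&a, alpha); // f_a at 1, α, α², α³
    assert_eq!(idft(&values, alpha), a); // interpolation recovers the coefficients
    println!("values at the 4th roots of unity: {:?}", values);
}
```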

Fast Fourier Transform

Let W' be the Fourier transform for 2N instead of N, with some 2N-th root of unity β such that β² = α. Let c ∈ 𝕍^{2N}, and let a, b ∈ 𝕍^N be the even- resp. odd-numbered entries of c (starting at 0), so that f_c(x) = f_a(x²) + x f_b(x²). Then (W' c)_i = f_c(β^i) = f_a(α^i) + β^i f_b(α^i) = (W a)_{i mod N} + β^i (W b)_{i mod N}. So for i < N, we have (W' c)_i = f_c(β^i) = (W a)_i + β^i (W b)_i and (W' c)_{i+N} = f_c(β^{i+N}) = (W a)_i + β^N β^i (W b)_i = (W a)_i − β^i (W b)_i, because β^N = −1 is the unique non-trivial square root of 1.

So the Fourier transform in 𝕍^{2N} can be computed in linear time from two Fourier transforms in 𝕍^N! That means if N is a power of 2, the Fourier transform can be computed in O(N log(N)). (But the same trick also works for factors other than 2, of course.)

The pairing crate knows for which powers of two N there are N-th roots of unity, and BLS12-381 was designed to have 2^s-th roots of unity for large s.
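For illustration, a minimal recursive version of this split over the same toy field F_257 (again a stand-in for Fr): the input length must be a power of two, and `beta` a primitive root of unity of that order. An optimized implementation would work iteratively and in place, but the butterfly is the same.

```rust
// A minimal recursive radix-2 FFT over the toy field F_257 (stand-in for Fr),
// following the even/odd split above.
const P: u64 = 257;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

fn fft(c: &[u64], beta: u64) -> Vec<u64> {
    let n = c.len();
    if n == 1 {
        return c.to_vec();
    }
    // a = even-indexed entries, b = odd-indexed entries of c.
    let a: Vec<u64> = c.iter().step_by(2).copied().collect();
    let b: Vec<u64> = c.iter().skip(1).step_by(2).copied().collect();
    // β² = α is a primitive (n/2)-th root of unity for the two halves.
    let alpha = beta * beta % P;
    let (wa, wb) = (fft(&a, alpha), fft(&b, alpha));
    // Butterfly: (W'c)_i = (Wa)_i + β^i (Wb)_i, (W'c)_{i+n/2} = (Wa)_i - β^i (Wb)_i.
    let mut out = vec![0u64; n];
    let mut beta_i = 1;
    for i in 0..n / 2 {
        let t = beta_i * wb[i] % P;
        out[i] = (wa[i] + t) % P;
        out[i + n / 2] = (wa[i] + P - t) % P;
        beta_i = beta_i * beta % P;
    }
    out
}

fn main() {
    // N = 8: 3 is a primitive root mod 257, so 3^(256/8) has order 8.
    let beta = pow_mod(3, 32);
    let c = vec![5, 1, 0, 2, 0, 0, 0, 0];
    println!("{:?}", fft(&c, beta)); // values of 5 + x + 2x³ at the 8th roots of unity
}
```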

Fast Multiplication

Let f_a and f_b be polynomials with deg f_a + deg f_b < N, i.e. such that there is a c ∈ 𝕍^N with f_c = f_a f_b. Naively, computing c, i.e. computing the product of the two polynomials, takes O(N²) time.

But W c is just the pointwise product of W a and W b (because these represent values of the polynomials) and computable in O(N). So (at least for large N) transforming, multiplying pointwise, and transforming back is faster than naive multiplication, namely O(N log(N))!
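As a sanity check of that identity, here is a toy sketch over F_257: pad, transform, multiply pointwise, transform back. The `dft`/`idft` helpers are the naive O(N²) transforms purely for brevity; in practice they would be the O(N log(N)) FFT.

```rust
// Multiplication via the transform, checked in the toy field F_257: pad both
// inputs to length N > deg f_a + deg f_b, transform, multiply pointwise,
// transform back. The point is the identity W c = (W a) ⊙ (W b).
const P: u64 = 257;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

fn dft(a: &[u64], alpha: u64) -> Vec<u64> {
    (0..a.len() as u64)
        .map(|i| {
            let x = pow_mod(alpha, i);
            a.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
        })
        .collect()
}

fn idft(v: &[u64], alpha: u64) -> Vec<u64> {
    let n_inv = pow_mod(v.len() as u64, P - 2);
    dft(v, pow_mod(alpha, P - 2))
        .into_iter()
        .map(|x| x * n_inv % P)
        .collect()
}

fn fast_mul(a: &[u64], b: &[u64], n: usize, alpha: u64) -> Vec<u64> {
    // Pad with zero coefficients up to the transform size N.
    let pad = |p: &[u64]| {
        let mut v = p.to_vec();
        v.resize(n, 0);
        v
    };
    let (wa, wb) = (dft(&pad(a), alpha), dft(&pad(b), alpha));
    // Pointwise product of the value representations...
    let wc: Vec<u64> = wa.iter().zip(&wb).map(|(x, y)| x * y % P).collect();
    // ...and back to coefficients.
    idft(&wc, alpha)
}

fn main() {
    // N = 4, α = 3^(256/4); (1 + 2x)(3 + x) = 3 + 7x + 2x², degrees sum to 2 < 4.
    let alpha = pow_mod(3, 64);
    assert_eq!(fast_mul(&[1, 2], &[3, 1], 4, alpha), vec![3, 7, 2, 0]);
    println!("ok");
}
```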

Our Implementation

That is already useful: maybe we should have two kinds of structures for our polynomials (i.e. 𝔽 = 𝕍 = Fr) and commitments (which in the above sense are also polynomials: the groups, e.g. 𝕍 = G2, are actually vector spaces over 𝔽 = Fr, I think; although for commitments it might be useless, because commitments can't be multiplied anyway), one in the coefficient representation and one in the value representation, i.e. the latter would be the Fourier transform of the former. The former allows evaluation and addition in linear time, the latter would allow multiplication and addition in linear time (if the bound N for the degree is known), and conversions between the two would take O(N log(N)) time.
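A rough sketch of what those two kinds of structures could look like, with hypothetical names and a toy u64 field in place of Fr; the conversions between the two forms are only indicated in a comment.

```rust
// Hypothetical sketch of the two representations, over a toy field (u64 mod P)
// instead of Fr. `CoeffForm` corresponds to the coefficient-based polynomials
// we have now; `ValueForm` would hold the Fourier transform of the
// coefficients, i.e. the values at the roots of unity.
const P: u64 = 257;

/// Coefficient representation: evaluation anywhere and addition are O(N);
/// multiplication is the O(N²) operation we'd like to avoid.
struct CoeffForm(Vec<u64>);

/// Value representation: addition and (degree-bounded) multiplication are
/// pointwise, i.e. O(N); evaluation at arbitrary points needs a conversion.
struct ValueForm(Vec<u64>);

impl CoeffForm {
    /// Horner evaluation at an arbitrary point x.
    fn evaluate(&self, x: u64) -> u64 {
        self.0.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
    }
    // to_values(&self) -> ValueForm and ValueForm::to_coeffs() would be the
    // Fourier transform and its inverse, each O(N log N); omitted here.
}

impl ValueForm {
    /// Pointwise product: equals the product polynomial as long as the degrees
    /// of the factors sum to less than N.
    fn mul(&self, other: &ValueForm) -> ValueForm {
        ValueForm(self.0.iter().zip(&other.0).map(|(x, y)| x * y % P).collect())
    }

    fn add(&self, other: &ValueForm) -> ValueForm {
        ValueForm(self.0.iter().zip(&other.0).map(|(x, y)| (x + y) % P).collect())
    }
}

fn main() {
    let f = CoeffForm(vec![5, 1, 0, 2]); // 5 + x + 2x³
    println!("f(2) = {}", f.evaluate(2)); // 5 + 2 + 16 = 23
    let (v, w) = (ValueForm(vec![1, 2, 3, 4]), ValueForm(vec![4, 3, 2, 1]));
    println!("pointwise product: {:?}", v.mul(&w).0);
    println!("pointwise sum: {:?}", v.add(&w).0);
}
```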

I have no idea whether it's worth it, i.e. whether that would speed up our arithmetic for N > 5 or only for N > 100… we'll need to either try it out or at least count how many additions and multiplications the transform would actually require. And there are also other fast Fourier transform algorithms that might do better in practice than the one I described.

Interpolation in General

I don't understand yet whether and how this can be applied to interpolation in general: for threshold decryption and signatures we need to interpolate values given at points other than the N-th roots of unity. E.g. libfqfft also seems to allow fast interpolation only if the sample points lie in a specific domain: roots of unity, or arithmetic or geometric progressions.

afck commented 6 years ago

Key Share Computation

We should also use f(α^k) for the k-th key share, since these values can all be computed in a single Fourier transform in O(N log(N)) time, whereas computing them separately takes O(N²). (Currently it's f(k), i.e. f(k · 1_𝔽).)

However, N would always have to be a suitable number, for which a root of unity and a good FFT algorithm are known: probably a power of two. So it would be the smallest power of two greater than the number of nodes.

If we require that N is strictly greater than the number of nodes, we could also make the master key f(1) instead of f(0), so it's part of the Fourier transform, too.
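A hypothetical sketch of that domain choice (toy field F_257 in place of Fr, made-up helper names): N is the smallest power of two strictly greater than the number of nodes, the master key is f(α^0) = f(1), and node k gets f(α^k). The shares are computed by plain Horner evaluation here just to show the indexing; with this domain a single FFT would produce all of them at once.

```rust
// Hypothetical sketch of choosing the evaluation domain for key shares:
// N = smallest power of two strictly greater than the number of nodes, so that
// f(α^0) = f(1) can be the master key and node k gets f(α^k). Toy field F_257.
const P: u64 = 257;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Returns (N, α): the FFT domain size and a primitive N-th root of unity.
fn share_domain(num_nodes: usize) -> (usize, u64) {
    // Smallest power of two that is strictly greater than the number of nodes.
    let n = (num_nodes + 1).next_power_of_two();
    // 3 is a primitive root mod 257, so 3^(256/N) has order exactly N.
    (n, pow_mod(3, 256 / n as u64))
}

fn main() {
    let f = [17u64, 5, 9]; // secret polynomial of degree 2
    let (n, alpha) = share_domain(5); // 5 nodes -> N = 8
    println!("domain size N = {}", n);
    for k in 0..=5u64 {
        let x = pow_mod(alpha, k);
        // Horner evaluation; a single length-N FFT would give all of these at once.
        let share = f.iter().rev().fold(0, |acc, &c| (acc * x + c) % P);
        let label = if k == 0 { "master key" } else { "node share" };
        println!("{}: f(α^{}) = {}", label, k, share);
    }
}
```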

afck commented 6 years ago

The bellman crate's FFT implementation follows exactly the algorithm described above. It's a bit hard to read, though: it reorders the elements by reversing the bits of each element's index, so that in each step of the iteration, W a and W b are consecutive contiguous slices.
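For illustration, the reordering step looks roughly like this (a standalone sketch, not bellman's actual code): index i is swapped with the index whose low log₂(n) bits are the bits of i in reverse order.

```rust
// Standalone sketch of the bit-reversal reordering used by iterative radix-2
// FFTs (not a copy of bellman's code): after the permutation, each butterfly
// stage of the iteration works on consecutive contiguous slices.
fn bit_reverse_permute<T>(a: &mut [T]) {
    let n = a.len();
    assert!(n.is_power_of_two());
    if n <= 1 {
        return;
    }
    let bits = n.trailing_zeros(); // log2(n)
    for i in 0..n {
        // Reverse the low `bits` bits of i.
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            a.swap(i, j);
        }
    }
}

fn main() {
    let mut v: Vec<usize> = (0..8).collect();
    bit_reverse_permute(&mut v);
    println!("{:?}", v); // [0, 4, 2, 6, 1, 5, 3, 7]
}
```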

afck commented 6 years ago

I wrote a crude FFT implementation in the afck-fft branch, and the benchmarks show that, at least for a single polynomial multiplication, the naive approach is still three times as fast even for N = 40.

It might still turn out to be worth it, of course.

Anyway, I'm unassigning this for now; it's not a low-hanging fruit, at least.

vkomenda commented 6 years ago

For proper benchmarking, it would be required to