Note that the performance of loading parameters from disk has changed drastically compared to before launch. It now seems that we should never load the entire proving key into memory.
I believe we can achieve further gains by abstracting over G1/G2 with a "still serialized" representation that can be deserialized in parallel during multi-exponentiation. This is not only faster but leads to memory reduction as we can avoid jacobian Z in memory during some of the proving. In my experience the space requirements during multiexp approximate the query sizes, but we will gain for all of the boolean variables in our constraint system as we can just do mixed addition there.
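A minimal sketch of what that might look like (hypothetical types and helper names, not the actual libsnark API), assuming the query is kept in its serialized affine form and each multiexp worker deserializes only the range it consumes:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the "still serialized" idea; none of these names are
// the real libsnark API. Query points stay in their serialized affine form and
// are deserialized lazily by the worker that consumes them, so a full Jacobian
// representation of the whole query never has to be resident at once.
struct SerializedG1 { uint8_t bytes[33]; };              // compressed affine encoding
struct AffineG1     { /* x, y field elements; no Z */ }; // assumed
struct JacobianG1   { /* X, Y, Z field elements */ };    // assumed

AffineG1   deserialize_affine(const SerializedG1& s);                     // assumed
JacobianG1 jacobian_add(const JacobianG1& a, const JacobianG1& b);        // assumed
JacobianG1 mixed_add(const JacobianG1& a, const AffineG1& b);             // assumed
JacobianG1 scalar_mul(const AffineG1& p, const std::vector<uint8_t>& e);  // assumed
bool is_zero(const std::vector<uint8_t>& e);                              // assumed
bool is_one(const std::vector<uint8_t>& e);                               // assumed

// One worker's share of the multi-exponentiation; several workers can run this
// in parallel over disjoint [begin, end) ranges of the serialized query.
JacobianG1 multiexp_chunk(const std::vector<SerializedG1>& query,
                          const std::vector<std::vector<uint8_t>>& exponents,
                          std::size_t begin, std::size_t end)
{
    JacobianG1 acc{};  // the identity (assumed representation)
    for (std::size_t i = begin; i < end; ++i) {
        if (is_zero(exponents[i]))
            continue;                                    // skip unused wires
        AffineG1 p = deserialize_affine(query[i]);       // deserialize on demand
        if (is_one(exponents[i]))
            acc = mixed_add(acc, p);                     // boolean wire: mixed addition only
        else
            acc = jacobian_add(acc, scalar_mul(p, exponents[i]));
    }
    return acc;
}
```

The boolean-wire branch is where the mixed-addition saving mentioned above comes from: a 0/1 exponent never needs a scalar multiplication or a full Jacobian point.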
Some thoughts:
Thinking more about benchmarks:
From my dog-fooding, it's often the case that multiple JS are generated sequentially for a single transaction, in a way that is difficult for users to anticipate (since it depends on note selection, which they cannot control). Therefore I want a benchmark that generates multiple JS sequentially. How about 4?
I predict that this change won't affect the overall time of 4 sequential JS, because the latency from repeatedly loading those 5 parameter components from disk for each JS will be dwarfed by proving time. Still, let's add a benchmark here since this is a real use case (and at least one other user has complained that JS throughput prevents them from doing shielded transfers for their use case). Then let's compare that benchmark for the 1.0.8 release versus this branch.
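As a purely illustrative shape for such a benchmark (prove_one_joinsplit is a stand-in, not an existing zcash function), the same loop run on the 1.0.8 release and on this branch would give the comparison described above:

```cpp
#include <chrono>
#include <cstdio>

// Stand-in for whatever produces one JoinSplit proof end to end,
// including (re)loading the proving key parts from disk.
void prove_one_joinsplit();

int main()
{
    constexpr int kJoinSplits = 4;  // the "4 sequential JS" case discussed above
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kJoinSplits; ++i)
        prove_one_joinsplit();
    const auto secs = std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::steady_clock::now() - start).count();
    std::printf("%d sequential JoinSplits took %lld s\n",
                kJoinSplits, static_cast<long long>(secs));
}
```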
@arielgabizon can you create a PR for your branch and add the work-in-progress
label? (I just created that label to signal that a PR should not be merged yet.) That will allow that work to receive more review and get it moving forward.
@nathan-at-least I think @ebfull wanted to clean it up a little before the PR, so maybe in a few days. In particular, there's an issue that these commits contain the batch verifier updates, which we would want to merge separately at a later date. A thought for the future about multiple JS - we could compute them in parallel, and thus pay only once for loading each part of the key.
In the case where we're generating multiple JoinSplits together, we could plan all of them (see #1360), then interleave the provings, i.e. do all of the PKA first, then all of the PKB, etc., so that the proving key loading effectively only has to be done once per multijoinsplit, without increasing the memory usage.
[Edit: oops, this essentially repeats the idea in @arielgabizon's last comment (although interleaving JoinSplits is not quite the same as doing them in parallel).]
A further refinement would be to do this across batches of multijoinsplits.
Another variation would be that at the end of a multijoinsplit, you asynchronously load PKA because you know that you're going to need it for the next multijoinsplit.
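A rough sketch of how the interleaving and the asynchronous loading could fit together (illustrative names only, not the actual zcash/libsnark code). Here the next part is prefetched while the current stage runs, which briefly keeps two parts in memory; the same std::async pattern would also cover prefetching PKA for the next multijoinsplit:

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical sketch: with N planned JoinSplits, each proving-key part is
// loaded once, the corresponding proving stage is run for all N JoinSplits,
// and the part is freed before the next one is read.
struct ProvingKeyPart { /* G1/G2 query elements for PKA, PKB, PKC, PKK or PKH */ };
struct JoinSplitState { /* witness and intermediate proving state for one JS */ };

ProvingKeyPart load_part(const std::string& name);               // assumed helper
void run_stage(JoinSplitState& js, const ProvingKeyPart& part);  // assumed helper

void prove_multijoinsplit(std::vector<JoinSplitState>& joinsplits)
{
    const std::vector<std::string> parts = {"PKA", "PKB", "PKC", "PKK", "PKH"};

    // Start reading the first part; later iterations prefetch the next part
    // while the current stage is running (briefly holding two parts in memory).
    std::future<ProvingKeyPart> next = std::async(std::launch::async,
                                                  load_part, parts[0]);
    for (std::size_t i = 0; i < parts.size(); ++i) {
        ProvingKeyPart part = next.get();
        if (i + 1 < parts.size())
            next = std::async(std::launch::async, load_part, parts[i + 1]);

        for (JoinSplitState& js : joinsplits)   // interleave: stage i of every JS
            run_stage(js, part);
        // `part` is destroyed here, so loading is paid once per multijoinsplit
        // rather than once per JoinSplit, without holding all five parts.
    }
}
```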
Note that when we improve the performance of the circuit by reducing its size (e.g. #647, #2233, #2234, #43), the time to load the proving key parts will also scale down with the circuit size.
Related to #1966 ("Reduce memory consumption due to proving key").
Bumped from 1.0.9 due to lack of apparent PR.
Subsumed by https://github.com/zcash/zcash/pull/2243
This shouldn't have been closed. It will be implemented by #2670; see also the superseded PR #2243 for previous discussion.
@ebfull wrote:
I believe we can achieve further gains by abstracting over G1/G2 with a "still serialized" representation that can be deserialized in parallel during multi-exponentiation. This is not only faster but leads to memory reduction as we can avoid jacobian Z in memory during some of the proving. In my experience the space requirements during multiexp approximate the query sizes, but we will gain for all of the boolean variables in our constraint system as we can just do mixed addition there.
Does the Sapling prototype implement this optimisation?
Closed by #2670.
The proving key consists of five parts - PKA, PKB, PKC, PKK, and PKH - that are used independently and sequentially when constructing the JoinSplit proof. Currently, they are all loaded into memory together. Loading each one from disk only when it is needed should reduce the memory taken by the proving key to a fourth. With the help of @ebfull I've written code that already works and has generated JoinSplits with about 2GB of memory according to htop. The proving time increases only by a few seconds.
https://github.com/arielgabizon/zcash/tree/batchverifylowmem
https://github.com/arielgabizon/libsnark/tree/lowmembatchver
We'll submit a PR after @ebfull and I revise the current code.
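For reference, the per-part flow described above as a minimal sketch, with hypothetical names rather than the code in the branches linked above:

```cpp
#include <string>

// Hypothetical names only; the actual code lives in the branches linked above.
// Each proving-key part is read from disk just before the stage that needs it
// and freed at the end of the iteration, so peak memory is roughly the size of
// the largest part instead of the whole proving key.
struct ProvingKeyPart { /* G1/G2 query elements for one part */ };
struct JoinSplitState { /* witness and intermediate proving state */ };

ProvingKeyPart load_part_from_disk(const std::string& name);          // assumed
void run_stage(JoinSplitState& js, const ProvingKeyPart& part);       // assumed

void prove_joinsplit_low_memory(JoinSplitState& js)
{
    for (const char* name : {"PKA", "PKB", "PKC", "PKK", "PKH"}) {
        ProvingKeyPart part = load_part_from_disk(name);  // only this part resident
        run_stage(js, part);
        // `part` goes out of scope here, freeing its memory before the next read.
    }
}
```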