Note that the performance of loading parameters from disk has changed drastically compared to before launch. It now seems that we should never load the entire proving key into memory.
I believe we can achieve further gains by abstracting over G1/G2 with a "still serialized" representation that can be deserialized in parallel during multi-exponentiation. This is not only faster but leads to memory reduction as we can avoid jacobian Z in memory during some of the proving. In my experience the space requirements during multiexp approximate the query sizes, but we will gain for all of the boolean variables in our constraint system as we can just do mixed addition there.
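A minimal sketch of what that might look like (hypothetical types and helper names, not the actual libsnark API), assuming the query is kept in its serialized affine form and each multiexp worker deserializes only the range it consumes:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the "still serialized" idea; none of these names are
// the real libsnark API. Query points stay in their serialized affine form and
// are deserialized lazily by the worker that consumes them, so a full Jacobian
// representation of the whole query never has to be resident at once.
struct SerializedG1 { uint8_t bytes[33]; };              // compressed affine encoding
struct AffineG1     { /* x, y field elements; no Z */ }; // assumed
struct JacobianG1   { /* X, Y, Z field elements */ };    // assumed

AffineG1   deserialize_affine(const SerializedG1& s);                     // assumed
JacobianG1 jacobian_add(const JacobianG1& a, const JacobianG1& b);        // assumed
JacobianG1 mixed_add(const JacobianG1& a, const AffineG1& b);             // assumed
JacobianG1 scalar_mul(const AffineG1& p, const std::vector<uint8_t>& e);  // assumed
bool is_zero(const std::vector<uint8_t>& e);                              // assumed
bool is_one(const std::vector<uint8_t>& e);                               // assumed

// One worker's share of the multi-exponentiation; several workers can run this
// in parallel over disjoint [begin, end) ranges of the serialized query.
JacobianG1 multiexp_chunk(const std::vector<SerializedG1>& query,
                          const std::vector<std::vector<uint8_t>>& exponents,
                          std::size_t begin, std::size_t end)
{
    JacobianG1 acc{};  // the identity (assumed representation)
    for (std::size_t i = begin; i < end; ++i) {
        if (is_zero(exponents[i]))
            continue;                                    // skip unused wires
        AffineG1 p = deserialize_affine(query[i]);       // deserialize on demand
        if (is_one(exponents[i]))
            acc = mixed_add(acc, p);                     // boolean wire: mixed addition only
        else
            acc = jacobian_add(acc, scalar_mul(p, exponents[i]));
    }
    return acc;
}
```

The boolean-wire branch is where the mixed-addition saving mentioned above comes from: a 0/1 exponent never needs a scalar multiplication or a full Jacobian point.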
Some thoughts:
Thinking more about benchmarks:
From my dog-fooding, it's often the case that multiple JS are generated sequentially for a single transaction, in a way that is difficult for users to anticipate (since it depends on note selection, which they cannot control). Therefore I want a benchmark that generates multiple JS sequentially. How about 4?
I predict that this change won't affect the overall time of 4 sequential JS, because the latency from repeatedly loading those 5 parameter components from disk for each JS will be dwarfed by proving time. Still, let's add a benchmark here since this is a real use case (and at least one other user has complained that JS throughput prevents them from doing shielded transfers for their use case). Then let's compare that benchmark for the 1.0.8 release versus this branch.
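As a purely illustrative shape for such a benchmark (prove_one_joinsplit is a stand-in, not an existing zcash function), the same loop run on the 1.0.8 release and on this branch would give the comparison described above:

```cpp
#include <chrono>
#include <cstdio>

// Stand-in for whatever produces one JoinSplit proof end to end,
// including (re)loading the proving key parts from disk.
void prove_one_joinsplit();

int main()
{
    constexpr int kJoinSplits = 4;  // the "4 sequential JS" case discussed above
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kJoinSplits; ++i)
        prove_one_joinsplit();
    const auto secs = std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::steady_clock::now() - start).count();
    std::printf("%d sequential JoinSplits took %lld s\n",
                kJoinSplits, static_cast<long long>(secs));
}
```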
@arielgabizon can you create a PR for your branch and add the work-in-progress
label? (I just created that label to signal that a PR should not be merged yet.) That will allow that work to receive more review and get it moving forward.
@nathan-at-least I think @ebfull wanted to clean it up a little before the PR, so maybe in a few days. In particular, there's an issue that these commits contain the batch verifier updates, which we would want to merge separately at a later date. A thought for the future about multiple JS - we could compute them in parallel, and thus pay only once for loading each part of the key.
In the case where we're generating multiple JoinSplits together, we could plan all of them (see #1360), then interleave the provings, i.e. do all of the PKA first, then all of the PKB, etc., so that the proving key loading effectively only has to be done once per multijoinsplit, without increasing the memory usage.
[Edit: oops, this essentially repeats the idea in @arielgabizon's last comment (although interleaving JoinSplits is not quite the same as doing them in parallel).]
A further refinement would be to do this across batches of multijoinsplits.
Another variation would be that at the end of a multijoinsplit, you asynchronously load PKA because you know that you're going to need it for the next multijoinsplit.
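A rough sketch of how the interleaving and the asynchronous loading could fit together (illustrative names only, not the actual zcash/libsnark code). Here the next part is prefetched while the current stage runs, which briefly keeps two parts in memory; the same std::async pattern would also cover prefetching PKA for the next multijoinsplit:

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical sketch: with N planned JoinSplits, each proving-key part is
// loaded once, the corresponding proving stage is run for all N JoinSplits,
// and the part is freed before the next one is read.
struct ProvingKeyPart { /* G1/G2 query elements for PKA, PKB, PKC, PKK or PKH */ };
struct JoinSplitState { /* witness and intermediate proving state for one JS */ };

ProvingKeyPart load_part(const std::string& name);               // assumed helper
void run_stage(JoinSplitState& js, const ProvingKeyPart& part);  // assumed helper

void prove_multijoinsplit(std::vector<JoinSplitState>& joinsplits)
{
    const std::vector<std::string> parts = {"PKA", "PKB", "PKC", "PKK", "PKH"};

    // Start reading the first part; later iterations prefetch the next part
    // while the current stage is running (briefly holding two parts in memory).
    std::future<ProvingKeyPart> next = std::async(std::launch::async,
                                                  load_part, parts[0]);
    for (std::size_t i = 0; i < parts.size(); ++i) {
        ProvingKeyPart part = next.get();
        if (i + 1 < parts.size())
            next = std::async(std::launch::async, load_part, parts[i + 1]);

        for (JoinSplitState& js : joinsplits)   // interleave: stage i of every JS
            run_stage(js, part);
        // `part` is destroyed here, so loading is paid once per multijoinsplit
        // rather than once per JoinSplit, without holding all five parts.
    }
}
```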
Note that when we improve the performance of the circuit by reducing its size (e.g. #647, #2233, #2234, #43), the time to load the proving key parts will also scale down with the circuit size.
Related to #1966 ("Reduce memory consumption due to proving key").
Bumped from 1.0.9 due to lack of apparent PR.
Subsumed by https://github.com/zcash/zcash/pull/2243
This shouldn't have been closed. It will be implemented by #2670; see also the superseded PR #2243 for previous discussion.
@ebfull wrote:
I believe we can achieve further gains by abstracting over G1/G2 with a "still serialized" representation that can be deserialized in parallel during multi-exponentiation. This is not only faster but leads to memory reduction as we can avoid jacobian Z in memory during some of the proving. In my experience the space requirements during multiexp approximate the query sizes, but we will gain for all of the boolean variables in our constraint system as we can just do mixed addition there.
Does the Sapling prototype implement this optimisation?
Closed by #2670.
The proving key consists of five parts - PKA, PKB, PKC, PKK, and PKH - that are used independently and sequentially when constructing the JoinSplit proof. Currently, they are all loaded into memory together. Loading each one from disk only when it is needed should reduce the memory taken by the proving key to a fourth. With the help of @ebfull I've written code that already works and has generated JoinSplits with about 2GB of memory according to htop. The proving time increases only by a few seconds.
https://github.com/arielgabizon/zcash/tree/batchverifylowmem
https://github.com/arielgabizon/libsnark/tree/lowmembatchver
We'll submit a PR after @ebfull and I revise the current code.
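For reference, the per-part flow described above as a minimal sketch, with hypothetical names rather than the code in the branches linked above:

```cpp
#include <string>

// Hypothetical names only; the actual code lives in the branches linked above.
// Each proving-key part is read from disk just before the stage that needs it
// and freed at the end of the iteration, so peak memory is roughly the size of
// the largest part instead of the whole proving key.
struct ProvingKeyPart { /* G1/G2 query elements for one part */ };
struct JoinSplitState { /* witness and intermediate proving state */ };

ProvingKeyPart load_part_from_disk(const std::string& name);          // assumed
void run_stage(JoinSplitState& js, const ProvingKeyPart& part);       // assumed

void prove_joinsplit_low_memory(JoinSplitState& js)
{
    for (const char* name : {"PKA", "PKB", "PKC", "PKK", "PKH"}) {
        ProvingKeyPart part = load_part_from_disk(name);  // only this part resident
        run_stage(js, part);
        // `part` goes out of scope here, freeing its memory before the next read.
    }
}
```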