zkp-gravity / 0g-halo2

Proof of inference of Weightless Neural Networks using Halo2

Optimize Performance #14

Open georgwiese opened 1 year ago

georgwiese commented 1 year ago

Depends on #13

I think we should consider some basic performance optimizations, like the choice of floor planner, the number of columns, and the number and degree of gates and lookups.

I think the number of rows is already pretty optimized with #12.

georgwiese commented 1 year ago

I've done a few experiments (including the changes in #17). This is an intermediate report.

Preliminaries

I added a branch on my Halo2 fork that prints out some useful information with regard to performance.
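For the grand total (without the per-phase breakdown), no fork is needed; a minimal caller-side sketch, assuming the zcash `halo2_proofs` 0.3 IPA API over the Pasta curves (all names here are illustrative, and `params`, `pk`, `circuit`, `instance` are assumed to be set up elsewhere):

```rust
use std::time::Instant;

use halo2_proofs::{
    pasta::{EqAffine, Fp},
    plonk::{create_proof, Circuit, ProvingKey},
    poly::commitment::Params,
    transcript::{Blake2bWrite, Challenge255},
};
use rand_core::OsRng;

/// Times a single proof end-to-end; only measures the grand total.
fn time_create_proof(
    params: &Params<EqAffine>,
    pk: &ProvingKey<EqAffine>,
    circuit: impl Circuit<Fp>,
    instance: &[Fp],
) {
    let mut transcript = Blake2bWrite::<_, _, Challenge255<_>>::init(vec![]);
    let start = Instant::now();
    create_proof(params, pk, &[circuit], &[&[instance]], OsRng, &mut transcript)
        .expect("proof generation should not fail");
    println!("create_proof took {:?}", start.elapsed());
}
```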

Here's a run with the "MNIST_small" model (models/model_28input_1024entry_2hash_2bpi.pickle.hdf5):

```
=== Degree overview:
  Minimum degree required by permutation argument: 3
  Minimum degree required by lookup arguments: 6
    bloom filter lookup: 5
    bit_lookup: 5
    byte_decomposition: 5
    x is byte: 5
    diff is byte: 5
    lookup: 6
  Minimum degree required by gates: 4
  Minimum degree required by circuit: 1
  Total degree: 6
```
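As an aside, my reading of where the per-lookup degrees come from (not verified against the code): the grand-product update constraint of halo2's lookup argument multiplies the running product by the compressed input and table expressions under an activity factor, so roughly

```
deg(lookup) ≈ 2 + deg(compressed input) + deg(table expression)
            = 2 + 2 + 1 = 5   (selector-gated advice query)
            = 2 + 3 + 1 = 6   (a degree-3 input, like the range check)
```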

```
----------------------------------------
create_proof Total: 29.679
  Before synthesize: 0.082 (0.3%)
  synthesize: 0.186 (0.6%)
  Compute advice and challenges: 0.388 (1.3%)
  Compute lookups: 2.652 (8.9%)
  Compute permutations: 1.869 (6.3%)
  Compute lookup products: 4.278 (14.4%)
  Compute vanishing argument: 0.604 (2.0%)
  Compute advice polys: 0.282 (1.0%)
  Compute h_poly: 12.958 (43.7%)
  Before create_proof: 3.642 (12.3%)
  create_proof: 2.732 (9.2%)
----------------------------------------

Took: 29.707698257s
```

You can see that `Compute h_poly` alone dominates (43.7% of the proving time), and that the lookup-related phases (`Compute lookups` and `Compute lookup products`) account for roughly another 23%.

Using the V1 floor planner instead of SimpleFloorPlanner

It did not reduce rows enough to reduce k (using the model_28input_256entry_1hash_1bpi.pickle.hdf5 model), but it did increase synthesis time. I think that might be because I didn't implement `Circuit::without_witnesses()` such that the witness is actually None, which might lead to unnecessary witness computation during layout.
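For reference, the fix would look roughly like this (a minimal sketch against the zcash `halo2_proofs` API; `WnnCircuit` and its `input` field are made-up names):

```rust
use halo2_proofs::{
    circuit::{floor_planner::V1, Layouter, Value},
    pasta::Fp,
    plonk::{Circuit, ConstraintSystem, Error},
};

// Made-up circuit struct for illustration.
struct WnnCircuit {
    input: Value<Fp>,
}

impl Circuit<Fp> for WnnCircuit {
    type Config = (); // placeholder; the real config lives in the chips
    // V1 instead of SimpleFloorPlanner:
    type FloorPlanner = V1;

    // Return a copy whose witness really is unknown, so that keygen and the
    // V1 layout passes don't do any witness computation.
    fn without_witnesses(&self) -> Self {
        Self {
            input: Value::unknown(),
        }
    }

    fn configure(_meta: &mut ConstraintSystem<Fp>) -> Self::Config {}

    fn synthesize(
        &self,
        _config: Self::Config,
        _layouter: impl Layouter<Fp>,
    ) -> Result<(), Error> {
        Ok(())
    }
}
```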

In summary, I think it might still be worth doing, but will only help for some models.

Reducing the number of columns

I removed the `selector_accumulator` column in the `ByteSelectorChip` (also removing some gates), which allowed me to reduce the number of columns from 6 to 5. The speed-up was 3.9%.

In summary, I think there's not much to be gained from reducing the number of columns.

Reducing the number of gates and lookups

We currently define 11 gates and 5 table lookups. According to this lecture, a table lookup is at least four times as expensive as a gate.

To get a feel for how much speed-up we can expect, I went to the `GreaterThanChip` and replaced one custom gate and two byte lookups with the range check from halo2_gadgets.

The speed-up was 12%.

Note that for these experiments I had to increase k from 12 to 14! The additional range checks need many rows, so we can only realize this speed-up if we manage to keep the number of rows the same.
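To make the cost concrete: each of the byte checks counted in the degree overview is a table lookup of roughly this shape (a sketch with assumed names, using the zcash `halo2_proofs` API, where `lookup` takes no name argument):

```rust
use halo2_proofs::{
    arithmetic::Field,
    plonk::{Advice, Column, ConstraintSystem, Selector, TableColumn},
    poly::Rotation,
};

// Sketch of a single "x is byte" check expressed as a table lookup.
fn configure_is_byte<F: Field>(
    meta: &mut ConstraintSystem<F>,
    q_range: Selector,
    x: Column<Advice>,
    byte_table: TableColumn,
) {
    meta.lookup(|cells| {
        let q_range = cells.query_selector(q_range);
        let x = cells.query_advice(x, Rotation::cur());
        // With q_range = 0, the looked-up expression is 0, which must be an
        // entry of `byte_table` (it is, since 0 is a byte). The compressed
        // input has degree 2 (selector * advice), giving the lookup degree 5
        // reported in the overview above.
        vec![(q_range * x, byte_table)]
    });
}
```

Each such lookup carries its own grand-product polynomial, which is what shows up in the `Compute lookups` / `Compute lookup products` phases of the profile.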

Reducing the degree bound

Ironically, the range check (implemented by halo2_gadgets) also has the lookup with the highest degree (6; all other gates / lookups have degree at most 5). So, on top of the previous step (removing 1 gate and 2 lookups), I removed the range checks as well (all while keeping k constant, of course).

The speed-up (compared to the previous step) was 27%.
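A plausible explanation for the size of this win (assuming I read halo2's `EvaluationDomain` correctly): the quotient polynomial h is computed over an extended domain whose size is the next power of two that fits a degree-(d − 1)·n polynomial, i.e.

```
extended domain size = 2^(k + ceil(log2(d - 1)))
d = 6:  2^(k + 3) = 8n
d = 5:  2^(k + 2) = 4n
```

so dropping the maximum degree from 6 to 5 halves the extended-domain FFT work, and `Compute h_poly` is exactly the phase that dominates the profile above.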

Summary

I'd say there are two possible areas for improvement: reducing the number of gates and lookups (without paying for it in extra rows), and reducing the circuit's degree bound.