Open georgwiese opened 1 year ago
I've done a few experiments (including the changes in #17). This is an intermediate report.
I added a branch on my Halo2 fork that prints out some useful information with regard to performance.
Here's a run with the "MNIST_small" model (models/model_28input_1024entry_2hash_2bpi.pickle.hdf5
):
=== Degree overview:
Minimum degree required by permutation argument: 3
Minimum degree required by lookup arguments: 6
bloom filter lookup: 5
bit_lookup: 5
byte_decomposition: 5
x is byte: 5
diff is byte: 5
lookup: 6
Minimum degree required by gates: 4
Minimum degree required by circuit: 1
Total degree: 6
----------------------------------------
create_proof Total: 29.679
Before synthesize: 0.082 (0.3%)
synthesize: 0.186 (0.6%)
Compute advice and challenges: 0.388 (1.3%)
Compute lookups: 2.652 (8.9%)
Compute permutations: 1.869 (6.3%)
Compute lookup products: 4.278 (14.4%)
Compute vanishing argument: 0.604 (2.0%)
Compute advice polys: 0.282 (1.0%)
Compute h_poly: 12.958 (43.7%)
Before create_proof: 3.642 (12.3%)
create_proof: 2.732 (9.2%)
----------------------------------------
Took: 29.707698257s
You can see that:
halo2gadgets
(they named their lookup lookup
👌)V1
floor planner instead of SimpleFloorPlanner
It did not reduce rows enough to reduce k
(using the model_28input_256entry_1hash_1bpi.pickle.hdf5
model), but increased synthesis time. I think that might be because I didn't Circuit::without_witnesses()
such that the witness is actually None
, which might lead to more computation.
In summary, I think it might still be worth doing, but will only help for some models.
I removed the selector_accumulator
column in the ByteSelectorChip
(also removing some gates), which allowed me to reduce the number of columns from 6 to 5. The speed-up was 3.9%.
In summary, I think there's not much to be gained from reducing the number of columns.
We currently define 11 gates and 5 table lookups. According to this lecture, table lookups are at least 4-times as expensive as gates.
To get a feel how much speed-up we can expect, I went to the GreaterThanChip
and replaced:
x
is a byte with a range-check in the existing range-check gadget.diff
is a byte with a range-check in the existing range-check gadget.The speed-up was 12%.
Note that for these experiments, I did increase k
from 12 to 14! The additional range-checks need many rows, so we can only realize this speed-up if we manage to keep the number of rows the same.
Ironically, the range-check (implemented by halo2gadgets
) also has the lookup with the highest degree (6, all other gates / lookups have degree of at most 5). So, on top of the previous step (removing 1 gate and 2 lookups), I removed the range checks as well (all while keeping k
constant, of course).
The speed-up (compared to the previous step) was 27%.
I'd say there are two possible areas for improvement:
V1
floor planner. It will probably help for some models, and when it does help, it reduces the number of rows by a factor of two (and proving time by about as much). Would be good if it doesn't lead to slower times when it doesn't make a difference.
Depends on #13
I think we should consider some basic performance optimizations, like:
V1
floor planner instead ofSimpleFloorPlanner
I think the number of rows is already pretty optimized with #12.