Open r3ld3v opened 2 years ago
In terms of assembly optimizations for pasta field arithmetic
, it's necessary to update rust version.
The asm
feature is stable latter than 1.59.0.
If it's okay to update, I would implement asm as following and it's almost 20% speed up.
https://github.com/appliedzkp/pairing/pull/5
I analysed the functions efficiency of create_proof
.
https://github.com/appliedzkp/halo2/pull/36#issuecomment-1085418454
I am going to work on h() evaluation.
This allows us to bottom up n / 2 but it's hard to implement this with recursive.
The &[G]
array divide into 4 parts and join
can take double args.
I tried but failed.
It's necessary to do experiment of the relation between overhead and degree.
It's easy but becomes complicated.
Perform the multiplication on last round of butterfly arithmetic to avoid extra iteration.
I think the following is interesting. https://github.com/zkcrypto/bls12_381/pull/84 We can reduce Montgomery reduction for each field array arithmetic. This affects so much.
Do you have an idea how to replace with?
I think the following is interesting. zkcrypto/bls12_381#84 We can reduce Montgomery reduction for each field array arithmetic. This affects so much.
* [eval_poly](https://github.com/zcash/halo2/blob/main/halo2_proofs/src/arithmetic.rs#L269) * [compute_inner_product](https://github.com/zcash/halo2/blob/main/halo2_proofs/src/arithmetic.rs#L279) * [ifft_divise](https://github.com/zcash/halo2/blob/main/halo2_proofs/src/poly/domain.rs#L355)
Do you have an idea how to replace with?
Yep, I was thinking about this myself yesterday. The inner product function we use in Halo 2 is precisely the "sum of products" I implemented there, and so I'm currently working to expose this via the ff
crate for generic usage (https://github.com/zkcrypto/ff/issues/79). It can have a slow provided implementation in Field
, that concrete fields override with efficient limb-aware implementations.
In terms of
assembly optimizations for pasta field arithmetic
, it's necessary to update rust version. Theasm
feature is stable latter than 1.59.0. If it's okay to update, I would implement asm as following and it's almost 20% speed up. appliedzkp/pairing#5
I do not want to use raw assembly in any of our field implementations if I can avoid it. Instead, we should look at that assembly and figure out why Rust / LLVM is not generating it, and rework the Rust code to get closer. I also want to implement some proper field arithmetic backends in the pasta_curves
crate so we can use vector intrinsics (the x86 / x86_64 intrinsics have been stable for a while, and aarch64 intrinsics were stabilised in 1.59). Once we have backends available, then if it were impossible to get the same performance as with raw assembly, we could conceivably have a feature-flagged asm
backend for those who really want it.
Hi @str4d Thank you for the answer.
Yep, I was thinking about this myself yesterday. The inner product function we use in Halo 2 is precisely the "sum of products" I implemented there, and so I'm currently working to expose this via the ff crate for generic usage (https://github.com/zkcrypto/ff/issues/79). It can have a slow provided implementation in Field, that concrete fields override with efficient limb-aware implementations
This is amazing.
I think this logic can be applied for [F] ⚪︎ [F]
or F ⚪︎ [F]
operation for any arithmetic ⚪︎
as long as they keep equivalence relation.
It reduces n - 1 times calculation of reduce
method for additive
and, reduce
and montgomery reduction
for multiplicative
.
There are many functions which can be applied this arithmetic.
Do you have a plan to modularize in ff
and apply to halo2?
I do not want to use raw assembly in any of our field implementations if I can avoid it
Copy that.
I think this logic can be applied for
[F] ⚪︎ [F]
orF ⚪︎ [F]
operation for any arithmetic⚪︎
as long as they keep equivalence relation. It reduces n - 1 times calculation ofreduce
method foradditive
and,reduce
andmontgomery reduction
formultiplicative
. There are many functions which can be applied this arithmetic. Do you have a plan to modularize inff
and apply to halo2?
The interleaved reductions likely won't have any benefit for additions, because the main speedup is due to reduced register pressure by not maintaining double-width field elements, which can be produced by multiplication but not addition. Also, additions don't use Montgomery reduction.
The sharing of reductions between accumulated ⚪︎
operations is interesting, but I'm not sure there's a good general API for that. In the additive case specifically, [F] + [F]
is just sum([F].concat([F])
; there is no Montgomery reduction, so the only thing we could batch is the conditional subtraction of the modulus, which would require storing an additional limb. For now, I'm inclined to just add Field::sum_of_products
and then see what that gets us. If we find other operations that are common enough to justify, then we can add APIs for those (and then look at how we could perhaps generalise once we have the concrete APIs in place).
the main speedup is due to reduced register pressure by not maintaining double-width field elements
Nice.
If we find other operations that are common enough to justify, then we can add APIs for those (and then look at how we could perhaps generalise once we have the concrete APIs in place).
Thank you.
I understand.
I think it's reasonable to implement poly crate for example zkcrypto/poly
.
[F] + [F]
is nearer to poly arithmetic than field arithmetic, it should use parallelize and it's easier to optimize.
The poly arithmetic is performed frequently.
I am going to check how sharing of reduction
improves performance.
memo: par_iter vs normal https://github.com/rayon-rs/rayon/issues/648#issuecomment-476733073
Find proper task size and methods for core, thread number and memory.
Create condition branch for each environment using thread_num
and task_size
as following.
/// multiplication = 3 point, addition = 1 point
enum TaskSize {
// less than 3 point
Small,
// more than 4 point and less than 8 point
Middle,
// more than 9 point
Large,
}
fn task_chunk(task_size: TaskSize) -> u32 {
let chunk = match task_size {
TaskSize::Small => 8,
TaskSize::Middle => 4,
TaskSize::Large => 2,
};
chunk / log_threads()
}
one time field arithmetic iteration Add | k | normal | par_iter | parallel |
---|---|---|---|---|
3 | 36.614 ns 36.616 ns 36.617 ns | 8.5748 us 8.7767 us 8.9484 us | 8.2590 us 8.3402 us 8.4133 us | |
4 | 69.030 ns 69.110 ns 69.231 ns | 9.6449 us 9.7224 us 9.8196 us | 7.5027 us 7.7373 us 7.9515 us | |
5 | 146.63 ns 146.79 ns 146.97 ns | 10.100 us 10.189 us 10.305 us | 7.2004 us 7.4858 us 7.8044 us | |
6 | 274.81 ns 274.83 ns 274.87 ns | 11.191 us 11.256 us 11.351 us | 7.9772 us 8.3716 us 8.7483 us | |
7 | 558.26 ns 558.31 ns 558.40 ns | 13.600 us 13.662 us 13.730 us | 9.2109 us 9.3365 us 9.4365 us | |
8 | 1.1079 us 1.1079 us 1.1080 us | 15.481 us 15.524 us 15.569 us | 10.029 us 10.088 us 10.139 us | |
9 | 2.2102 us 2.2104 us 2.2106 us | 17.009 us 17.036 us 17.062 us | 11.506 us 11.539 us 11.574 us | |
10 | 4.4262 us 4.4263 us 4.4266 us | 18.809 us 18.837 us 18.861 us | 12.331 us 12.356 us 12.382 us | |
11 | 8.8253 us 8.8256 us 8.8259 us | 21.242 us 21.279 us 21.326 us | 14.405 us 14.444 us 14.494 us | |
12 | 17.644 us 17.645 us 17.647 us | 21.693 us 21.821 us 21.971 us | 18.548 us 18.569 us 18.592 us | |
13 | 35.285 us 35.294 us 35.305 us | 32.041 us 32.153 us 32.291 us | 26.574 us 26.614 us 26.659 us | |
14 | 70.567 us 70.581 us 70.605 us | 53.141 us 53.271 us 53.445 us | 43.057 us 44.095 us 45.461 us | |
15 | 141.93 us 141.93 us 141.94 us | 95.057 us 95.192 us 95.343 us | 75.803 us 76.210 us 76.766 us | |
16 | 282.98 us 283.03 us 283.08 us | 181.69 us 182.11 us 182.64 us | 144.24 us 144.79 us 145.59 us | |
17 | 569.23 us 569.49 us 569.83 us | 351.35 us 352.24 us 353.44 us | 276.42 us 279.55 us 284.34 us | |
18 | 1.1501 ms 1.1513 ms 1.1529 ms | 687.74 us 689.02 us 690.59 us | 542.22 us 543.47 us 544.96 us |
Mul | k | normal | par_iter | parallel |
---|---|---|---|---|
3 | 227.36 ns 227.38 ns 227.40 ns | 9.0132 us 9.1790 us 9.3149 us | 8.6094 us 8.8608 us 9.0674 us | |
4 | 454.44 ns 454.63 ns 454.89 ns | 10.743 us 10.816 us 10.893 us | 9.3778 us 9.4564 us 9.5292 us | |
5 | 918.76 ns 918.85 ns 918.97 ns | 12.749 us 12.851 us 13.017 us | 10.056 us 10.117 us 10.184 us | |
6 | 1.8231 us 1.8233 us 1.8234 us | 14.813 us 14.849 us 14.886 us | 11.366 us 11.428 us 11.510 us | |
7 | 3.6549 us 3.6554 us 3.6558 us | 16.249 us 16.289 us 16.326 us | 12.558 us 12.579 us 12.599 us | |
8 | 7.3003 us 7.3008 us 7.3016 us | 18.171 us 18.199 us 18.233 us | 14.454 us 14.474 us 14.499 us | |
9 | 14.637 us 14.638 us 14.640 us | 21.575 us 21.644 us 21.719 us | 18.658 us 18.687 us 18.723 us | |
10 | 29.251 us 29.253 us 29.256 us | 26.149 us 26.234 us 26.347 us | 26.632 us 26.663 us 26.699 us | |
11 | 58.909 us 58.914 us 58.919 us | 40.919 us 40.985 us 41.073 us | 43.039 us 43.164 us 43.326 us | |
12 | 117.64 us 117.66 us 117.68 us | 71.681 us 71.912 us 72.278 us | 75.645 us 75.873 us 76.141 us | |
13 | 235.37 us 235.40 us 235.44 us | 132.30 us 132.60 us 132.96 us | 140.63 us 141.02 us 141.55 us | |
14 | 471.41 us 471.47 us 471.55 us | 253.99 us 254.66 us 255.57 us | 272.04 us 273.20 us 274.69 us | |
15 | 943.76 us 943.87 us 943.99 us | 493.60 us 494.64 us 496.05 us | 530.97 us 532.40 us 534.24 us | |
16 | 1.8840 ms 1.8842 ms 1.8845 ms | 977.83 us 980.45 us 983.85 us | 1.0663 ms 1.0683 ms 1.0706 ms | |
17 | 3.7703 ms 3.7711 ms 3.7718 ms | 1.9412 ms 1.9449 ms 1.9495 ms | 2.1186 ms 2.1261 ms 2.1351 ms | |
18 | 7.5475 ms 7.5495 ms 7.5517 ms | 3.9131 ms 3.9256 ms 3.9398 ms | 4.2483 ms 4.2646 ms 4.2856 ms |
Three Mul | k | normal | par_iter | parallel |
---|---|---|---|---|
3 | 740.03 ns 740.06 ns 740.12 ns | 10.245 us 10.278 us 10.313 us | 9.8069 us 9.8670 us 9.9196 us | |
4 | 1.4876 us 1.4877 us 1.4880 us | 13.047 us 13.062 us 13.079 us | 11.137 us 11.218 us 11.334 us | |
5 | 2.9867 us 2.9870 us 2.9874 us | 14.695 us 14.729 us 14.768 us | 12.249 us 12.342 us 12.473 us | |
6 | 6.0087 us 6.0097 us 6.0114 us | 16.122 us 16.150 us 16.192 us | 14.111 us 14.152 us 14.203 us | |
7 | 11.918 us 11.921 us 11.925 us | 18.890 us 18.977 us 19.079 us | 17.658 us 17.685 us 17.717 us | |
8 | 23.729 us 23.730 us 23.732 us | 23.109 us 23.144 us 23.197 us | 24.855 us 24.884 us 24.917 us | |
9 | 47.429 us 47.434 us 47.441 us | 35.143 us 35.267 us 35.459 us | 39.047 us 39.091 us 39.146 us | |
10 | 95.058 us 95.065 us 95.076 us | 59.381 us 59.530 us 59.748 us | 67.460 us 67.677 us 67.957 us | |
11 | 192.36 us 192.36 us 192.37 us | 107.73 us 107.95 us 108.29 us | 124.63 us 125.47 us 126.66 us | |
12 | 384.87 us 384.89 us 384.92 us | 206.87 us 208.82 us 211.66 us | 240.87 us 241.48 us 242.20 us | |
13 | 769.35 us 769.39 us 769.44 us | 401.49 us 402.64 us 404.07 us | 470.14 us 471.27 us 472.80 us | |
14 | 1.5384 ms 1.5386 ms 1.5388 ms | 788.91 us 790.20 us 791.79 us | 927.12 us 928.16 us 929.33 us | |
15 | 3.1031 ms 3.1034 ms 3.1039 ms | 1.5759 ms 1.5776 ms 1.5796 ms | 1.8491 ms 1.8517 ms 1.8549 ms | |
16 | 6.2126 ms 6.2131 ms 6.2136 ms | 3.1819 ms 3.1865 ms 3.1911 ms | 3.7340 ms 3.7428 ms 3.7531 ms | |
17 | 12.453 ms 12.456 ms 12.459 ms | 6.3751 ms 6.3899 ms 6.4062 ms | 7.4952 ms 7.5719 ms 7.6787 ms | |
18 | 24.969 ms 24.975 ms 24.981 ms | 12.789 ms 12.827 ms 12.870 ms | 14.962 ms 14.987 ms 15.014 ms |
Three Add | k | normal | par_iter | parallel |
---|---|---|---|---|
3 | 111.25 ns 111.26 ns 111.26 ns | 8.9307 us 9.1100 us 9.2478 us | 6.8630 us 6.9690 us 7.0764 us | |
4 | 221.65 ns 221.67 ns 221.70 ns | 10.531 us 10.599 us 10.680 us | 8.3465 us 8.6197 us 8.8761 us | |
5 | 443.69 ns 443.72 ns 443.75 ns | 11.817 us 11.896 us 11.985 us | 9.1735 us 9.2792 us 9.3672 us | |
6 | 885.67 ns 885.76 ns 885.90 ns | 13.417 us 13.505 us 13.598 us | 9.8654 us 9.9676 us 10.067 us | |
7 | 1.7744 us 1.7746 us 1.7748 us | 15.675 us 15.703 us 15.728 us | 11.240 us 11.273 us 11.312 us | |
8 | 3.5424 us 3.5434 us 3.5451 us | 16.768 us 16.794 us 16.819 us | 12.511 us 12.561 us 12.635 us | |
9 | 6.9358 us 6.9374 us 6.9399 us | 18.731 us 18.754 us 18.776 us | 14.208 us 14.283 us 14.409 us | |
10 | 14.198 us 14.200 us 14.201 us | 20.764 us 21.045 us 21.290 us | 17.782 us 17.821 us 17.865 us | |
11 | 28.315 us 28.319 us 28.324 us | 28.049 us 28.104 us 28.176 us | 25.271 us 25.326 us 25.409 us | |
12 | 56.602 us 56.605 us 56.609 us | 45.301 us 45.395 us 45.524 us | 40.283 us 40.412 us 40.582 us | |
13 | 113.18 us 113.19 us 113.21 us | 79.458 us 79.737 us 80.174 us | 69.654 us 69.769 us 69.878 us | |
14 | 226.36 us 226.38 us 226.40 us | 147.58 us 147.84 us 148.15 us | 129.21 us 129.45 us 129.74 us | |
15 | 452.97 us 453.01 us 453.07 us | 284.25 us 285.09 us 286.38 us | 248.64 us 249.59 us 250.81 us | |
16 | 906.45 us 906.52 us 906.60 us | 559.63 us 561.53 us 564.16 us | 485.47 us 487.52 us 490.65 us | |
17 | 1.8145 ms 1.8147 ms 1.8150 ms | 1.1069 ms 1.1092 ms 1.1120 ms | 957.72 us 959.49 us 961.72 us | |
18 | 3.6520 ms 3.6529 ms 3.6540 ms | 2.2157 ms 2.2272 ms 2.2444 ms | 1.9031 ms 1.9068 ms 1.9114 ms |
Mac
Add | k | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 8.4865 us 8.7383 us 8.9984 us | 7.6687 us 8.1307 us 8.6264 us | 49.894 ns 50.056 ns 50.264 ns |
||||||||||||
4 | 10.804 us 10.841 us 10.878 us | 8.5535 us 8.6013 us 8.6544 us | 8.1329 us 8.5694 us 9.0234 us | 79.834 ns 80.323 ns 80.943 ns |
|||||||||||
5 | 17.365 us 18.353 us 19.542 us | 11.666 us 12.372 us 13.345 us | 9.0639 us 9.3125 us 9.6756 us | 7.7428 us 8.1249 us 8.5304 us | 135.33 ns 135.54 ns 135.79 ns |
||||||||||
6 | 24.028 us 24.127 us 24.234 us | 17.746 us 18.126 us 18.658 us | 12.443 us 12.695 us 13.119 us | 9.6657 us 9.7400 us 9.8262 us | 8.3629 us 8.6792 us 9.0522 us | 273.53 ns 277.22 ns 281.66 ns |
|||||||||
7 | 35.130 us 35.498 us 35.926 us | 26.501 us 26.679 us 26.875 us | 18.959 us 19.016 us 19.075 us | 14.307 us 14.394 us 14.484 us | 11.646 us 11.726 us 11.815 us | 9.1532 us 9.4182 us 9.7456 us | 481.41 ns 481.88 ns 482.42 ns |
||||||||
8 | 40.231 us 40.861 us 41.553 us | 35.265 us 36.861 us 38.881 us | 27.771 us 27.876 us 27.987 us | 22.508 us 22.734 us 23.001 us | 18.404 us 18.509 us 18.644 us | 15.065 us 15.193 us 15.333 us | 11.892 us 11.978 us 12.065 us | 1.0040 us 1.0237 us 1.0510 us |
|||||||
9 | 41.950 us 42.430 us 42.989 us | 37.774 us 38.013 us 38.288 us | 34.655 us 34.976 us 35.353 us | 28.869 us 28.979 us 29.100 us | 26.027 us 26.172 us 26.334 us | 22.043 us 22.285 us 22.533 us | 19.511 us 19.763 us 20.026 us | 14.208 us 14.354 us 14.491 us | 1.8647 us 1.8722 us 1.8802 us |
||||||
10 | 55.450 us 56.708 us 58.059 us | 44.576 us 48.146 us 52.999 us | 42.094 us 43.535 us 45.845 us | 41.633 us 43.171 us 45.192 us | 37.792 us 40.577 us 43.652 us | 30.572 us 31.031 us 31.491 us | 24.981 us 25.107 us 25.243 us | 20.775 us 20.922 us 21.082 us | 16.728 us 16.861 us 16.986 us | 3.6868 us 3.6906 us 3.6948 us |
|||||
11 | 57.630 us 58.260 us 58.976 us | 53.438 us 54.222 us 54.981 us | 51.131 us 52.458 us 54.095 us | 43.672 us 44.314 us 45.022 us | 38.095 us 38.238 us 38.398 us | 34.721 us 34.935 us 35.188 us | 29.845 us 30.088 us 30.390 us | 26.737 us 26.882 us 27.063 us | 24.126 us 24.421 us 24.772 us | 19.853 us 20.027 us 20.225 us | 7.3577 us 7.3707 us 7.3843 us |
||||
12 | 74.006 us 82.503 us 96.651 us | 60.490 us 61.703 us 63.196 us | 53.367 us 54.647 us 56.422 us | 48.560 us 49.146 us 49.862 us | 47.054 us 48.111 us 49.286 us | 41.027 us 41.742 us 42.582 us | 40.344 us 41.569 us 43.145 us | 33.403 us 33.867 us 34.372 us | 32.113 us 32.943 us 33.895 us | 33.725 us 36.196 us 39.246 us | 26.128 us 26.535 us 26.968 us | 14.860 us 14.924 us 15.007 us |
|||
13 | 76.729 us 78.650 us 81.140 us | 80.344 us 85.142 us 90.723 us | 74.581 us 76.099 us 77.818 us | 70.706 us 74.664 us 79.865 us | 59.432 us 61.664 us 64.488 us | 51.299 us 52.632 us 54.048 us | 43.822 us 44.417 us 45.068 us | 45.089 us 46.367 us 47.964 us | 37.537 us 38.859 us 40.761 us | 36.239 us 36.815 us 37.442 us | 34.318 us 35.158 us 36.153 us | 36.640 us 38.085 us 40.029 us | 31.645 us 31.850 us 32.090 us |
||
14 | 96.109 us 99.058 us 103.02 us | 82.915 us 85.327 us 88.338 us | 74.106 us 75.766 us 77.721 us | 65.367 us 66.130 us 67.086 us | 67.780 us 69.117 us 70.520 us | 65.359 us 66.814 us 68.360 us | 65.653 us 68.949 us 72.797 us | 64.046 us 68.780 us 74.290 us | 52.229 us 59.241 us 68.364 us | 43.430 us 45.169 us 47.105 us | 38.416 us 39.108 us 39.886 us |
46.860 us 49.930 us 53.646 us | 51.370 us 57.463 us 67.220 us | 60.318 us 60.631 us 61.072 us | |
15 | 112.70 us 115.39 us 118.70 us | 108.91 us 112.34 us 116.15 us | 111.99 us 118.11 us 125.32 us | 89.571 us 93.145 us 97.821 us | 99.211 us 105.54 us 113.43 us | 87.888 us 91.208 us 95.005 us | 72.750 us 74.216 us 75.886 us | 67.757 us 69.687 us 71.764 us | 71.626 us 78.614 us 90.932 us | 63.923 us 66.347 us 68.898 us | 69.236 us 74.135 us 80.049 us | 67.671 us 72.314 us 78.030 us | 55.197 us 57.540 us 60.155 us |
82.056 us 85.622 us 90.324 us | 121.06 us 121.57 us 122.20 us |
Github Actions Add | k | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 8.6546 us 8.8228 us 8.9699 us | 8.3995 us 8.6312 us 8.8324 us | 53.093 ns 53.124 ns 53.153 ns | ||||||||||||
4 | 8.8763 us 9.0172 us 9.0935 us | 8.7933 us 8.9154 us 9.0024 us | 8.6157 us 8.7592 us 8.8718 us | 88.424 ns 88.434 ns 88.447 ns |
|||||||||||
5 | 9.7176 us 9.7890 us 9.8535 us | 9.0014 us 9.0847 us 9.1433 us | 9.0339 us 9.0610 us 9.0822 us | 8.7167 us 8.8259 us 8.8992 us | 157.66 ns 157.68 ns 157.70 ns |
||||||||||
6 | 9.9716 us 10.003 us 10.034 us | 9.8813 us 9.9691 us 10.070 us | 9.1202 us 9.2548 us 9.3663 us | 9.0368 us 9.0608 us 9.0907 us | 8.6307 us 8.8124 us 8.9441 us | 297.08 ns 297.10 ns 297.12 ns |
|||||||||
7 | 11.683 us 11.753 us 11.825 us | 10.635 us 10.665 us 10.697 us | 10.445 us 10.482 us 10.519 us | 9.6826 us 9.7893 us 9.9166 us | 9.2953 us 9.4191 us 9.5374 us | 8.6287 us 8.8341 us 9.0342 us | 583.53 ns 583.60 ns 583.67 ns |
||||||||
8 | 15.075 us 15.093 us 15.110 us | 13.777 us 13.812 us 13.845 us | 12.781 us 12.816 us 12.851 us | 12.175 us 12.210 us 12.262 us | 10.641 us 10.668 us 10.701 us | 10.057 us 10.084 us 10.113 us | 9.8548 us 9.9329 us 10.009 us | 1.1400 us 1.1402 us 1.1404 us |
|||||||
9 | 16.341 us 16.368 us 16.398 us | 15.089 us 15.147 us 15.212 us | 14.532 us 14.584 us 14.642 us | 13.957 us 14.031 us 14.159 us | 13.308 us 13.364 us 13.432 us | 12.472 us 12.520 us 12.564 us | 11.970 us 12.009 us 12.052 us | 11.264 us 11.291 us 11.317 us | 2.2537 us 2.2539 us 2.2543 us |
||||||
10 | 18.331 us 18.403 us 18.504 us | 17.327 us 17.363 us 17.398 us | 16.829 us 16.862 us 16.903 us | 16.357 us 16.404 us 16.455 us | 15.387 us 15.421 us 15.452 us | 15.018 us 15.048 us 15.082 us | 14.510 us 14.552 us 14.599 us | 13.661 us 13.689 us 13.715 us | 13.258 us 13.306 us 13.354 us | 4.4917 us 4.4922 us 4.4929 us |
|||||
11 | 21.282 us 21.324 us 21.371 us | 20.085 us 20.131 us 20.176 us | 19.365 us 19.445 us 19.536 us | 18.984 us 19.023 us 19.069 us | 18.039 us 18.070 us 18.098 us | 17.562 us 17.605 us 17.657 us | 17.500 us 17.955 us 18.440 us | 16.143 us 16.244 us 16.402 us | 15.453 us 15.490 us 15.530 us | 15.397 us 15.437 us 15.478 us | 8.9466 us 8.9474 us 8.9483 us |
||||
12 | 24.029 us 24.103 us 24.180 us | 24.317 us 24.483 us 24.686 us | 23.475 us 23.547 us 23.622 us | 23.031 us 23.082 us 23.127 us | 22.150 us 22.211 us 22.275 us | 22.000 us 22.047 us 22.095 us | 21.733 us 21.795 us 21.877 us | 21.030 us 21.096 us 21.179 us | 20.199 us 20.332 us 20.578 us | 20.032 us 20.094 us 20.171 us | 19.908 us 19.949 us 20.002 us | 17.864 us 17.866 us 17.867 us |
|||
13 | 35.482 us 35.561 us 35.669 us | 32.658 us 32.715 us 32.792 us | 31.259 us 31.329 us 31.417 us | 30.324 us 30.387 us 30.470 us | 30.093 us 30.197 us 30.368 us | 29.818 us 29.857 us 29.896 us | 30.086 us 30.138 us 30.202 us | 29.723 us 29.827 us 29.983 us | 29.879 us 29.943 us 30.018 us | 29.552 us 29.583 us 29.616 us | 29.390 us 29.543 us 29.760 us | 29.072 us 29.109 us 29.147 us |
35.701 us 35.703 us 35.705 us | ||
14 | 59.379 us 59.476 us 59.589 us | 53.713 us 53.901 us 54.181 us | 50.974 us 51.068 us 51.173 us | 49.523 us 49.660 us 49.843 us | 48.536 us 48.676 us 48.859 us | 48.324 us 48.509 us 48.743 us | 48.409 us 48.493 us 48.588 us | 48.309 us 48.394 us 48.493 us | 48.247 us 48.439 us 48.739 us | 48.175 us 48.270 us 48.396 us | 47.640 us 47.850 us 48.151 us | 47.383 us 47.524 us 47.700 us | 47.289 us 47.374 us 47.474 us |
71.345 us 71.348 us 71.352 us | |
15 | 107.67 us 107.84 us 108.03 us | 95.682 us 96.364 us 97.592 us | 90.079 us 90.224 us 90.388 us | 87.090 us 87.324 us 87.616 us | 85.794 us 86.112 us 86.499 us | 85.090 us 85.281 us 85.510 us | 85.549 us 85.682 us 85.829 us | 84.796 us 85.649 us 87.173 us | 84.602 us 84.786 us 85.009 us | 84.375 us 84.505 us 84.649 us | 84.195 us 84.374 us 84.600 us | 84.204 us 84.365 us 84.550 us | 83.856 us 84.032 us 84.259 us | 83.751 us 83.971 us 84.246 us |
143.52 us 143.55 us 143.57 us |
Word | Explanation |
---|---|
Turning Degree | The minimum degree k that the parallel starts to get benefit. |
Task Size | The task that per thread processes. |
Task Point | The point that expresses concrete task size as the addition is 1 and the multiplication is 5~6. |
Base Turning Point | The turning degree when we perform 1 task point and degree is 13. |
Chunk Size | The task point that per thread processes. |
2
and 8
.5 times addition < 1 time multiplication < 6 times addition
.Task Point
> Degree
> Thread Number
.The turning degree can be computed approximately (Base Turning Degree) - (log(Task Point))
.
The chunk size can be computed approximately n / 2^{[8 * (log(Task Point)) + 2 *(k - (Base Turning Degree)) + log(thread_number)] / 8}
.
I think the thread number is something to do with turning degree and would like to test with higher core machine.
Resource | Github Actions | Mac |
---|---|---|
Thread Number | 2 | 8 |
CPU | 2 core | 4 core |
Memory | 7 GB | 8 GB |
Storage | 14 GB | 512GB |
Hi there. I did experiment about parallelize optimization and speeded up 20%.
Bench https://github.com/NoCtrlZ/halo2/pull/4#issuecomment-1134207070 Experiment Result https://github.com/zcash/halo2/issues/548#issuecomment-1120201797
The improvement list and status. The bench was done on Github Actions.
Target | Branch | Status | Speed Up |
---|---|---|---|
Fft | https://github.com/NoCtrlZ/halo2/pull/5 | Done | fft and ifft 20% |
pippenger | https://github.com/NoCtrlZ/halo2/pull/6 | Draft | multiexp 20% |
Parallel | https://github.com/NoCtrlZ/halo2/pull/4 | Done | prover 20% |
h() | None | None | |
wnaf | None | None |
https://github.com/ConsenSys/gnark-crypto/pull/249 is a new implementation of batch affine MSM that claims a 10-20% speedup for gnark-crypto's MSM (which was already a good implementation).
I think we can optimize curve scalar by naf with almost zero cost.
We convert field element to 256 length of bit array. https://github.com/zcash/pasta_curves/blob/main/src/curves.rs#L483 If we convert it to bit array with sign, we can reduce $n/6$ times elliptic curve addition where n is field bits length. And we also clean the zero bit and reduce the doubling.
This is the to_nafs
implementation.
https://github.com/zero-network/zero/pull/60/files#diff-15198a125f029cb977758faf8db9469e6e710d1616b5ceb08eaced67993c715dR48
And scalar arithmetic implementation. https://github.com/zero-network/zero/pull/60/files#diff-bb1d24a6ddcc81d90c6392212f713b15cac6a371be79c044302f0ea11695b02dR89
The extra cost is that to_naf
operation and curve negation when bit has sign.
to_naf
operation performs on $n$ length bit and not heavy and curve negation is also not heavy comparing with curve addition and doubling.
I think Karatsuba can be used for field arithmetic. We are using convolution product for limbs arithmetic. The following is example of 4 length limbs.
$x^6: a_3b_3$ $x^5: a_2b_3 + a_3b_2$ $x^4: a_1b_3 + a_2b_2 + a_3b_1$ $x^3: a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0$ $x^2: a_0b_2 + a_1b_1 + a_2b_0$ $x^1: a_0b_1 + a_1b_0$ $x^0: a_0b_0$
$u_0v_1 + u_1v_0 = u_1v_1 + u_0v_0 - (u_1 - u_0)(v_1 - v_0)$
$x^6: a_3b_3$ $x^5: a_2b_2 + a_3b_3 - (a_2 - a_3)(b_2 - b_3)$ $x^4: a_1b_1 + a_2b_2 + a_3b_3 - (a_1 - a_3)(b_1 - b_3)$ $x^3: a_0b_0 + a_1b_1 + a_2b_2 + a_3b_3 - (a_0 - a_3)(b_0 - b_3) - (a_1 - a_2)(b_1 - b_2)$ $x^2: a_0b_0 + a_1b_1 + a_2b_2 - (a_0 - a_2)(b_0 - b_2)$ $x^1: a_0b_0 + a_1b_1 - (a_0 - a_1)(b_0 - b_1)$ $x^0: a_0b_0$
In case of 6M < 18S - 3A
, we can speed up.
but $(a_2 - a_3)(b_2 - b_3)$ will be overflow.
$(a_2 - a_3)(b_2 - b_3)$ type should be i128 so when $(a_2 - a_3)$ and $(b_2 - b_3)$ are u64::MAX, i128 positive range is u127 so it's a problem.
We would improve wNAF, if we implement quadruple for curve and 4 multiplication for field. In 4 multiplication, we can replace it with 2 right bit shift and skip one montgomery reduction. There are ideas for performance. Please let me know if these is interesting idea. Thank you!
precomputing inv * mod may also reduce 4 times mul in montgomery reduction
one of curious is one time modular reduction. bitcoin implementation doesn't perform modular reduction and manages number of arithmetic with magnitude. https://github.com/RustCrypto/elliptic-curves/blob/master/k256/src/arithmetic/field/field_impl.rs#L20
poly::Evaluator
VerifyingKey
into the transcript when batch-verifying proofs (debug formatting takes up a non-trivial amount of time). #607VerifyingKey
, and use it in the permutation argument. #607parallelize
within the verifier to make it entirely single-threaded, and check the resulting single-proof times. #608BatchVerifier
object (not the strategy) that pushes single-proof verification with theBatchVerifier
strategy onto the global threadpool, then accumulates their resultingMSM
s. #610