snarkify / sirius

A Plonkish folding framework for Incrementally Verifiable Computation (IVC).

feat(nifs): impl `protogalaxy::poly::compute_F` #280

Closed · cyphersnake closed this 3 weeks ago

cyphersnake commented 1 month ago

Motivation: Close #259

Overview: This method implements the parallel computation of F(X) from ProtoGalaxy, without storing intermediate evaluations.

The computation takes count_of_gates * count_of_rows iterations in total; at each iteration we do log_n operations to compute pow_i, plus another log_n operations to multiply the corresponding parts of the evaluated witness by pow_i.

The design of pow_i has also been changed and simplified: it is no longer possible to pass an invalid invariant as input, which simplifies the error handling.
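
For illustration, a minimal sketch of the log_n evaluation of pow_i (this is not the crate's code: `beta_squares` is a hypothetical name for the precomputed powers β^(2^k), and plain u64 stands in for a field element):

    // Minimal sketch, not the sirius implementation: pow_i(beta) is the product of
    // beta^(2^k) over the set bits k of i, so it needs at most log_n multiplications
    // once the squares beta^(2^k) are precomputed.
    fn pow_i(i: usize, beta_squares: &[u64]) -> u64 {
        beta_squares
            .iter()
            .enumerate()
            .filter(|(k, _)| (i >> k) & 1 == 1)
            .fold(1, |acc, (_, sq)| acc * sq)
    }

    fn main() {
        // beta = 3, precomputed squares [3, 9, 81] = [3^1, 3^2, 3^4];
        // i = 5 = 0b101, so pow_5 = 3^1 * 3^4 = 3^5 = 243.
        assert_eq!(pow_i(5, &[3, 9, 81]), 243);
    }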

I will add documentation and a manual test soon, but you can already review the code.

cyphersnake commented 1 month ago

Overall the new version works; tomorrow I will fix the tests and add a circuit-based test.

cyphersnake commented 1 month ago

> Since I saw you implemented the algorithm described in Claim 4.4 (I haven't looked at it in detail yet, will continue tomorrow), it might be better to separate it as a function so that compute_G can also re-use it, something like this: #259 (comment). @cyphersnake

I'm already in the process of implementing compute_G with the logic here in order to isolate the common part. It's quite hard to do offhand, but there's a good chance of reusing quite a few parts and keeping laziness where possible (this implementation of the algorithm stores the minimum number of points at any one time).

cyphersnake commented 1 month ago

I'm already in the process of implementing compute_G with the logic here in order to isolate the common part. It's quite hard to do offhand, but there's a good chance of reusing quite a few parts and keeping laziness where possible (this implementation of the algorithm stores the minimum number of points at any one time).

The difference is that in F(X) the variable X participates in pow_i, while in G(X) it participates in f_i.

I.e. in one case we have several trees with their \beta on the edges, and in the other case several trees with their f_i on the nodes.

And since I'm computing all of these trees in one pass, it's not trivial to merge them together.
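
For reference, a rough sketch of the two polynomials as defined in the ProtoGalaxy paper (paper notation, which may not match the code exactly: w is the running witness, w_j the incoming witnesses, L_j the Lagrange basis, and β, δ, β* the challenge vectors):

$$F(X) = \sum_{i \in [n]} \operatorname{pow}_i(\boldsymbol\beta + X \cdot \boldsymbol\delta)\, f_i(w), \qquad G(X) = \sum_{i \in [n]} \operatorname{pow}_i(\boldsymbol\beta^{*})\, f_i\Big(\sum_{j} L_j(X)\, w_j\Big)$$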

chaosma commented 4 weeks ago

> I'm already in the process of implementing compute_G with the logic here in order to isolate the common part. It's quite hard to do offhand, but there's a good chance of reusing quite a few parts and keeping laziness where possible (this implementation of the algorithm stores the minimum number of points at any one time).
>
> The difference is that in F(X) the variable X participates in pow_i, while in G(X) it participates in f_i.
>
> I.e. in one case we have several trees with their \beta on the edges, and in the other case several trees with their f_i on the nodes.
>
> And since I'm computing all of these trees in one pass, it's not trivial to merge them together.

I didn't know we could use tree_reduce! I thought we would have to implement a custom version of tree_reduce for this specific purpose.

Since you are using for_each inside tree_reduce,

    tree_reduce(|...| {
        // skip
        itertools::multizip((challenges.iter(), left.iter_mut(), right.iter()))
            .for_each(|(challenge_powers, left, right)| {
                *left += *right * challenge_powers[l_height.get()]
            });
        // skip
    });

Can we do something like:

    challenge_powers.for_each(|challenge_power| {
        tree_reduce(|...| {
        })
    })

It seems to me this would also work. This way, compute_F and compute_G can re-use the tree_reduce code as a common function. I will try this approach in compute_G.
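
As a reference point, here is a minimal sketch of what tree_reduce does, assuming it is itertools' `Itertools::tree_reduce` (named `tree_fold1` in older itertools releases) rather than a custom helper: it combines items pairwise, like reducing the leaves of a balanced binary tree, which is why the closure above sees a left and a right node at each level.

    // Minimal sketch, assuming the itertools crate; not the sirius code.
    use itertools::Itertools;

    fn main() {
        // Reduced pairwise: ((1 + 2) + (3 + 4)) + ((5 + 6) + (7 + 8)) = 36
        let sum = (1u64..=8).tree_reduce(|left, right| left + right);
        assert_eq!(sum, Some(36));
    }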

cyphersnake commented 4 weeks ago

> Can we do something like:

With the current structure we don't have to store the gate evaluations at any stage, and tree_reduce runs over exactly this, the longest collection. So we can't simply do log_n separate tree_reduce passes.

So there would still be a complex node structure with many computations per call; the complexity is more about expressing the difference between the two cases in code.
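
To make the trade-off concrete, a toy sketch (nothing here is the sirius code; `eval_row` is a hypothetical stand-in for the expensive per-row gate evaluation): a single pass evaluates each row once while carrying log_n partial accumulators, whereas looping over the challenge powers on the outside means either re-evaluating every row log_n times or materializing all row evaluations first.

    // Toy sketch only; `eval_row` stands in for the expensive per-row gate evaluation.
    fn eval_row(row: usize) -> u64 {
        row as u64 + 1 // placeholder for the real evaluation
    }

    fn main() {
        let rows = 0..8usize;
        let log_n = 3usize;

        // Single pass: each row is evaluated once, and every step updates all
        // `log_n` accumulators (stand-ins for the per-height values carried by
        // the tree_reduce nodes in the PR).
        let single_pass = rows.clone().fold(vec![0u64; log_n], |mut acc, row| {
            let v = eval_row(row); // evaluated exactly once per row
            acc.iter_mut().for_each(|a| *a += v);
            acc
        });

        // `log_n` outer passes: every row is re-evaluated for each challenge power
        // (or all evaluations would have to be stored in memory up front).
        let per_power: Vec<u64> = (0..log_n)
            .map(|_| rows.clone().map(eval_row).sum())
            .collect();

        assert_eq!(single_pass, per_power);
    }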