powdr-labs / powdr

A modular stack for zkVMs, with a focus on productivity, security and performance.
Apache License 2.0
381 stars 78 forks source link

Witness generation for multiplicities in LogUp / bus argument #1573

Open georgwiese opened 1 month ago

georgwiese commented 1 month ago

We currently do not generate the multiplicities column when using our LogUp-based PIL lookup implementation. In fact, the test just uses a fixed column.

We should:

onurinanc commented 1 month ago
georgwiese commented 1 month ago

Right, I think the constraints should just how they are, but the multiplicity should turn into a fixed column. Having it as a fixed column in the test was only a hack because we don't have witness generation yet.

do you mean for example by the column name like "multiplicity" Yes, exactly, as long as we don't have #1378.

I think, for now, we would have both, the built-in lookup and the "library" version, which adds a stage-0 multiplicity column and a few stage-1 column. If the lookup spawns multiple namespaces (as they typically do, because they connect two machines), the CompositeBackend will remove the native lookup anyway, so it basically already acts as an annotation in witgen.

A little bit background on "machine detection": For witgen, we currently group columns into "machines" by looking at the connected components after removing lookups and permutation (so if two columns appear together in a polynomial identity, they are in the same machine).

Actually, now that I think about it, I think a good first step would be to add a lookup-via-bus, analogous to the permutation-via-bus that you added before, because:

And then we have to detect the multiplicity and increment the counters. I think it makes sense to start with the FixedLookup machine (I think this might be the only one where we really use lookups in practice).

onurinanc commented 1 month ago

Right, I think the constraints should just how they are, but the multiplicity should turn into a fixed column. Having it as a fixed column in the test was only a hack because we don't have witness generation yet.

I think you mean that "multiplicity" should turn into a witness column, not a fixed column right?

For the machine detection part, which part of the code do you look at the connected components, and what do you mean by the connected components? Is there any example for it? And, which part of the code you remove lookups and permutation?

If we don't have the constraints dependent on LHS and RHS, how do we make the machine detection simpler in this case?

Should we detect the multiplicity in the generated .pil file somehow?

georgwiese commented 1 month ago

I think you mean that "multiplicity" should turn into a witness column, not a fixed column right?

Yes!

For the machine detection part, which part of the code do you look at the connected components, and what do you mean by the connected components? Is there any example for it?

In witgen, there is this machine extractor: It takes as input all the identities and groups them into machines. By connected components I just mean that two witnesses belong to the same machine if they appear together in a polynomial identity, or together on the LHS or RHS of a lookup / permutation.

If we don't have the constraints dependent on LHS and RHS, how do we make the machine detection simpler in this case?

Take test_data/std/permutation_via_challenges.asm as an example: The LHS is [a1, a2], the RHS is [b1, b2], so if this was a "native" lookup (first_four $ [a1, a2] in (1 - first_four) [b1, b2]), the above algorithm would detect two machines: One with witness [a1, a2] and one with [b1, b2], because they also don't depend together in any polynomial identity. This is the behavior we want.

But std::protocol::permutation generates this constraint:

((1 - main.first_four) * (std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * main.b1 + main.b2) - 1) + 1) * main.z' = (main.first_four * (std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * main.a1 + main.a2) - 1) + 1) * main.z;

So I believe the machine extractor as it works now would put everything in one machine.

In contrast, test_data/std/bus_permutation_via_challenges.asm generates these two constraints:

(std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * (std::prelude::challenge(0, 1) + main.a1') + main.a2')) * (main.z' - main.z * (1 - main.is_first')) = main.first_four';
(std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * (std::prelude::challenge(0, 1) + main.b1') + main.b2')) * (main.u' - main.u * (1 - main.is_first')) = 21888242871839275222246405745257275088548364400416034343698204186575808495616 * (1 - main.first_four');

The first one is only concerned with the LHS, the second one only with the RHS, so the machine extraction algorithm still works!

And, which part of the code you remove lookups and permutation?

That's unrelated to witgen I guess, but any time we prove using the "composite" backend (e.g. --prove-with halo2-mock-composite on the CLI), it creates a proof for each namespace (which corresponds to machines in ASM) and ignores any lookups and permutations between them. This is the code that does this.

Of course, in the end, we should not have a native lookup / permutation in the first place. Instead, there should be just an annotation (#1378), so it's explicit that this only tells witgen how machines are connected and does not enforce any constraint. Then a function like std::protocols::permutation_via_bus::permutation can generate both the constraint and the annotation.

onurinanc commented 1 month ago

I'm not sure how/where to detect multiplicities, and where to calculate after the detection.

You mentioned that it makes to start with the FixedLookup machine. So, what the FixedLookup::new function should return in this case?

What I understand from this issue:

And, one confusing point for me is in which phase/stage I should calculate how many times the looked-up values occur in the lookup table. Looks like it should be a better idea to calculate the multiplicities in stage-0 and do the witness generation in stage-1 maybe?

georgwiese commented 1 month ago

Yeah, it's not super clear how that best fits into the current codebase, let's have a call tomorrow :)

And, one confusing point for me is in which phase/stage I should calculate how many times the looked-up values occur in the lookup table. Looks like it should be a better idea to calculate the multiplicities in stage-0 and do the witness generation in stage-1 maybe?

I think the multiplicities can (and probably should, for security?) go into a stage-0 column, as their value does not depend on any challenge.

onurinanc commented 1 month ago
georgwiese commented 1 month ago

I think it should still stay in FixedLookup. The lookup is still for fixed columns only, it's just the multiplicity witness column that is part of the lookup argument.

I haven't thought this through 100%, but I think I would go in small steps like: