Make Field generic - Githubissues

kevaundray commented 3 years ago

Currently Noir is only compiles for bn254's scalar field, however there is nothing stopping it from being generic over any field.

This issue starts a discussion on the work needed to be done to make the compiler generic, and considerations.

Field Trait

We would need a NoirField Trait instead of a FieldElement struct. The trait methods will initially be the methods that are on FieldElement.

Noir will need to be aware of the possible Fields, however we can delegate this list to ACIR, similar to how we delegate all possible OPCODEs there.

Security Level

The security level of a group would need to be documented in Noir also, so that user can be warned when they are using a group which has < 128 bit security.

Some considerations about the implementation:

Question: Should Noir have implementations of all fields supported?

Compiling with implementations for all supported fields, is the current idea and here are the advantages and disadvantages.

Advantages

dyn will not need to be used.
There will be a single field element implementation per field that backends can choose to hook into.

Disadvantages

program will now contain all possible Field element implementations, increasing the binary size.
If a backend wants to use a new field, they will need to first upstream the implementation and implement the NoirField trait for it (This may not be a disadvantage)
Backends will need to convert from a NoirField trait to the type their proving system uses. (unless they implement it over a NoirField Trait) This may mean a lot of deserialising and subsequent performance loss.

Since NoirField trait can be implemented for the arkworks Field trait, we can immediately get all Field implementations that arkwork uses. For newer fields, I believe that the surface area for the NoirField trait is small and generic enough that wrapping a new field is negligible.

Other ideas would be for backends to supply their own field element implementation to plug into Noir, however this encourages code duplication between backends and may lead to non-deterministic constraint systems in the same field, if the field element implementation differs.

Compiler aware Field

The compiler currently does not need to be aware of the chosen field implementation, since there is only one. It is however aware of the different backends and we would have the same mechanism where the chosen field is passed as a parameter value in the package manager config file, and the package manager picks the corresponding data structure which is then passed to the compiler.

Rough Action steps

Replace the FieldElement struct with the Field trait, but still only compile for bn254.
Add all supported Fields noir_field as an enum. We then can add a new method in the Proving system trait supported_fields which allows the backends to specify which of these Fields they support. At this moment in time, there will only be one supported Field, which is bn254.
Add bls12-381 and allow the user to state they want to use bls12-381. The barretenberg backend should return an error, as it only supports bn254. We modify the arkworks backend adding bls12_381 to its supported_fields , adding the bls12_381 field as a viable field and testing whether it compiles.

kevaundray commented 3 years ago

First PR will be for a generic field trait, this will be the most intrusive refactor

kevaundray commented 3 years ago

Field trait abstraction does not seem to work well as generics need to be monomorphised at compile time, so we need to create all of the variants at compile time and store them. We then run into the problem of each Program being a different type as Programs are generic over they field. We could possibly make nargo generic over the field, however as seen from here: https://github.com/noir-lang/noir/tree/field_trait_fail ; the api is clunky and the changes made somewhat hurt readability.

It seems that we will need to use compilation flags and a toolchain installer in the long run.This might take some time, so below I list a few short term solutions.

A short term alternative is to choose the field size at runtime, however this seems to require a library akin to BigUint/BigInt.

Another short term alternative is to also have the Field be an enum and have the possible variants be each field. This would not blow up the size of FieldElement, since an enum is roughly the size of it's largest component plus a discriminant, and most fields are about 256 bits. These short term solutions would ideally allow one to make the field generic, with the least amount of change.

huitseeker commented 3 years ago

I've pushed the branch a bit more in #60. A couple comments beyond what I've left in commit messages:

if Program goes through successive elaborations, the actual distinction between fields does not seem to matter in the first few phases, especially in the Parser/ Token / Lexer levels. Int and Integer could contain a single concrete type (let's say NumericField, which payload would be .. bytes), which would make vast swaths of the compiler monomorphic. It would only be in later stages that a ParsedProgram (containing a deeply nested NumericField data) would become a ParsedAndFieldInterpretedProgram<F>.
the code size vs. performance trade-off is somewhat controllable through lto. Do you have context on why code size matters here?
the problem with having the Field be an enum is what happens when two such value meet. You then need to pattern-match to check they are the same type, in order to have them interact. The dynamic behavior gives you just an (admittedly much faster) dyn FieldElement.

kevaundray commented 3 years ago

if Program goes through successive elaborations, the actual distinction between fields does not seem to matter in the first few phases, especially in the Parser/ Token / Lexer levels. Int and Integer could contain a single concrete type (let's say NumericField, which payload would be .. bytes), which would make vast swaths of the compiler monomorphic. It would only be in later stages that a ParsedProgram (containing a deeply nested NumericField data) would become a ParsedAndFieldInterpretedProgram.

Yeah, having the Lexer and Parser contain the underlying field is not ideal as you would need to monomorphise them for each available field, when they do not explicitly use field operations.

I'd also further say that having the underlying field for the analysis phase is also not ideal since we do not need it for resolution and for the type checker the underlying field is always opaque. I wonder if instead of a bytes payload, a BigInteger could be used which may mean that the Interpreter could also use NumericField.

With bytes, we would not be able to do this, since we need to do field operations in the Interpreter. We could possibly avoid any field reductions until we get to the part of the code where we need an underlying field and reduce there. This could possibly be the backend. ACVM and ACIR could possibly still be generic, and the compiler would instantiate it as ACIR maybe.

the code size vs. performance trade-off is somewhat controllable through lto. Do you have context on why code size matters here?

Ideally the compiler would be runnable on the browser, although I am not entirely sure if the code size will be a problem in the end. My assumption was that 20MB for the compiler right now, along with a possible SRS might be a lot. This could be mute after the backends are conditional compiled and optimisations are applied.

the problem with having the Field be an enum is what happens when two such value meet. You then need to pattern-match to check they are the same type, in order to have them interact. The dynamic behavior gives you just an (admittedly much faster) dyn FieldElement.

Yep, this was why I stopped on the field_trait_fail branch. Semantically it did not make sense, because you can only ever have one field on a given run of the compiler.

Another slightly convoluted method would be to specify an ACVM backend with it's field and proving system at compile time. For libraries like arkworks this may become convoluted but for libraries that only have a single concrete field implementation, it seems reasonable. This implies that the compiler would be built with all available fields.

Will give this more thought as I also believe using conditional compilation is a viable strategy via a mechanism like Rustup; however the BigInt strategy may allow for generic fields in the short term without modifying the code drastically, so that we can figure out a better long term strategy.

kevaundray commented 3 years ago

PR #71 conditionally compiles each backend and field. The only problem with this is that rust features are meant to be additive, so it is currently not easy to say "I want to enable only one of these features". Multiple features can be enabled at the same time, from dependencies.

The most straightforward way to stop this would be to add cfg_any for each pair of features, which is not desired

kevaundray commented 3 years ago

closed by #71

noir-lang / noir

Make Field generic #49