Open adria0 opened 1 month ago
@adria0 First of all, thx for all research and proposal!
Here is my idea about your proposal:
Why we need this? If we want to quick process data between trusted parties we have UncompresseEncoding::from_uncompressed_unchecked (that is in the order of magnitude of SerdeObject decodings), if not we can use GroupEncoding::from_bytes that fits into standards.
I think this SerdeObject
is for field stuff, while UncompressedEncoding
is for curve one.
https://github.com/duguorong009/halo2curves/blob/b753a832e92d5c86c5c997327a9cf9de86a18851/derive/src/field/mod.rs#L485
Maybe, they do not bring any meaningful difference in underlying implementation.
But, I think it's odd to implement UncompressedEncoding
trait for any field.
Follow Arkworks standard I think @davidnevadoc did some research on serialization in Arkworks, in previous works.
Implement serde for Field, is not implemented
I think SerdeObject
is for this purpose.
They help to convert field to bytes and vice versa.
Rename EndianRepr::[from, to]_bytes
Do we need this renaming?
As you said, these are for internal use.
Overall, I think we first need documentation. The documentation, I mean, is both of one explaining all current traits(role, use, ...), and one of planning how to improve.
Thanks @adria0 for the in depth description of the issue!
I think we should tackle fields and curve serialization separately:
For curves:
GroupEncoding
and UncompressedEncoding
are traits meant for curves (groups), not for fields.For fields, let's stick to prime fields only for now ( field extensions can be represented in multiple ways and that is a bigger problem that exceeds the scope of this issue).
I agree with @adria0 point on SerdeObject
. I think it is one of the main points of redundancy and it may need to be removed.
In my view, its name makes it easy to confuse with the usual serde::Serialize
and serde::Deserialize
traits.
Then, the doc that explains its functionality states:
/// Trait for converting raw bytes to/from the internal representation of a type.
/// For example, field elements are represented in Montgomery form and serialized/deserialized without Montgomery reduction.
I can't see why this kind of access needs to be exposed publicly. Specially when Montgomery/non-Montgomery form introduces one of the main sources of confusion when it comes to field representation.
I also agree with point made about from_raw
. Unfortunately this is not a clear name, raw
could mean anything.
Is it raw bytes? raw limbs/ u64s? Montgomery or non-Montgomery?.
I think just the opposite about from_uniform_bytes
. I think it is crystal-clear that the input for this function is a series of random bytes and therefore, it has to be transformed into a field element w/o validity checks.
from_mont
and to_mont
look to me like good names. (perhaps replacing SerdeObject
?)
Lastly, I agree 100% with @guorong009 remark about documentation. Whichever design we land on, we should make an effort to have it well documented to prevent more confusion.
Round two,
Field Encodings
from_bytes
/to_bytes
: They use an industry-standard format that is consistent with how curves are encoded. This format is what will be used internally by the Serde library to ensure interoperability. Provides a unified format for both field and curve serialization. Ensures a consistent, industry-standard serialization, using big or little endian depending on the curvefrom_mont
/to_mont
: These methods convert elements to and from the Montgomery form, which is an internal representation that is commonly used for efficient field arithmetic. Use these when working specifically with Montgomery-reduced values, especially in cryptographic computations.from_raw
: Creates a field element from a raw integer (typically limbs or u64s). Use this method when directly converting an integer value into a field element.from_uniform_bytes
: Converts a uniform random byte array into a valid field element. This is particularly useful in scenarios requiring a random element in the field, such as in cryptographic protocols or when hashing to the field.Curve Encodings
GroupEncoding
trait methods: Implements the serialization and deserialization of curve points in a compressed format. Compression is slower but generates standardized encodings that are smaller in size. Suitable for storing and transmitting curve points efficiently. Serde will use this.UncompressedEncoding
trait methods: Provides faster serialization/deserialization of curve points in an uncompressed format. The output is larger than that of GroupEncoding, but it's quicker to generate. When speed is prioritized over size.Notes:
from_bytes
, to_bytes
from EndianRepr
trait is only intended for internal use only. Do not use.I tried to new version of the serialization where SerdeObject
is removed in halo2, and found out that there are more serialization traits!
SerdePrimeField
for example, implemented in Halo2...
Concretely, we should decide what to do with the parts of serialization implemented in halo2 (see https://github.com/privacy-scaling-explorations/halo2/blob/main/halo2_backend/src/helpers.rs)
Should we bring this kind of functionality to halo2curves ?
I tried to new version of the serialization where
SerdeObject
is removed in halo2, and found out that there are more serialization traits!SerdePrimeField
for example, implemented in Halo2... Concretely, we should decide what to do with the parts of serialization implemented in halo2 (see https://github.com/privacy-scaling-explorations/halo2/blob/main/halo2_backend/src/helpers.rs) Should we bring this kind of functionality to halo2curves ?
I think that implementing this helper in halo2curves, just adds more confusion, a library only has to provide one way to do the same thing.
I also question, why does this helper exists? It's just a simple wrapper. Exists maybe because halo2curves serialization was not really documented and was a way for implementer to clarify what can be done in halo2curves? I do not see this usefull in Halo2 in terms of cryptographic protocol.
@adria0 @davidnevadoc
I've done some research why these helpers SerdeFormat
(halo2), SerdeObject
(halo2curves) are there.
The SerdeObject
trait was introduced in this PR. https://github.com/privacy-scaling-explorations/halo2curves/commit/5bcc8914a4d1d75901b62eeac3c4347f826f09d0
The SerdeFormat
was introduced in this PR. https://github.com/privacy-scaling-explorations/halo2/pull/111
All of work was done by @jonathanpwang .
Before that, there was no concept of Serde*
in the halo2
and halo2curves
.
For example, the zcash/halo2
has only one option - CurveRead
(compressed). https://github.com/zcash/halo2/blob/main/halo2_proofs/src/helpers.rs
I believe we should check why @jonathanpwang added these. If those traits & helpers do not have specific purpose we don't know, we can remove them, as @adria0 mentioned.
The reason they were added is that previously the default in halo2
and halo2curves
was that serde always serialized and deserialized to the human-readable canonical representation of a BigInt as some u64
digits. However the internal struct stores the BigInt primes in Montgomery form. Therefore the serialization and deserialization was doing Montgomery reduction each time, which is not desirable if the serialization is being used for raw storage purposes that don't need human readability. For halo2 this was particularly relevant for proving keys.
So Serde*
was created as a way to offer these separate "raw" (de)serde methods without interfering with the existing human readable serde methods.
I haven't been up to date on the current situation, so I can't tell if they are needed anymore, but this was the original motivation.
Edit: SerdeObject
can be safely removed if you are able to define from_mont
and to_mont
somewhere else. The Serde*
traits in halo2 was purely a way to extend traits beyond ff::Field
. SerdeFormat
is an enum to toggle between the different serialization methods.
from @davidnevadoc
Related issue: #109
First round, initial idea
Halo2curves serializations comes with two flavours:
Standard serialization
Field.{from,to}_repr
fngroup::GroupEncoding (for compressed)
, andgroup::UncompresseEncoding
(for uncompressed), allowing "unchecked" reads.endian
infield_impl!
macroEndianRepr
for endianess, implCompressed
for flags and to uncompress.Vainilla "raw" serialization
{from, to}_raw_bytes_[unchecked]
{read,write}_raw_[unchecked]
Others
Notes:
EndianRepr
but it seems like some kind of type-hack to allow to knowing the endiannes of CurveAffine::BASE when encoding compressing curves. Since our halo2curves takes traits from pasta, we are not able to modify it.Unserialize performance:
legend:
code: https://gist.github.com/adria0/c440185d765a368aaf21ca5741a63ab7
Initial proposal:
EndianRepr::from_bytes
,EndianRepr::to_bytes
is implemented in Field (that aready have these fields). This is extra confusing, these are just intended to be used internally as helpers. So rename this fields for something like endianrepr_{from,to}_bytesComments: @davidnevadoc @duguorong009 @kilic ?