supranational / blst

Multilingual BLS12-381 signature library
Apache License 2.0
467 stars 177 forks source link

add serde support #151

Closed kwantam closed 1 year ago

kwantam commented 1 year ago

This PR adds serde support for PublicKey, SecretKey, and Signature values for both min_pk and min_sig. It also adds a simple roundtrip test.

This support is gated by the serde feature flag as requested in discussion of #148. With the serde flag, public keys and signatures are serialized in uncompressed form.

~I also added a serde_compressed flag that works the same way, but it serializes compressed points instead. Since the deserialize() methods on PublicKey and Signature can accept both compressed and uncompressed points, there's no interoperability issues with having both compressed and uncompressed serialization. (As you can see, the deserialize function is the same in both cases!)~

If it would make sense to add some documentation for these features, please let me know. I'm not quite sure where it should be added, so a pointer would be appreciated!


closes #148

dot-asm commented 1 year ago

Just in case, if you don't have time to fix it up, I can do it as I merge...

dot-asm commented 1 year ago

Again, as for the secret key serialization. As already mentioned, I'm reluctant to provide the interfaces beyond the bare minimum in general, and more so if it leaves redundant secret key copies behind. For example, as ser.serialize_bytes(&self.serialize()) would. Wouldn't it be appropriate to zeroize the intermediate buffer? Yeah, I know, I know... But the core takes every precaution, so why would this part get a pass? [Actually, I was expecting it would be more elaborate. One can even argue that application could just as well throw few lines int...]

nazar-pc commented 1 year ago

I also added a serde_compressed flag that works the same way, but it serializes compressed points instead

This is a very bad idea. Cargo unifies features, so features can get enabled if some other crate needs them. As the result you may get different serialization results depending on what dependencies you have. That is basically a disaster waiting to happen.

BTW you can create a crate that will wrap data structure from this crate with serialization support.

kwantam commented 1 year ago

This is a very bad idea. Cargo unifies features, so features can get enabled if some other crate needs them. As the result you may get different serialization results depending on what dependencies you have. That is basically a disaster waiting to happen.

Thank you for pointing this out. It's a great point that I didn't consider. I will remove the feature.

This leaves the question: are compressed or uncompressed points preferred for serialization? I think it makes sense to go with uncompressed to avoid paying the modular square root on deserialization, but it's context specific so hard to say.

By the way: while I agree that getting different serialization results based on dependencies is spooky, I'm not sure I agree that it's a disaster: the deserialization routine can consume the result of either of the serialization routines. So it would be weird, but it wouldn't break anything as far as I can tell. (Still: spooky is bad enough, and I'm not arguing against removing the feature.)

BTW you can create a crate that will wrap data structure from this crate with serialization support.

I'm aware of this. But it's code that lots of people will presumably use, which means it's a good candidate for inclusion in the library. Asking lots of different people to write the same code is begging for bugs, incompatibility, etc. And having a separate crate (blst_serde) isn't as discoverable as a feature and, from a dependency standpoint, is morally equivalent. So I do not see any argument for a separate crate.

kwantam commented 1 year ago

Again, as for the secret key serialization. As already mentioned, I'm reluctant to provide the interfaces beyond the bare minimum in general, and more so if it leaves redundant secret key copies behind. For example, as ser.serialize_bytes(&self.serialize()) would. Wouldn't it be appropriate to zeroize the intermediate buffer? Yeah, I know, I know... But the core takes every precaution, so why would this part get a pass? [Actually, I was expecting it would be more elaborate. One can even argue that application could just as well throw few lines int...]

Apologies that I didn't address the zeroizing issue after you specifically mentioned it in #148---completely a slip of the mind. Thank you for catching it. I addressed in my latest commit via the very handy Zeroizing wrapper.

Agreed that in principle people can implement this themselves. But on the other hand: beyond interoperability, making sure that the serialization routines get the details right seems like a worthy goal. (For example, I knew I was supposed to zeroize and simply forgot!)

dot-asm commented 1 year ago

This leaves the question: are compressed or uncompressed points preferred for serialization? I think it makes sense to go with uncompressed to avoid paying the modular square root on deserialization, but it's context specific so hard to say.

Customarily it's argued that compressed is preferred because it halves the bulk storage requirement. But it doesn't have to prevent anybody from using uncompressed in live communications. Besides, when we're talking about serde, we're probably not talking about bulk storage. I mean if one actually cares about space, JSON and alikes won't be an actual choice, right?

nazar-pc commented 1 year ago

I mean if one actually cares about space, JSON and alikes won't be an actual choice, right?

To be fair serde is not just about JSON. It supports various efficient formats like bincode and even zero-copy deserialization.

kwantam commented 1 year ago

As one data point, it looks like Eth2 impls use compressed keys everywhere. So apparently no one is worrying so much about the cost of the modular square roots, which I'm sure are fast anyway because y'all are wizards.

@dot-asm do you have a number in mind for the modular square root cost? It's got to be tens of microseconds at worst given the speed of signing---no?

dot-asm commented 1 year ago

@dot-asm do you have a number in mind for the modular square root cost? It's got to be tens of microseconds at worst given the speed of signing---no?

Does "less than 15" qualify as "tens"? Well, this is for the base field. Double that for the extension field, which should qualify:-) Either way, it's ~1/6 of the sign [in the corresponding field], but it's more meaningful to compare it to the group-check. It's ~1/3 for the base field and ~1/2 for the extension one.

Again, the Eth2's choice is said to be driven by the [bulk] storage requirements, but I for one argue that the rationale is twisted. Ultimately serialization-deserialization is about communications and it would arguably be appropriate to trade bandwidth for that expensive calculations. In other words it makes more sense to compress only as you store the data for a long term than to use it everywhere...

dot-asm commented 1 year ago

The secret-key discussion actually asks for the following question. As already mentioned, serialization-deserialization is really about communications. While a secret key is not something one would communicate, there is a non-secret communication-worthy counterpart, blst_scalar. Well, SecretKey is blst_scalar in disguise, but one can say that it's effectively a marker to distinguish scalars that are used specifically as secret keys from the "communication-worthy" ones. In other words, a suggestion would be to add serde interface to blst_scalar. Thoughts? [It might be sensible to add a type alias for the blst_scalar...]

kwantam commented 1 year ago

Does that solve the problem? Basically we're telling users just to "declassify" their secret key by turning it into a blst_scalar, but an adversary doesn't care what type I call the bytes ;)

I could be missing your point, of course! Please correct me if I've missed something important here.

dot-asm commented 1 year ago

Does that solve the problem?

Suggestion to serde blst_scalar is independent of SecretKey. Point is that there a legitimate need to communicate with blst_scalar-s, e.g. to challenge a commitment. The original suggestion was to serde-ify at least EC points, public keys and signatures, but formally speaking we should raise the "at least" bar to cover even blst_scalar.

Basically we're telling users just to "declassify" their secret key by turning it into a blst_scalar,

No, suggestion isn't to facilitate "declassification." There is no direct path from SecretKey to blst_scalar, and adding serde interface to blst_scalar doesn't change that. Naturally it won't stop dedicated developer exercising usafe powers or cross-deserializing the two, but there is nothing we can do about that in either case :-)

dot-asm commented 1 year ago

I exercised the editorial power to squash all commits :-) Thanks!