Rsa.PKCS1.sig_* are incorrectly named

cfcs commented 7 years ago

The functions for signing do not implement the SIGN function from the PKCS 1 spec.

The current interface looks like this:

 (*.. cut ..*)
  val Nocrypto.Rsa.PKCS1.sig_encode ?mask:mask -> key:priv -> Cstruct.t -> Cstruct.t
  val Nocrypto.Rsa.PKCS1.encrypt ?g:Rng.g -> key:pub -> Cstruct.t -> Cstruct.t

The implementations of which are (src/rsa.ml, the PKCS1 module):

  (*.. cut ..*)
  let sig_encode ?mask ~key msg =
    padded pad_01 (decrypt ?mask ~key) (priv_bits key) msg
  let encrypt ?g ~key msg =
    padded (pad_02 ?g) (encrypt ~key) (pub_bits key) msg

So basically they're equivalent, except:

encrypt-functions pad with 00 02, and pad with random bytes. This implements the RSAES-PKCS1-V1_5-ENCRYPT primitive described in RFC 3447
- Uses EME-PKCS1-v1_5 (see the -ENCRYPT link above) to encode the message.
sig_-functions prepend 0x00 0x01, pad with 0xFF. Judging by the name I expected this to be an implementation of RSASSA-PKCS1-V1_5-SIGN, the deterministic signing primitive described in RFC 3447, but it does not fully implement EMSA-PKCS1-v1.5 which requires you to "encode" the message by replacing it with ASN.1 DER encoding of the structure described in RFC 3447 appendix A.2.4 (which basically means Hash(oid_of_hashing_algorithm || digest_of_msg) )

I suspect the reason for this confusion is that in TLS 1.0 and 1.1 they use a custom scheme in which they replace the ASN.1 DER-encoded block (the T from RFC 3447) with MD5(msg) || SHA1(msg) (as can be seen here https://github.com/mirleft/ocaml-tls/blob/41668134e197b4e8cd75a56505004f8570a88b07/lib/handshake_common.ml#L144-L145 )

Excerpt:

In RSA signing, a 36-byte structure of two hashes (one SHA and one MD5) is signed (encrypted with the private key). It is encoded with PKCS # 1 block type 1, as described in [PKCS1A].

I believe this was changed in TLS 1.2 to use the one from the standard.

EDIT: @hannesm pointed me to how TLS 1.2 handles this in ocaml-tls, and indeed it does use RSASSA-PKCS1-V1_5-SIGN from the PKCS1 specification:

I'm writing a small tool to verify OpenPGP signatures, and this is one of the few things that the OpenPGP format does not have a custom way of doing, so I would like to see this updated. See the relevant section

I suggest renaming sig_encode and sig_decode to reflect that they're used in the older TLS protocols, and to add two new ones that take a Nocrypto.Hash.S.t from which it can derive the correct OID and call get t on to get the digest. I would like to avoid a Ctypes.t with the data since it would be nice to be able to validate large blobs that may not fit in memory. Providing the hash could work, but is not as nice as taking it directly from a Hash.S.t state object since that exposes the programmer to issues related to confusing the hash type (although I guess we can live with that).

I am not sure what the easiest way to derive the module from a t is (if there's a trick to do this nicely with the type system). The only idea I can think of would be to make the t a transparent tuple of real_t * (module Hash.S), but that's not super pretty, so I'm hoping someone who has better knowledge of the type system can help out. Alternatively I guess we can make it val sign : key:Rsa.priv -> m : (module Hash.S) -> hash : Hash.S.t, but again I'm not sure how to make sure that hash is an m.t.

cfcs commented 7 years ago

The PR above doesn't contain an easy way to obtain a Nocrypto.RSA.PKCS1.S.t from a Nocrypto.Hash.S.t value, nor the other way. Both would be useful for implementing protocols that utilize the hash for other things. I'll add a to_hash to the PKCS1.S interface for now.

pqwy commented 7 years ago

Yes, PKCS1.5 signature is missing the important bit, which is ASN.1 packing/unpacking. I agree that this is unsatisfactory.

One half of the reason is historical abuse of PKCS1.5, as in TLS1.0, which meant that some API clients needed the operation exposed without ASN.1. The other half is that I didn't want to depend on asn1-combinators.

However, after two readings, I still don't understand what exactly are you proposing. Renaming the functions? Support for rolling hashes? Clearer documentation?

cfcs commented 7 years ago

I'm suggesting adding two functions, namely the ones so elegantly christened in the spec as respectively RSASSA-PKCS1-V1_5-SIGN and RSASSA-PKCS1-V1_5-VERIFY. The rationale behind the request is that those two are used in a number of common protocols, notably PGP (which I worked a bit on) and SSH (which haesbaert is doing now).
I would appreciate rolling hashes. Or at least being able to sign something and extract the final digest that was signed.
I don't understand the naming convention used for sig_encode, but I suspect that function will mostly be found useful by people implementing TLS 1.0/1.1 from scratch, so TBH I don't mind what happens to it.

pqwy commented 7 years ago

And how do you propose to deal with ASN.1 here?
See here and there. All hashes roll now.
Me neither. I think it meant "encode the signature", with "encode" meaning "deal with the padding and the RSA transformation around it, but not with the nasty blob inside." It could have been "transmogrify the signature" equally well. Or "frob."

NB -- I strongly approve of the choice of names. Sounds :metal::metal:.

cfcs commented 7 years ago

@pqwy

handling ASN: I would just hardcode the stubs for the algorithms shipped with Nocrypto and provide a way for people to provide them for their own hash algorithms. I don't think it's worth doing ASN.1 conversions every single time you want to verify/sign something, especially if that component is missing in the stack (or you don't want a dependency on a full ASN.1 DER encoder to support this corner case protocol).

re: names: :)

pqwy commented 7 years ago

Right, the idea being to ignore the comprehensive support for the gnarly binary protocol that is siting in a repo next door, and just hard-code a couple of large, structured binary constants.

One last bit I'm struggling to figure out -- what is preventing you from doing all that outside the library?

cfcs commented 7 years ago

The existence of a Nocrypto.Rsa.PKCS1 module, along with the original advertisement (The RSA module provides the basics: raw encryption and decryption, PKCS1-padded versions of the same operations, and PKCS1 signing and signature verification.) gave me the impression that it was within the scope of project to implement PKCS1 signature creation/verification. I needed that, and it wasn't there, so I thought this might be the right place to bring it up.

That being said I don't think anything prevents me from implementing it outside this library, so if you would prefer that I made a separate library I'm happy to do that.

pqwy commented 7 years ago

I'm on the fence with this.

At some point I planned to do a separate lib which would combine nocrypto with various ASN grammars that are floating around from various standards, and intended the full PKCS1.5 to be a part of that. But seeing how little of ASN is actually needed -- and in particular that trick to avoid parsing it -- makes this... ... tantalizing.

How about:

I give you this.

You:

nest a single Make functor at the end of Rsa.PKCS1;
drop the incremental hashing;
drop the pre-made instances;
drop S along with them;
drop even the size check as it can be deferred to sig_encode; and
use Hash.S.code to detect which ASN blob to choose?

IOW make an unfancy cut of the changes:

Make (H : Hash.S) : sig
  val sign   : ?mask:mask -> key:priv -> Cstruct.t -> Cstruct.t
  val verify : key:pub -> signature:Cstruct.t -> Cstruct.t -> bool
end

pqwy commented 7 years ago

It boils down to

  module Make (H : Hash.S) = struct

    let asn = lazy (Cstruct.of_string @@ match H.code with
        `MD5    -> "\x30\x20\x30\x0c\x06\x08\x2a\x86\x48\x86\xf7\x0d\x02\x05\x05\x00\x04\x10"
      | `SHA1   -> "\x30\x21\x30\x09\x06\x05\x2b\x0e\x03\x02\x1a\x05\x00\x04\x14"
      | `SHA224 -> "\x30\x2d\x30\x0d\x06\x09\x60\x86\x48\x01\x65\x03\x04\x02\x04\x05\x00\x04\x1c"
      | `SHA256 -> "\x30\x41\x30\x0d\x06\x09\x60\x86\x48\x01\x65\x03\x04\x02\x02\x05\x00\x04\x30"
      | `SHA384 -> "\x30\x31\x30\x0d\x06\x09\x60\x86\x48\x01\x65\x03\x04\x02\x01\x05\x00\x04\x20"
      | `SHA512 -> "\x30\x51\x30\x0d\x06\x09\x60\x86\x48\x01\x65\x03\x04\x02\x03\x05\x00\x04\x40"
    )

    let digest cs = Cs.(Lazy.force asn <+> H.digest cs)

    let sign ?mask ~key msg = sig_encode ?mask ~key (digest msg)

    let verify ~key ~signature msg =
      sig_decode ~key signature
      |> Option.v_map ~def:false ~f:(Cs.ct_eq (digest msg))
  end

... which is a good line shorter than I expected.

Modulo incremental hashing, which is a separate issue, does that suit you?

The essential trick here is replacing parsing with re-generating the ASN blob, which can be done by prepending a fixed stub -- an idea that I exclusively owe to your pull request.

cfcs commented 7 years ago

That interface looks good!

The incremental hashing is important (so I can avoid having to load 4 gigabytes in memory to verify a Qubes iso, for example), so I'm also looking forward to that. :-)

pqwy commented 7 years ago

If the use case for super-sized stuff is locally-stored blobs, you can use mmap in the meantime.

cfcs commented 7 years ago

Yeah, but that won't work on a 20 MB Mirage unikernel etc, so that's a bit of a hack. Would still prefer a streaming api. :)

pqwy commented 7 years ago

There is no "local" for Mirage. Outside of mirage, it's not a hack -- I mostly mmap big files.

There's this background chatter about things too big for memory. You are not seriously sending 4GB of data over the network to a unikernel just to sign it, are you?

cfcs commented 7 years ago

No, I'm sending it over Xen vchan. And also to verify it.

cfcs commented 7 years ago

So, just to understand your comment above correctly: Did you request any action on my part? Do you want me to update my pull request or to put this code in a separate library?

pqwy commented 7 years ago

It's cool, I'm about to merge a version of that snippet below.

pqwy commented 7 years ago

So a bit like this?

cfcs commented 7 years ago

Perfect, thank you!

pqwy commented 7 years ago

Do you still believe that sig_encode and sig_decode were misnomers?

cfcs commented 7 years ago

Well, I found the names confusing. I still don't think the names make a lot of sense, but sign and verify take precedence when you're looking for sense in the interface, and I can't think of a more concise alternative name for them, so I would just keep them there as-is. They're useful for doing the TLS stuff, or for manually passing in the ASN.1 stubs for other hash algorithms.

pqwy commented 7 years ago

Yeah, they are useful. This might be the reason why they are a part of the interface.

If you figure out a word that corresponds to that part of the PKCS 1.5 signature transformation that does not depend on the hash function -- rsa ∘ padding -- better than "encode", please file a formal notice and send it to my postal address. Make sure to get it signed by the required office too.

cfcs commented 7 years ago

Right, as opposed to sign and verify, who could possibly want that. I mean you can implement all of this using Nocrypto.Rsa, so I don't really see why any of this was in here in the first place. Come to think of it you could just implement it with zarith, this Rsa abstraction is just a bloated convenience feature. What's your point? :)

hannesm commented 7 years ago

@pqwy NB I think the new signature of verify - passing the hash only optionally - is a bad idea.

if you take X.509 into consideration, if a developer only passes in the signature value and the to-be-signed certificate (but not the algorithmIdentifier used for the hash), an attacker may have replaced the signature value with another signature algorithm (MD5!?), and the (sloppy) developer won't notice that. I'd be in favour of always requiring verify to get the hash passed. In the scenario above there's clearly a sloppy programmer (who doesn't pass in hash) needed, but if it's optional, people will not pass it.

EDIT: it also looks that making hash required will simplify the code around verify (by getting rid of detect etc.).

pqwy commented 6 years ago

Compromise.

hannesm commented 6 years ago

@pqwy I'd be really interested what use case is in your mind. Is it purely academic, or do you have a concrete protocol or specification where the hash algorithm is not communicated out of band?

The protocols I'm aware of:

X.509 This field MUST contain the same algorithm identifier as the signatureAlgorithm field
OpenPGP where in both V3 and V4 signature packets a one-byte hash algorithm is present
SSH explicitly specifies RSASSA-PKCS1-v1_5 scheme in RFC3447 using the SHA-1 hash

I'm keen to learn if there's a protocol where the freedom of your "compromise" is necessary. My point is that your proposed hashp is rather cumbersome to deal with in practise. Keep it simple.

pqwy commented 6 years ago

Fully encoded PKCS1.5 intrinsically supports algorithm self-description. I don't want to artificially hide this characteristic, irrespective of the common use where this information is repeated somewhere in the envelope.

The compromise in question is not selecting the hash with a predicate, it's making that parameter mandatory.

If you want to force a hash, for every context in which you'd have

verify ~hash ~key ~signature cs

instead you have

let p = (=)
...
verify ~hashp:(p hash) ~key ~signature cs

unless of course you just wrap it

let verify ~hash = verify ~hashp:(p hash)

I'm increasingly curious -- what is the problem here?

hannesm commented 6 years ago

As described earlier, I cannot find any use of PKCS 1.5 where the hash algorithm is not communicated out of band. And these two are better be the same - otherwise, an attacker can remove the signature, and replace it with a weak (broken) hash algorithm (and I'm pretty sure I've seen CVEs with exactly this problem - where the out of band communicated hash algorithm and the one used in the signature didn't match - unfortunately I cannot find them right now). Now, stepping back to earlier RNG APIs you provided, and people misused them, which unintentionally led to weak random IIRC.

As you show in your code snippets, a user now has to construct a unary function and pass it to verify - I still don't see much benefit. To repeat myself, I'd be interested where your motivation comes from, backed by defined and used protocols.

pqwy commented 6 years ago

It is backed by the properties of the signature scheme itself.

Notationally it is not heavier than passing a single hash.

Imagining what a sloppy-developer-type character will do does not sound terribly productive. The one I imagine prefers ~hashp:((=) hash) to ~hashp:(fun -> true) because they want to use hash that they have just parsed, and because it's 7 chars shorter. It's a guessing game.

I suggest we wait and see if this really becomes a pitfall. In the meantime, I strongly prefer generality.

hannesm commented 6 years ago

To repeat myself: could you please point me to an example of a protocol where the hash is not communicated out of band? Thanks.

pqwy commented 6 years ago

It is backed by the properties of the signature scheme itself.

mirleft / ocaml-nocrypto

Rsa.PKCS1.sig_* are incorrectly named #119