openssl / openssl

TLS/SSL and crypto library
https://www.openssl.org
Apache License 2.0

CMS SignerInfo signatureAlgorithm, what gives? #11413

Open levitte opened 4 years ago

levitte commented 4 years ago

I'm a little lost on what the signatureAlgorithm is supposed to be, and it seems to vary depending on key algorithm, at least looking at our implementation. Looking at the code we produce a Subject Public Key Algorithm for RSA (https://tools.ietf.org/html/rfc3279#section-2.3.1), while we produce a Signature Algorithm for DSA (https://tools.ietf.org/html/rfc3279#section-2.2.2) and ECDSA (https://tools.ietf.org/html/rfc3279#section-2.2.3).

Evidence for RSA:

https://github.com/openssl/openssl/blob/8158cf209792f7a92f0812ac89a9f54950e8453b/crypto/rsa/rsa_ameth.c#L812-L815 https://github.com/openssl/openssl/blob/8158cf209792f7a92f0812ac89a9f54950e8453b/crypto/rsa/rsa_ameth.c#L822

Evidence for DSA (snid is the composed digest-sign NID):

https://github.com/openssl/openssl/blob/8158cf209792f7a92f0812ac89a9f54950e8453b/crypto/dsa/dsa_ameth.c#L492-L497

Evidence for ECDSA:

https://github.com/openssl/openssl/blob/8158cf209792f7a92f0812ac89a9f54950e8453b/crypto/ec/ec_ameth.c#L493-L498

So I wonder, why the difference? And most of all, I wonder why the digest-sign algid should be used at all, considering that the SignerInfo structure has a separate digestAlgorithm and the Message Signature Verification Process isn't interested in a digest-sign algorithm at all.

Is this possibly a bug? Otherwise, what am I missing? Am I reading the code wrong? If this is indeed a correct inconsistency, is this documented in a standard somewhere?

levitte commented 4 years ago

@DDvO, you might be the guru du jour, can you shed some light on this?

levitte commented 4 years ago

I also looked at the GOST implementation, and it seems to produce Subject Public Key Algorithm (https://tools.ietf.org/html/rfc4491#section-2.3), just like we do for RSA. Evidence:

https://github.com/gost-engine/engine/blob/7c30097805cba0c62555493df6dad9f0c5d1d0f3/gost_ameth.c#L258-L261

levitte commented 4 years ago

It's also a bit interesting to find diverse comments referring to implementations using signature OIDs as "broken":

https://github.com/openssl/openssl/blob/8158cf209792f7a92f0812ac89a9f54950e8453b/crypto/cms/cms_lib.c#L337-L340

I'm not sure if this is related, though...

mattcaswell commented 4 years ago

Looking at the code we produce a Subject Public Key Algorithm for RSA

No. In all cases we are generating an AlgorithmIdentifier (X509_ALGOR). What is confusing is the OID being used in that AlgorithmIdentifier. Sometimes we use the OID for the underlying signature algorithm and sometimes the combined OID of the signature algorithm and the digest algorithm.

A SignerInfo is defined here: https://tools.ietf.org/html/rfc5652#section-5.3

  SignerInfo ::= SEQUENCE {
    version CMSVersion,
    sid SignerIdentifier,
    digestAlgorithm DigestAlgorithmIdentifier,
    signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
    signatureAlgorithm SignatureAlgorithmIdentifier,
    signature SignatureValue,
    unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }

So we have separate fields for both the digest algorithm and the signature algorithm. Note that DigestAlgorithmIdentifier and SignatureAlgorithmIdentifier above are just:

  DigestAlgorithmIdentifier ::= AlgorithmIdentifier

and

  SignatureAlgorithmIdentifier ::= AlgorithmIdentifier

The digest algorithm identifier is described as:

The DigestAlgorithmIdentifier type identifies a message-digest algorithm. Examples include SHA-1, MD2, and MD5. A message-digest algorithm maps an octet string (the content) to another octet string (the message digest).

But the signature algorithm identifier is described as:

The SignatureAlgorithmIdentifier type identifies a signature algorithm, and it can also identify a message digest algorithm. Examples include RSA, DSA, DSA with SHA-1, ECDSA, and ECDSA with SHA-256. A signature algorithm supports signature generation and verification operations. The signature generation operation uses the message digest and the signer's private key to generate a signature value. The signature verification operation uses the message digest and the signer's public key to determine whether or not a signature value is valid. Context determines which operation is intended.

So this explicitly allows either form of OID, i.e. either just the signature OID or the composed OID of signature and digest. I cannot see where it says what to do if the value for the DigestAlgorithmIdentifier and the SignatureAlgorithmIdentifier disagree.

mattcaswell commented 4 years ago

It's also a bit interesting to find diverse comments referring to implementations using signature OIDs as "broken":

I suspect this is referring to a case where the digestAlgorithm has a combined signature OID, e.g. DSA with SHA1 rather than the SHA1 OID. This does seem broken.

levitte commented 4 years ago

No. In all cases we are generating an AlgorithmIdentifier

[ahem] Subject Public Key Algorithms in RFC 3279 are also given as an AlgorithmIdentifier, so it's not possible to say "no" that categorically.

That being said, the examples you pointed at (which are from RFC 5652, while I was reading RFC 2630) tell me that we can choose either and will still be ok. The DSA evidence I displayed could just as well have been done like this, in other words:

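    case ASN1_PKEY_CTRL_CMS_SIGN:
        if (arg1 == 0) {
            ASN1_STRING *str = NULL;
            X509_ALGOR *alg;

            /* Fetch the signatureAlgorithm field of the SignerInfo */
            CMS_SignerInfo_get0_algs(arg2, NULL, NULL, NULL, &alg);
            if ((str = ASN1_STRING_new()) == NULL)
                return 0;
            /* DER-encode the DSA parameters to carry alongside the OID */
            str->length = i2d_DSAparams(dsa, &str->data);
            if (str->length <= 0) {
                ASN1_STRING_free(str);
                return 0;
            }
            /* Use the key type OID (id-dsa), not the composed digest-sign OID */
            X509_ALGOR_set0(alg, OBJ_nid2obj(EVP_PKEY_id(pkey)),
                            V_ASN1_SEQUENCE, str);
        }
        return 1;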

If the general consensus is that we can choose the form that we find most practical (DSA-with-sha1 is more practical than DSA because the former specifies that the parameters should be undefined, i.e. V_ASN1_UNDEF), then I have no further issue. My question has been answered, albeit not quite like I had anticipated.

dcooper16 commented 4 years ago

A better place to look than the generic text in Section 10.1.2 of RFC 5652 is at the standards that specify the encodings for particular algorithms. RFC 3279 is not relevant, as that only applies to X.509 certificates and CRLs. Standards related to use of algorithms with CMS may be found at https://datatracker.ietf.org/wg/smime/documents.

In RFC 2630, the relevant text was in Section 12.2, but that has now been replaced by Section 3 of RFC 3370. Both specify that the signature algorithm should be id-dsa-with-sha1 for DSA and rsaEncryption for RSA (RFC 5754 specifies the SHA-2 algorithms). For RSA, RFC 3370 says:

CMS implementations that include the RSA (PKCS #1 v1.5) signature algorithm MUST support the rsaEncryption signature value algorithm identifier, and CMS implementations MAY support RSA (PKCS #1 v1.5) signature value algorithm identifiers that specify both the RSA (PKCS #1 v1.5) signature algorithm and the message digest algorithm.

There is no mention for DSA of using id-dsa in the signatureAlgorithm field.

Some other relevant RFCs are RFC 4056 (RSASSA-PSS), RFC 4490 (GOST) and RFC 5753 (elliptic curve).

levitte commented 4 years ago

Okie, so the message I'm getting is that "it depends"...

russhousley commented 4 years ago

The DigestAlgorithmIdentifier is separate to accommodate stream processing. This lets the signature validation code know what hash algorithm to apply to the content since the signature algorithm identifier comes after the signedAttrs, which are hashed.

The SignatureAlgorithmIdentifier has been defined by convention to be a combination of the signature algorithm and the hash algorithm, such as: sha256WithRSAEncryption and id-Ed25519 (because it is only ever used with SHA-512) and id-Ed448 (because it is only ever used with SHAKE256).

levitte commented 4 years ago

The SignatureAlgorithmIdentifier has been defined by convention to be a combination of the signature algorithm and the hash algorithm, such as: sha256WithRSAEncryption and id-Ed25519 (because it is only ever used with SHA-512) and id-Ed448 (because it is only ever used with SHAKE256).

And yet, with RSA it's rsaEncryption (no "with-anything") and GOST is also coded that way...

russhousley commented 4 years ago

In a certificate, rsaEncryption is used for the subject public key.

Since RFC 3370 (August 2002), sha1WithRSAEncryption and md5WithRSAEncryption are used for the CMS SignatureAlgorithmIdentifier, and the RFC 4055 defines the identifiers for SHA-224, SHA-256, SHA-384, and SHA-512 with RSA.

dcooper16 commented 4 years ago

Since RFC 3370 (August 2002), sha1WithRSAEncryption and md5WithRSAEncryption are used for the CMS SignatureAlgorithmIdentifier, and the RFC 4055 defines the identifiers for SHA-224, SHA-256, SHA-384, and SHA-512 with RSA.

Hi Russ,

I believe this is incorrect. Section 3.2 of RFC 3370 says:

The rsaEncryption algorithm identifier is used to identify RSA (PKCS #1 v1.5) signature values regardless of the message digest algorithm employed. CMS implementations that include the RSA (PKCS #1 v1.5) signature algorithm MUST support the rsaEncryption signature value algorithm identifier, and CMS implementations MAY support RSA (PKCS #1 v1.5) signature value algorithm identifiers that specify both the RSA (PKCS #1 v1.5) signature algorithm and the message digest algorithm.

For RSA (PKCS #1 v1.5) with SHA-2, it seems that RFC 5754, Using SHA2 Algorithms with Cryptographic Message Syntax, is a better reference than RFC 4055 (which is about certificates and CRLs). Section 3.2 of RFC 5754 says:

[RFC3370], Section 3.2, specifies the conventions for RSA with SHA-1 (RSASSA-PKCS1-v1_5) public key algorithm identifiers, parameters, public keys, and signature values. RSA with SHA2 algorithms uses the same conventions for these public key algorithm identifiers, parameters, public keys, and signature values. RSA (RSASSA-PKCS1-v1_5) [RFC3447] MAY be used with SHA-224, SHA-256, SHA-384, or SHA-512. The object identifiers are taken from [RFC4055].

I interpret the "uses the same conventions" part to mean that the rule that "The rsaEncryption algorithm identifier is used to identify RSA (PKCS #1 v1.5) signature values regardless of the message digest algorithm employed" still applies when using a SHA-2 hash algorithm.

For pretty much every other signature algorithm, the standards specify to use an identifier that specifies both the signature algorithm and the hash algorithm.
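As a rough illustration of the rule of thumb above, here is a minimal sketch of how a signer might pick the signatureAlgorithm OID per key type. The enum names and the function `cms_sig_alg_oid` are hypothetical and are not OpenSSL API; only RSA (PKCS #1 v1.5), DSA and ECDSA with SHA-1 or SHA-256 are covered, and the dotted OIDs are the standard assignments for the names given in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key and digest selectors, for illustration only. */
enum key_type { KEY_RSA, KEY_DSA, KEY_ECDSA };
enum digest   { DIGEST_SHA1, DIGEST_SHA256 };

/*
 * Return the dotted OID to place in SignerInfo.signatureAlgorithm,
 * or NULL for an unknown key type.
 */
const char *cms_sig_alg_oid(enum key_type key, enum digest md)
{
    assert(md == DIGEST_SHA1 || md == DIGEST_SHA256);

    switch (key) {
    case KEY_RSA:
        /* rsaEncryption, regardless of the digest (RFC 3370, Section 3.2) */
        return "1.2.840.113549.1.1.1";
    case KEY_DSA:
        /* composed OIDs: id-dsa-with-sha1 or id-dsa-with-sha256 */
        return md == DIGEST_SHA1 ? "1.2.840.10040.4.3"
                                 : "2.16.840.1.101.3.4.3.2";
    case KEY_ECDSA:
        /* composed OIDs: ecdsa-with-SHA1 or ecdsa-with-SHA256 */
        return md == DIGEST_SHA1 ? "1.2.840.10045.4.1"
                                 : "1.2.840.10045.4.3.2";
    }
    return NULL;
}
```

With this sketch, `cms_sig_alg_oid(KEY_RSA, DIGEST_SHA1)` and `cms_sig_alg_oid(KEY_RSA, DIGEST_SHA256)` return the same OID, while the DSA and ECDSA results change with the digest.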

russhousley commented 4 years ago

Yes, RFC 3370 says that rsaEncryption with no hash algorithm MUST be supported. The sha1WithRSAEncryption and md5WithRSAEncryption were introduced as well. The rsaEncryption was the only one in RFC 2630, which is obsoleted by RFC 3369 and RFC 3370. RFC 4055 defines the algorithm identifiers for RSA with SHA-224, SHA-256, SHA-384, and SHA-512.

Also, PKCS #1 v1.5 encodes the algorithm identifier of the hash algorithm inside the signature. When decrypted with the public key, the DigestInfo contains the algorithm identifier of the hash algorithm that was used.
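To make the DigestInfo point concrete, here is a minimal sketch that recognises the hash algorithm from a recovered DigestInfo. It assumes the PKCS #1 v1.5 padding has already been stripped and that the AlgorithmIdentifier carries explicit NULL parameters; the function name `digestinfo_hash_name` is hypothetical, and only SHA-1 and SHA-256 are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DER prefix of DigestInfo for SHA-1: everything before the 20 hash bytes */
static const unsigned char sha1_prefix[] = {
    0x30, 0x21, 0x30, 0x09, 0x06, 0x05, 0x2b, 0x0e,
    0x03, 0x02, 0x1a, 0x05, 0x00, 0x04, 0x14
};

/* DER prefix of DigestInfo for SHA-256: everything before the 32 hash bytes */
static const unsigned char sha256_prefix[] = {
    0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01,
    0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20
};

/*
 * Return the name of the hash named inside a DER-encoded DigestInfo,
 * or NULL if the encoding is not one of the two forms known here.
 */
const char *digestinfo_hash_name(const unsigned char *di, size_t len)
{
    assert(di != NULL);

    if (len == sizeof(sha1_prefix) + 20
            && memcmp(di, sha1_prefix, sizeof(sha1_prefix)) == 0)
        return "SHA-1";
    if (len == sizeof(sha256_prefix) + 32
            && memcmp(di, sha256_prefix, sizeof(sha256_prefix)) == 0)
        return "SHA-256";
    return NULL;
}
```

This is why an rsaEncryption signatureAlgorithm loses no information for PKCS #1 v1.5: the hash is identified a second time inside the signed block itself.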

levitte commented 4 years ago

For pretty much every other signature algorithm, the standards specify to use an identifier that specifies both the signature algorithm and the hash algorithm.

Not with GOST. I so wish it was only with RSA this happens... but from the looks of it, this is not deterministically standardised, so there seems indeed to be a need for key type specific solutions to cover current and possibly future variations. This does explain all the CMS specific functions in EVP_PKEY_ASN1_METHOD, something I just hadn't a grip on earlier.

The main reason I asked was to figure out the best way to handle this with the new provider mechanism without having to insert CMS specific functions in there as well. I believe I have figured out a way that's not too ugly. PR coming up tomorrow.

beldmit commented 4 years ago

For pretty much every other signature algorithm, the standards specify to use an identifier that specifies both the signature algorithm and the hash algorithm.

Not with GOST.

We have only one hash algorithm allowed to use with a particular signature algorithm. So that's why the signature algorithm is enough in such a situation.

levitte commented 4 years ago

Yeah. The problem I was facing was trying to get something programmatically consistent without having to resort to CMS specific backend hackery (like we have in EVP_PKEY_ASN1_METHOD). The answer is that it's not possible.

DDvO commented 4 years ago

@DDvO, you might be the guru du jour, can you shed some light on this?

Sorry that so far I had missed this question by @levitte.

I'm not an expert on CMS but there was a related discussion on potentially inconsistent CMS digest/signature algorithm specifications in #9392 where I participated. There the focus was not which form of SignatureAlgorithmIdentifier (with or without digest alg) to use for signing but which information to use when verifying a CMS signature.

My conclusion there was that in case the DigestAlgorithmIdentifier and the SignatureAlgorithmIdentifier disagree on the digest algorithm, the DigestAlgorithmIdentifier takes precedence, and that the OpenSSL implementation was in line with that. In case the SignatureAlgorithmIdentifier does not refer to a digest algorithm, it should be taken from the DigestAlgorithmIdentifier.

It looks like a CMS verification implementation could also throw an error if the DigestAlgorithmIdentifier and the SignatureAlgorithmIdentifier are inconsistent. At least, like @mattcaswell, I did not find a requirement in the relevant RFCs on how to handle that situation.
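The two verification policies can be sketched as follows. This is not the OpenSSL implementation: `implied_digest` and `resolve_digest` are hypothetical helpers, digests are plain strings, and the table of composed OIDs is a small illustrative subset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Digest implied by a composed signatureAlgorithm OID, or NULL if none. */
const char *implied_digest(const char *sig_oid)
{
    static const struct { const char *oid, *digest; } composed[] = {
        { "1.2.840.113549.1.1.5",  "SHA-1"   }, /* sha1WithRSAEncryption */
        { "1.2.840.113549.1.1.11", "SHA-256" }, /* sha256WithRSAEncryption */
        { "1.2.840.10040.4.3",     "SHA-1"   }, /* id-dsa-with-sha1 */
        { "1.2.840.10045.4.3.2",   "SHA-256" }, /* ecdsa-with-SHA256 */
    };
    size_t i;

    assert(sig_oid != NULL);
    for (i = 0; i < sizeof(composed) / sizeof(composed[0]); i++)
        if (strcmp(composed[i].oid, sig_oid) == 0)
            return composed[i].digest;
    return NULL; /* signature-only OID such as rsaEncryption, or unknown */
}

/*
 * Choose the digest to verify with. digest_alg comes from
 * SignerInfo.digestAlgorithm and always wins in lenient mode.
 * In strict mode a disagreement with the signatureAlgorithm
 * is reported by returning NULL.
 */
const char *resolve_digest(const char *digest_alg, const char *sig_oid,
                           int strict)
{
    const char *implied = implied_digest(sig_oid);

    assert(digest_alg != NULL);
    if (strict && implied != NULL && strcmp(implied, digest_alg) != 0)
        return NULL;
    return digest_alg;
}
```

For example, `resolve_digest("SHA-256", "1.2.840.10040.4.3", 0)` yields "SHA-256" (digestAlgorithm takes precedence), while the same call with `strict` set to 1 returns NULL to signal the inconsistency.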

nhorman commented 2 months ago

Marking as inactive, to be closed at the end of 3.4 dev, barring further input