openssl / openssl

TLS/SSL and crypto library
https://www.openssl.org
Apache License 2.0
25.4k stars 10.05k forks source link

OpenSSL emits invalid ASN.1 when an explicit text contains non-ASCII text, but no explicit tag #20772

Open alex opened 1 year ago

alex commented 1 year ago

Given the following configuration file:

[req]
distinguished_name = dn
req_extensions = v3_req

[dn]
CN = YourCommonName

[v3_req]
subjectAltName = DNS:yourdomain.com
certificatePolicies = @polsection

[polsection]
policyIdentifier = 1.3.6.1.4.1.6449.1.2.2.25
CPS.1 = "http://example.com/cps"
userNotice.1 = @notice

[notice]
explicitText = "This is a policy qualifier with explicit text. And an ◌̈"

The following invocation openssl x509 -req -in req.csr -signkey key.pem -out cert.pem -days 365 -extfile req.conf -extensions v3_req (any CSR and key will do), produces the following certificate (viewed with der2ascii):

SEQUENCE {
  SEQUENCE {
    [0] {
      INTEGER { 2 }
    }
    INTEGER { `00b70da24f5382fae9` }
    SEQUENCE {
      # sha256WithRSAEncryption
      OBJECT_IDENTIFIER { 1.2.840.113549.1.1.11 }
      NULL {}
    }
    SEQUENCE {
      SET {
        SEQUENCE {
          # countryName
          OBJECT_IDENTIFIER { 2.5.4.6 }
          PrintableString { "US" }
        }
      }
      SET {
        SEQUENCE {
          # stateOrProvinceName
          OBJECT_IDENTIFIER { 2.5.4.8 }
          UTF8String { "DC" }
        }
      }
      SET {
        SEQUENCE {
          # localityName
          OBJECT_IDENTIFIER { 2.5.4.7 }
          UTF8String { "Washington" }
        }
      }
      SET {
        SEQUENCE {
          # organizationName
          OBJECT_IDENTIFIER { 2.5.4.10 }
          UTF8String { "Alex's House of Certs" }
        }
      }
      SET {
        SEQUENCE {
          # organizationUnitName
          OBJECT_IDENTIFIER { 2.5.4.11 }
          UTF8String { "Division of Trouble Making" }
        }
      }
    }
    SEQUENCE {
      UTCTime { "230419133141Z" }
      UTCTime { "240418133141Z" }
    }
    SEQUENCE {
      SET {
        SEQUENCE {
          # countryName
          OBJECT_IDENTIFIER { 2.5.4.6 }
          PrintableString { "US" }
        }
      }
      SET {
        SEQUENCE {
          # stateOrProvinceName
          OBJECT_IDENTIFIER { 2.5.4.8 }
          UTF8String { "DC" }
        }
      }
      SET {
        SEQUENCE {
          # localityName
          OBJECT_IDENTIFIER { 2.5.4.7 }
          UTF8String { "Washington" }
        }
      }
      SET {
        SEQUENCE {
          # organizationName
          OBJECT_IDENTIFIER { 2.5.4.10 }
          UTF8String { "Alex's House of Certs" }
        }
      }
      SET {
        SEQUENCE {
          # organizationUnitName
          OBJECT_IDENTIFIER { 2.5.4.11 }
          UTF8String { "Division of Trouble Making" }
        }
      }
    }
    SEQUENCE {
      SEQUENCE {
        # rsaEncryption
        OBJECT_IDENTIFIER { 1.2.840.113549.1.1.1 }
        NULL {}
      }
      BIT_STRING {
        `00`
        SEQUENCE {
          INTEGER { `009e1318c858f7795efca2c97d8dc7cf8b72dbb2670e2a4e34b5d03f2f39b17051d755086717981a7f5881f7f9afe455d4e4ed2b9d90395c965b0495f923121d9341f2c976823f34521ffdf4f0ede464516aa341159d687f2633e71977c4b06c8618d768f7cece22c491552bb8ab84cf3afa6326a63723a9cd3db63fa988c5f02594522681437c3f51d85bb335402be25bc4909a687a204469e36b945e7384b2c932589ca0fe12f94852566c1ae4afafeaafa8ff28a3093134f574e8eb4c20c1a665abb81c4821e85c07795f6dfb8349a413a2d6bce8417c8094d55f733993ddb55d41fe9be445deeeb949b9af024c6fa0f792322a6af0ecfde14de91d90acc9a7` }
          INTEGER { 65537 }
        }
      }
    }
    [3] {
      SEQUENCE {
        SEQUENCE {
          # subjectAltName
          OBJECT_IDENTIFIER { 2.5.29.17 }
          OCTET_STRING {
            SEQUENCE {
              [2 PRIMITIVE] { "yourdomain.com" }
            }
          }
        }
        SEQUENCE {
          # certificatePolicies
          OBJECT_IDENTIFIER { 2.5.29.32 }
          OCTET_STRING {
            SEQUENCE {
              SEQUENCE {
                OBJECT_IDENTIFIER { 1.3.6.1.4.1.6449.1.2.2.25 }
                SEQUENCE {
                  SEQUENCE {
                    OBJECT_IDENTIFIER { 1.3.6.1.5.5.7.2.1 }
                    IA5String { "http://example.com/cps" }
                  }
                  SEQUENCE {
                    # unotice
                    OBJECT_IDENTIFIER { 1.3.6.1.5.5.7.2.2 }
                    SEQUENCE {
                      VisibleString { "This is a policy qualifier with explicit text. And an \xe2\x97\x8c\xcc\x88" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  SEQUENCE {
    # sha256WithRSAEncryption
    OBJECT_IDENTIFIER { 1.2.840.113549.1.1.11 }
    NULL {}
  }
  BIT_STRING { `00` `7d494a3de24ab88143edeb6174b12e3a00ce3f02ad619be794ea05db772df4b4be4284f54c78afbbf0a29661a2852ba8857ebfc58442f7c211be0c1648abec158437953cf3769355ab38044a9d73f4c5f2e5c139880825efc0d652375daa3d22573128d3176693a11479d68f5ac54806edadf3b573651163a8083902d86d3e0428bc781aabd36344176cf8daa1b3ce8bec26c4ff603655d6b18f54a9c691895dd77ffc540cdb3e9725bee8cdfb5f05d1d9e12d3da510d2a28d4ed311e1adcc413ae94099837ae7073976074d6f8e4c9ddbead266cf84e35c510f06ca8c14d16d7202e7e29e3f26cbf5c726f4041fd56d78eb0e1becc60c85520743c01323960b` }
}

As you can see the VisibleString contains non-ASCII characters, however the DER specification limits the set of characters that are valid for this element, which are a subset of ASCII. Therefore this is invalid DER.

Note that RFC 5280 says:

An explicitText field includes the textual statement directly in the certificate. The explicitText field is a string with a maximum size of 200 characters. Conforming CAs SHOULD use the UTF8String encoding for explicitText, but MAY use IA5String. Conforming CAs MUST NOT encode explicitText as VisibleString or BMPString. The explicitText string SHOULD NOT include any control characters (e.g., U+0000 to U+001F and U+007F to U+009F). When the UTF8String encoding is used, all character sequences SHOULD be normalized according to Unicode normalization form C (NFC) [NFC].

Though this is contravened by RFC 6818:

An explicitText field includes the textual statement directly in the certificate. The explicitText field is a string with a maximum size of 200 characters. Conforming CAs SHOULD use the UTF8String encoding for explicitText. VisibleString or BMPString are acceptable but less preferred alternatives. Conforming CAs MUST NOT encode explicitText as IA5String. The explicitText string SHOULD NOT include any control characters (e.g., U+0000 to U+001F and U+007F to U+009F). When the UTF8String or BMPString encoding is used, all character sequences SHOULD be normalized according to Unicode normalization form C (NFC) [NFC].

Based on a review of the OpenSSL source, I believe this is caused by https://github.com/openssl/openssl/blob/master/crypto/x509/v3_cpols.c#L263 where OpenSSL defaults to visible string, in the absence of a more specific tag request from the user.

Based on these paragraphs, I believe the default should be UTF8String, or at a minimum UTF8String should be used when the text contains non-ASCII characters.

alex commented 1 year ago

I see you've marked this as a feature, but the issue here is that OpenSSL emits invalid DER. I don't think "emit valid DER" is accurately characterized as a feature request.

t8m commented 1 year ago

Please note that the creation of X.509 through the configuration can produce invalid X.509 certificates also for many other reasons. And it is possible to create a valid X.509 with proper configuration.

I marked it as feature because it is highly unlikely that we would consider any fix for this issue for backport in stable releases.

slontis commented 1 year ago

This is a bug, stating that other things cause bugs does not turn it into a feature.

t-j-h commented 1 year ago

I think what @t8m is indicating is that this is just like incorrect API usage - the user can place correct configuration in the file and get the result - but this interface is low level where you specify the ASN1 you want and if you don't specify you get a default type. That is what the code is meant to do. There is a way for the user to provide the input in a correct form with the type specified and get the result they want.

The interface choice may be somewhat unexpected and not really the sort of interface that we want to have in place - but it was considered the "power user" interface for arbitrary insertion into certificates. There is no logic in any of those extension interfaces that is there to enforce rules of the types of fields for encoding.

Every one of those interfaces will happily let you put string values in and insert into the ASN1 output stuff that is invalid in the context of a certificate. Do you expect all these routines to enforce strictly valid usage and reject what the user asks for if it violates some rule someone defines somewhere in an RFC that as you can see changes over time?

alex commented 1 year ago

How is a user expected to know that they're using a low level interface? What's the correct, high level interface, for using policy qualifiers?

In any event: Yes, I think an X.509 library shouldn't emit certificates that are not valid DER. My view is that this is quite foundational, in addition to causing immediate bugs for users, emitting invalid certificates poisons the ecosystem by forcing other parsers/verifiers to accept invalid inputs that have proliferated.

davidben commented 1 year ago

Strong agreement with @alex here. While it is unfortunate that OpenSSL has this problematic text interface, this is the interface you all shipped. The fact is people are using it, and one of the project's responsibilities to ensure that the library is not prone to misuse.

Just because it is possible to produce invalid outputs with this interface (which is itself questionable, but a broader problem), the default way to use the interface should be correct.

Indeed we've seen certificates with precisely this mistake in the wild. Thank you, @alex, for finding this. It's clear now the root cause was this OpenSSL flaw. So the issue here is not hypothetical. OpenSSL's mistake here has done real, empirical, and lasting harm to the ecosystem. Please fix it so we can stem the bleeding, and at least ensure certificates issued going forward are correct.

t-j-h commented 1 year ago

RFC 3280 which first defined explicitText usage in this context explicitly allowed for all the types to be used - it was noted as DisplayText which can be ia5String, visibleString, bmpString, and utf8String - recognising at the time that they are all valid usage.

When subsequent RFCs define additional requirements and restrictions (and will continue to do so) then it is up to the user of these interfaces to match those updated requirements - the interfaces provide you with the ability to do that. The interfaces are not designed or documented to do this for you.

We don't have an interface that says it will follow whatever the latest RFC happens to have in it.

It is a reasonable argument that a different sort of interface should exist which provides a way to follow whatever the latest RFC at the time of the given OpenSSL release happens to specify (but it will not be able to enforce restrictions that get defined after the specified release naturally enough).

The original code for explicitText handling was from 1999 and only encoded as VisibleString. A contribution in 2016 allowed the user to specify what type they wanted. See PR https://github.com/openssl/openssl/pull/576.

It implemented in a backward compatible manner that leaves things as VisibleString unless you ask for a different type.

The documentation for explicitText in the config file notes the syntax for specifying the string encoding. See https://www.openssl.org/docs/manmaster/man5/x509v3_config.html

alex commented 1 year ago

There's two semi-related things here:

1) Which string type to use, and the various preferences of RFCs 3280, 5280, and 6818. I completely agree that at the time of 3280 there was nothing disfavoring VisibleString on this basis.

2) The requirements from DER (ITU 690) which limit which characters are valid for VisibleString. Those have made these non-ASCII characters invalid for forever; this string was never valid DER on that basis.

davidben commented 1 year ago

While the documentation does say a prefix can change the encoding, it doesn't say that you get VisibleString by default and that, to support non-ASCII characters, you need to explicitly specify a UTF8 prefix.

In other contexts where it takes a string-based input (X.509 names in particular), OpenSSL uses its "mbstring" machinery, which picks a string type based on the contents, which would resolve (2). In fact, the mbstring machinery switched its default to preferring UTF8String in 3009244da47b989c4cc59ba02cf81a4e9d8f8431, so OpenSSL does have an interface that follows the latest standards (1), as it should! That seems an appropriate function to use, given both the problems the current behavior has caused, and the OpenSSL documentation which doesn't specifically VisibleString.

(Probably that should have been the solution for https://github.com/openssl/openssl/pull/576 in the first place, given how other parts of the config API work, but I assume that was an oversight. Lets correct that oversight now.)

t8m commented 1 year ago

I am not saying, we should not change this behavior. What I am just saying is this should not be done on stable release branches.

hlandau commented 1 year ago

We shouldn't be emitting invalid DER. That doesn't mean we should be expected to validate the semantics of any particular extension, which OpenSSL doesn't necessarily have knowledge of. These are separate issues. A distinction is to be made between emitting DER which is literally malformed, and emitting well-formed DER which some standard may consider to be in violation of its schema.

The obvious options here are to silently upgrade to a different string type if a character required in the string requires it, as mentioned above, or refuse the input.

davidben commented 1 year ago

Ah, I think I see the confusion here. This bug is not about schema-agnostic code. The code in question is perfectly aware of the schema and semantics of the structure: https://github.com/openssl/openssl/blob/master/crypto/x509/v3_cpols.c#L317

(It has to, otherwise how can it possibly interpret explicitText?)

There is a schema-agnostic path (ASN1_generate_v3) which, of course, cannot help but leave things to the user. But that's not what's going on here. This is code specific to this extension, which has no reason to emit something invalid.

t8m commented 1 year ago

OTC: This is a bug. A PR should be raised to automatically upgrade to a suitable string type when a character is requested in a string which cannot be represented as a VisibleString and an encoding is not specified. We should not emit invalid DER. We will decide whether such PR is to be backported to stable branches after that PR is opened.