zmap / zlint

X.509 Certificate Linter focused on Web PKI standards and requirements.
https://zmap.io
Apache License 2.0
361 stars 110 forks source link

Revisiting `e_key_usage_and_extended_key_usage_inconsistent` lint and RFC interpretation. #553

Open cpu opened 3 years ago

cpu commented 3 years ago

The e_key_usage_and_extended_key_usage_inconsistent lint introduced in https://github.com/zmap/zlint/pull/497 (and refined in https://github.com/zmap/zlint/pull/528) had a lot of discussion about the right interpretation of RFC 5280. I'm sorry that I missed getting a chance to review that PR and weigh in at the time, but I wanted to re-open the discussion with one concrete case in mind. I also think we should have a larger discussion about the conclusion to use an Error finding + the update to our integration test data.

EKU DigitalSignature|KeyEncipherment

Consider a certificate that has ExtKeyUsage == ExtKeyUsageServerAuth and KeyUsage == KeyUsageDigitalSignature|KeyUsageKeyEncipherment. In this case the certificate will take the server cert arm of the Execute function's EKU loop:

        case x509.ExtKeyUsageServerAuth:
            if !serverAuth[c.KeyUsage] {
                return &lint.LintResult{Status: lint.Error}
            }

On L52 the cert's KeyUsage is used as a lookup in the serverAuth map, and if it isn't found, a lint.Error status result is returned.

The serverAuth map is defined as:

var serverAuth = map[x509.KeyUsage]bool{
    x509.KeyUsageDigitalSignature: true,
    x509.KeyUsageKeyEncipherment:  true,
    x509.KeyUsageKeyAgreement:     true,
}

The combined KeyUsageDigitalSignature|KeyUsageKeyEncipherment is not present in serverAuth, and so an error result is returned.

RFC 5280 interpretation

RFC 5280 4.2.1.12 says:

The following key usage purposes are defined:

   anyExtendedKeyUsage OBJECT IDENTIFIER ::= { id-ce-extKeyUsage 0 }

   id-kp OBJECT IDENTIFIER ::= { id-pkix 3 }

   id-kp-serverAuth             OBJECT IDENTIFIER ::= { id-kp 1 }
   -- TLS WWW server authentication
   -- Key usage bits that may be consistent: digitalSignature,
   -- keyEncipherment or keyAgreement

The crux of whether this is or isn't a false positive seems to come down to the interpretation of this text, and the "or" in particular. It's not crystal clear to me whether the keyEncipherment|keyAgreement combination of these usages is consistent with id-kp-serverAuth. My sense is that this combination should be allowed -- I don't see how this combined EKU and the KU are inconsistent at all in the case of a certificate with an RSA public key.

Error Result finding + Integration test data

Given the uncertainty about how to interpret RFC 5280's language here, and the fact that this lint introduced 140 new Error level findings across our integration test corpus, I think we should re-evaluate whether we should be using a warning or info result instead. It doesn't look like there was any analysis done on these 140 certificates at the time the lint was merged. Were they all true positives? Unambiguously?

@sleevi @zakird @christopher-henderson @ZsofiaTomicsko @RufusJWB What do you folks think? Is this more clear cut than I think? I'd love to figure out whether we need to take any action with this lint or not ahead of cutting the v3.1.0 tag if possible since that release will include this lint.

cpu commented 3 years ago

:wave: @cardonator Interested in your thoughts on this case too

cardonator commented 3 years ago

Thanks for pinging me on this @cpu. I want to review this question/use case more thoroughly with some other internal folks and I'll get back with some better thoughts on this subject.

christopher-henderson commented 3 years ago

I can say that after we merged this I I ran a certificate that immediately triggered this error. Granted, it was a certificate that was generated by an internal CA in order to secure a DB API, so it wasn't exactly representative. But it did make me wonder.

It doesn't look like there was any analysis done on these 140 certificates

I'll take this on to at least smoke check what was going on with these.

clintwilson commented 3 years ago

The examples given here are represented in many (10^8 order of magnitude, it looks like) of the certificates used in the WebPKI, so at least w.r.t. a certificate that has ExtKeyUsage == ExtKeyUsageServerAuth and KeyUsage == KeyUsageDigitalSignature|KeyUsageKeyEncipherment, we'll see a very large number of certs encountering this error. I wonder if @robstradling might also have some thoughts here.

RufusJWB commented 3 years ago

We would need to ask @ZsofiaTomicsko about this.

It doesn't look like there was any analysis done on these 140 certificates at the time the lint was merged. Were they all true positives?

ZsofiaTomicsko commented 3 years ago

My issue is interpreting the "or" in the RFC as a "xor" still seems to be the most logical, and at the moment the only possible solution, since the author is using the "and/or" expression too. For example:

id-kp-clientAuth             OBJECT IDENTIFIER ::= { id-kp 2 }
   -- TLS WWW client authentication
   -- Key usage bits that may be consistent: digitalSignature
   -- and/or keyAgreement

Because of this I feel like "or" and "and/or" must be interpreted differently, and I can't think of anything else other than "or" meaning "xor" and "and/or" meaning the traditional "or".

Technically in that special case you mentioned:

   id-kp-serverAuth             OBJECT IDENTIFIER ::= { id-kp 1 }
   -- TLS WWW server authentication
   -- Key usage bits that may be consistent: digitalSignature,
   -- keyEncipherment or keyAgreement`

the digitalSignature and the keyEncipherment aren't seperated by an "or", but a comma, so maybe it could be replaced by an "and/or" instead, but this interpretation seems a bit forced to me at the moment.

For me personally the more unclear question was whether this list was meant to be a recommendation only, or if these are the only valid combinations. Cristopher described this problem really well in #497 :

Additionally I'm a bit hesitant to mark this as an error (although I'm leaning more toward error as I type this) because of the word "used" in...

the certificate MUST only be used for a purpose consistent with both extensions.

To me, this wording implies that this is a MUST on the implementation that is accepting the certificate and not a MUST on the certificate itself. Also, the often repeated weasel word of lower case "may":

Key usage bits that may be consistent...

This wording is telling me, "by the way, here are some example pairings, but you may have more". Restated, I'm reading this as saying

The KUs and the EKUs MUST be consistent, but what "consistent" is is left as a policy of the accepting application.

...but then the following clause uses the all important word - "defined"...

The following key usage purposes are defined

...this seems to say, "...but at minimum, you must use these". So, as long as the ones defined here are bare minimum examples then I suppose this should be an error.

In my comment (https://github.com/zmap/zlint/pull/497#issuecomment-746274605 point 3) I started leaning towards marking this as a warning instead too, and I still stand by that opinion. Please tell me what you think.

cpu commented 3 years ago

I started leaning towards marking this as a warning instead too, and I still stand by that opinion. Please tell me what you think.

Thanks @ZsofiaTomicsko, I think your instinct here was good. Apologies we didn't catch this sooner :-)

It doesn't look like there was any analysis done on these 140 certificates at the time the lint was merged. Were they all true positives?

I started to pull on this thread and I'm wondering if there's also something going on with our integration tests.....

Running:

 make integration SHELL=(which bash) INT_FLAGS="-lintSummary -fingerprintSummary -lintFilter='e_key_usage_and_extended_key_usage_inconsistent'"

I get the summary:

summary of result type by lint name:
e_key_usage_and_extended_key_usage_inconsistent                     fatals: 0    errs: 140  warns: 0    infos: 0

But there are ~566,923 certificate fingerprints with "errs: 1" in the output from -fingerprintSummary. That number seems much more in line with what @clintwilson reports from spelunking CT.

I'm out of time to dig before work but will pick this up later. Either I've made a PEBKAC error or something is up...

sleevi commented 3 years ago

I don’t think treating it as xor is correct, especially when looking at the RFCs that use this (e.g. https://tools.ietf.org/html/rfc5246#section-7.4.2 ). Specifically, this is a permission grant for ciphersuites, and “any of these bits are acceptable for permission grants” is how it should be read.

To the question of warning/error however, I do stand by the Error classification, as both the intent then and now (as reflected in https://tools.ietf.org/html/rfc8813 ), was that only the key usages specified are considered consistent, and any key usages not specified are inconsistent. This was, and is, important for considering cross-protocol issues.

I think the semantic of or vs and/or here might be traceable to an editing mistake/change, rather than a semantic intent.

cpu commented 3 years ago

Specifically, this is a permission grant for ciphersuites, and “any of these bits are acceptable for permission grants” is how it should be read.

@sleevi That makes sense to me, and so ExtKeyUsage == ExtKeyUsageServerAuth and KeyUsage == KeyUsageDigitalSignature|KeyUsageKeyEncipherment seems kosher, am I reading you right?

To the question of warning/error however, I do stand by the Error classification

I would personally feel better about making a call on this if we find more clarity on the integration test diffs. If it's ~140 certs and we can spot check some to verify that their KU/EKUs are definitely inconsistent an error result seems fine. If it's >100k then I think we're either kicking off a very large incident for a bunch of CAs or have made an error. In general I'm nervous about false positives from key lint sources resulting in CAs turning off pre-issuance linting or ignoring large swaths of lints.

sleevi commented 3 years ago

@sleevi That makes sense to me, and so ExtKeyUsage == ExtKeyUsageServerAuth and KeyUsage == KeyUsageDigitalSignature|KeyUsageKeyEncipherment seems kosher, am I reading you right?

Yup. I think I misread on the original issue regarding the discussion of xor/exclusive or. It’s an or of an exclusive set, but not an exclusive or 😅

To the question of warning/error however, I do stand by the Error classification

I would personally feel better about making a call on this if we find more clarity on the integration test diffs. If it's ~140 certs and we can spot check some to verify that their KU/EKUs are definitely inconsistent an error result seems fine.

Totally agreed!

robstradling commented 3 years ago

RFC5480 specifies "the format of the subjectPublicKeyInfo field in X.509 certificates [RFC5280] that use Elliptic Curve Cryptography (ECC)" (see https://tools.ietf.org/html/rfc5480#section-1). I think it's therefore reasonable to assume that RFC5480 is intended to be consistent with (the intent of) RFC5280.

https://tools.ietf.org/html/rfc5480#section-3 says:

If the keyUsage extension is present in an End Entity (EE) certificate that indicates id-ecPublicKey in SubjectPublicKeyInfo, then any combination of the following values MAY be present: digitalSignature; nonRepudiation; and keyAgreement. This means that digitalSignature AND keyAgreement may appear together in a leaf certificate that has an id-ecPublicKey SubjectPublicKeyInfo, which clearly has an impact on how we interpret https://tools.ietf.org/html/rfc5280#section-4.2.1.12: -- Key usage bits that may be consistent: digitalSignature, -- keyEncipherment or keyAgreement

I think the intended meaning of that "or" is that any combination of digitalSignature/keyEncipherment/keyAgreement is permitted, except that keyEncipherment and keyAgreement should not appear together (because at most one of these two bits could possibly ever be useful in any given certificate).

keyAgreement enables (non-ephemeral) cipher suites that use an ECC leaf certificate. keyEncipherment enables (non-ephermal) cipher suites that use an RSA leaf certificate. digitalSignature enables ephemeral cipher suites for both ECC and RSA leaf certificates.

This is relevant: https://security.stackexchange.com/questions/24106/which-key-usages-are-required-by-each-key-exchange-method Although this part is a bit depressing: "However, some implementations will also accept keyAgreement in lieu of keyEncipherment, or nonRepudiation even if digitalSignature is not set; and some will just totally ignore the Key Usage extension (even if marked critical). For maximum interoperability, specify all four flags in the Key Usage extension.

robstradling commented 3 years ago

I just filed an erratum against RFC5280: https://www.rfc-editor.org/errata/eid6414

cpu commented 3 years ago

@robstradling Your interpretation here makes sense to me and I'm supportive of changing the lint logic to match. Thanks for filing the errata!

robstradling commented 3 years ago

Discussion of the erratum on the PKIX list: https://mailarchive.ietf.org/arch/msg/pkix/rLa6v4kOlD8BQaJBwnmwX4oxsrE/

cardonator commented 3 years ago

I think it's reasonable to assume that any combination of the values is acceptable, but it's hard to understand why the clearly carefully thought out distinction of the other descriptions in the same section are explicitly written with "and/or" notations where this one is not, unless it was intended to be that way.

Interpreting 4.2.1.12 using the same logic as section 3 makes sense to me, but wouldn't that also mean "any combination"? I think the errata is both more accurate and harder to comprehend than it is now 😄

christopher-henderson commented 3 years ago

@robstradling (I don't have an account on the PKIX list so I guess I'll throw my two cents onto the floor of this thread)

For however painful it might be for the person who updates the document, perhaps 5280 should explicitly enumerate the valid combinations rather than attempting a plain English description of it? I'm sure that future implementers would deeply appreciate the oven-ready combinations that they could almost copy-paste into their code rather than going through the same pain we're going through right now (and it would likely greatly reduce re-creating these mistakes).

cpu commented 3 years ago

I started to pull on this thread and I'm wondering if there's also something going on with our integration tests.....

I think I figured out what's going on here, and it's a silly bug of my own creation: https://github.com/zmap/zlint/issues/555 I will fix this independently of this discussion.

Unfortunately I think it hid the true impact of this lint, making it seem like only 140 integration test certificates were linted to an error level by this lint. In reality it's 566,924 new error findings.

cpu commented 3 years ago

Unfortunately I think it hid the true impact of this lint, making it seem like only 140 integration test certificates were linted to an error level by this lint. In reality it's 566,924 new error findings.

With this in mind I'm pretty strongly in favour of reverting the lint as it exists now and re-working any bug fixes in a follow-up PR.

cardonator commented 3 years ago

With this in mind I'm pretty strongly in favour of reverting the lint as it exists now and re-working any bug fixes in a follow-up PR.

I completely agree with this and assume it will be a pretty major problem if not reverted for the 3.1.0 release.

cpu commented 3 years ago

I'll push a revert PR + a fix to the integration tests after dinner.

cpu commented 3 years ago

I'll push a revert PR + a fix to the integration tests after dinner.

Revert of the lint landed w/ https://github.com/zmap/zlint/pull/556 Integration test overflow fix: https://github.com/zmap/zlint/pull/557

ZsofiaTomicsko commented 3 years ago

I agree with reverting the lint too, and I apologize for misunderstanding the RFC in the past. Thank you very much for the clarification!

robstradling commented 3 years ago

@ZsofiaTomicsko No apology needed! Your logic was sound, but the text in the RFC was misleading. Thank you for your contributions!

cpu commented 3 years ago

@ZsofiaTomicsko Echoing @robstradling - definitely no need to apologize :-) The RFC text is unclear, it was missed in review, and our integration tests fell down on the job. We caught it before the release, no harm done! Thanks again.