zmap / zlint

X.509 Certificate Linter focused on Web PKI standards and requirements.
https://zmap.io
Apache License 2.0
353 stars 107 forks source link

Add LintTBSCertificate and LintTBSCertificateEx functions #697

Closed robstradling closed 6 months ago

robstradling commented 1 year ago

ZLint's README.md points out that "Pre-issuance linting is strongly recommended by the Mozilla root program". However, despite the claim on that Mozilla wiki page, AFAICT the ZLint API and command-line tool only accept fully-formed X.509 certificates. Therefore, it is not obvious how a CA can use ZLint in a pre-issuance linting environment.

I can only presume that the various CAs' engineers have either:

  1. independently figured out essentially the same approach taken by this PR, or
  2. their approach to "pre-issuance linting" is to (i) perform post-issuance linting and then, if any problems are flagged, (ii) silently discard the (pre)certificate and pretend that it was never issued, instead of submitting it to any CT logs!

This PR proposes two new public API functions, which mirror LintCertificate and LintCertificateEx, but that accept a DER encoded TBSCertificate instead of a *x509.Certificate object. The new functions, LintTBSCertificate and LintTBSCertificateEx, wrap the DER encoded TBSCertificate - together with a "dummy" (but syntactically valid) signature - into a *x509.Certificate object, which is then passed to LintCertificateEx.

The initial purpose of this PR is - in response to an enquiry from @ryancdickson at https://bugzilla.mozilla.org/show_bug.cgi?id=1796803#c5 - to demonstrate how Sectigo has been performing pre-issuance linting for the last several years. I first implemented this approach in 2017 in the open-source code for the https://crt.sh/linttbscert tool. Also, our Go-based (and closed-source) certificate issuance application builds a "dummy" *x509.Certificate object using essentially the same code as the LintTBSCertificateEx function proposed by this PR, and then it calls LintCertificate. The approach taken by LintTBSCertificateEx predates ZLint's "configurable lints" functionality, which is one reason why it generates a syntactically valid signature instead of disabling the e_mp_ecdsa_signature_encoding_correct lint in the lint registry; the other reason is that our certificate issuance application also performs pre-issuance linting using cablint and x509lint, so we needed an approach that would also work with these linters. For example, cablint checks that an ECDSA signature has a syntactically valid ECDSA-Sig-Value structure (that is, a SEQUENCE containing two INTEGERs), and exits early with a FATAL error if this is not the case.

The ultimate aim of this PR is to make it easier for CAs to do pre-issuance linting properly using ZLint. If there's consensus amongst the ZLint maintainers that this is a worthwhile goal, then (and this is why I'm creating this PR as a Draft) I expect we'll want to discuss...

BTW, I've also noticed that the e_ext_authority_key_identifier_no_key_identifier and e_ext_authority_key_identifier_missing lints are problematic when performing pre-issuance linting for (what will ultimately be) a self-signed root certificate. This is because these lints check if the supplied certificate object is self-signed, which of course it won't be if the signature is a "dummy" signature. The util.IsSelfSigned function can't know that the CA intends to generate a self-signature over the TBSCertificate if pre-issuance linting does not flag any problems.

aarongable commented 1 year ago

I can only presume that the various CAs' engineers have either:

  1. independently figured out essentially the same approach taken by this PR, or
  2. their approach to "pre-issuance linting" is to (i) perform post-issuance linting and then, if any problems are flagged, (ii) silently discard the (pre)certificate and pretend that it was never issued, instead of submitting it to any CT logs!

For what it's worth, there is at least one more option, the approach taken by Boulder: sign the certificate with an untrusted, throwaway, in-memory key, and then run zlint on that.

ryancdickson commented 1 year ago

Thank you for following up, @robstradling!

CBonnell commented 1 year ago

I can only presume that the various CAs' engineers have either:

  1. independently figured out essentially the same approach taken by this PR, or
  2. their approach to "pre-issuance linting" is to (i) perform post-issuance linting and then, if any problems are flagged, (ii) silently discard the (pre)certificate and pretend that it was never issued, instead of submitting it to any CT logs!

For what it's worth, there is at least one more option, the approach taken by Boulder: sign the certificate with an untrusted, throwaway, in-memory key, and then run zlint on that.

Yes, I think signing the TBSCertificate with an untrusted key would be the straightforward implementation that is commonly used. No need to write custom logic for each linter or drop precertificates on the floor and pretend to never have issued them (yikes!).

primetomas commented 1 year ago

Yes, the same approach is used by EJBCA. Before using the CAs signing key, the certificate is signed using a hard-coded dummy key/cert (matching the CA key algorithm) and validation (if configured to run) is performed on that.

zakird commented 1 year ago

I'm supportive of this, given that folks are effectively doing this already. Making ZLint as easier to use within a pre-issuance workflow seems like a win.

robstradling commented 1 year ago

Thanks @zakird.

@aarongable, @CBonnell, @primetomas: I agree that signing with a dummy key (matching the same algorithm and size/curve as the real signing key) is a more obvious and straightforward approach than manufacturing a syntactically valid dummy signature or disabling the signature syntax lint(s). However, I think it's worth noting that signing with a dummy key would cost significant additional CPU cycles or HSM signing capacity compared to those other approaches, which could become an issue for a high-volume CA or when bulk-linting a large set of TBSCertificates (if anybody ever has a reason to do that!)

If there's consensus to add new API functions to support pre-issuance linting, then ISTM that the straightforwardness of the implementation under the hood is probably less important than linting performance. I think that what matters in terms of straightforwardness is that it's straightforward for a CA to call the API functions.

How shall we proceed?

cardonator commented 1 year ago

For whatever my opinion is worth... I agree that the most robust and straightforward way to lint prior to issuance is the method that @aarongable mentioned. The value in that is that you aren't circumventing or pre-empting the organic process the CA would go through to issue the certificate, therefore you are testing something that is more "true" to what the CA would actually produce when you lint the final output of the process.

That being said, I can definitely sympathize with the question of high volume and how long the pre-sign, lint, sign series of events can take. There is some value here, but I'm wondering if the value justifies what is lost. I can't say I have a strong opinion on this topic, though, and the overall change here is fairly minimal. I don't see any real downside to making the change.

aarongable commented 1 year ago

However, I think it's worth noting that signing with a dummy key would cost significant additional CPU cycles or HSM signing capacity compared to those other approaches, which could become an issue for a high-volume CA or when bulk-linting a large set of TBSCertificates (if anybody ever has a reason to do that!).

The cost of doing these additional signatures (which happen in software, not on an HSM) has never been an issue for Boulder, which does this check both before precertificate issuance and again before final certificate issuance. Even with https://letsencrypt.org/stats/ showing 240 million currently-active 90-day certificates (and an equal number of precertificates), that's still only an average of ~60 software signatures every second.

Now, generating a dummy key to do these signatures with is considerably more expensive, which is why Boulder generates throwaway linting keys at startup and then continues to use them for the life of the process. But as long as you're not generating a new key for every linting signature, I think that the cost of doing lint signing is reasonably low.

aaomidi commented 1 year ago

Agreed with @aarongable on this one. Signing a certificate is a statistically negligible computational operation as nearly every CPU has optimizations to speed this up. This code can create a false sense of security, and as we all know keeping things as straightforward as possible in crypto generally produces better results.

Also agreed on it's best to not rely on an HSM for this test signature and to use software signatures instead.

robstradling commented 1 year ago

I think the consensus is for this PR to proceed as follows:

Please confirm that this sounds ok, or if necessary help me tweak the requirements further. Thanks!

christopher-henderson commented 1 year ago

Thank you for the summary @robstradling!

I'm thinking through the implementation of such a system.

Dummy key generation is expensive

Would it be helpful if we also offered the ability to be given a key at time of initialization?

That is, if dummy key generation is a concern en masse, then perhaps operating CAs would be interested in keeping a dummy key in their pipeline that can be reused for each preissuance.

robstradling commented 1 year ago

@christopher-henderson Alternatively, why not ship a set of pregenerated dummy keys with ZLint?

To make preissuance linting as easy to use as possible, I would prefer LintTBSCertificate to only require the TBSCertificate as input. If the caller is already doing stuff with dummy keys, then they may as well sign the TBSCertificate with a dummy key and then call LintCertificate.

christopher-henderson commented 1 year ago

@robstradling even better!

I agree that that would be both easier to use and easier to implement (having a static set of known keys, algorithms, and algorithm settings is a fair sight easier).

robstradling commented 1 year ago

@christopher-henderson https://github.com/zmap/zlint/pull/697/commits/65aede599fe16bacd1fc9f6d98ff764be3ea0ded implements the requirements that we discussed in the last few comments of this PR, so at this point I'd like to ask you for another review please.

A few things to mention...

christopher-henderson commented 1 year ago

Thank you @robstradling!

zlint_preissuance.go uses crypto/x509/pkix instead of github.com/zmap/zcrypto/x509/pkix. When I try to use the latter, asn1.Unmarshal throws an error when attempting to decode any pkix.AlgorithmIdentifier object (i.e., the TBSCertificate signature algorithm). I've not been able to figure out why. Any ideas?

I am not 100% sure, I do not believe I have been able to reproduce this. Is the following not representative of the issue you describe?

package zlint

import (
    "github.com/zmap/zcrypto/encoding/asn1"
    "github.com/zmap/zcrypto/x509/pkix"
    "testing"
)

func TestSignatureAlgorithm(t *testing.T) {
    oidSignatureECDSAWithSHA256 := asn1.ObjectIdentifier{1, 2, 840, 10045, 4, 3, 2}
    algo := pkix.AlgorithmIdentifier{
        Algorithm:  oidSignatureECDSAWithSHA256,
        Parameters: asn1.RawValue{},
    }
    b, err := asn1.Marshal(algo)
    if err != nil {
        panic(err)
    }
    got := pkix.AlgorithmIdentifier{}
    _, err = asn1.Unmarshal(b, &got)
    if err != nil {
        panic(err)
    }
    if !oidSignatureECDSAWithSHA256.Equal(got.Algorithm) {
        panic("not the same")
    }
}

This minimum example appears to work fine, although I am not sure if it is precisely what you are seeing.

I've added a dummy lint (lint_tbscertificate_unparsable.go) so that LintTBSCertificate and LintTBSCertificateEx can report TBSCertificate parsing problems. Is this approach OK? Or would it be better for these functions to return (error, *ResultSet) instead?

In general, it is perfectly fine for ZLint framework code to return raw Go errors. That is, the lint.LintResult type is to provide consistent reporting among lints (of which there are many that are implemented by quite a few different people). However, errors that occur outside of a lint are considered quite fatal and (more importantly) are rarely touched by the community, so consistency is not quite as imperative.

Indeed, a perfunctory glance at main shows quite a few examples of indirectly calling os.Exit(1) via log.Fatalf.

Which is certainly not to say that that is particularly good error handling! I just didn't want you to spin too many cycles trying to shim the richer error reporting system into code that isn't expected to used it.

That said, I did have some questions on how we plan to control this flow. From a review and slight refactor, I believe that the following psuedo code is somewhat representative of what I am seeing here.

What we have today is quite simple - read input, decode it, lint it.

func main() {
    input = ...
    cert = decode(input)
    return lint(cert)
}

func decode(bytes) (*x509.Certificate, error) {
    // some PEM stuff
}

What I am seeing in this change is something like the following (I believe)...

func main() {
    input = ...
    if should_lint_preissuance {
        return lint_preissuance(input)
    } else {
        cert = decode(input)
        return lint(cert)
    }
}

func lint_preissuance(bytes) ResultSet {
    cert = decode_and_sign(bytes)
    return lint(cert)
}

However, if I am not mistaken, the heart of the difference is in decode. That is, in either case, we are looking for a *x509.Certificate and the only difference in how precisecly we get there. So if we lift the conditional branch for certificate signing into decode then we should be able to avoid any LintRegistry shenanigans.

func main() {
    input = ...
    cert = decode(input, is_preissuance)
    return lint(cert)
}

func decode(input, is_pressiuance) (*x509.Certificate, error) {
    if is_pressiuance {
        return decode_and_sign(input)
    } else {
        return decode(input)
    }
}

Unless I am mistaken on this, I think this might be preferable as it brings the code usages of LintCertificateEx back down to one. I think this might be attractive since LintCertificateEx is, effectively, a second main (that is, main reads the command flags, reads from the filesystem, parses certificates, and then ultimately calls LintCertificateEx which is the lint driver itself).

zakird commented 6 months ago

@robstradling and @christopher-henderson, it doesn't look like we've made progress on this PR in over a year. @robstradling, do you want to push forward on this, or should we close?

robstradling commented 6 months ago

Hi @zakird. I'm unlikely to find time to work on this any time soon, so feel free to close it unless someone else wants to take it on.

zakird commented 6 months ago

All good. Given that there wasn't as much excitement from others, I'm going to close out. Of course, please feel free to re-open if resurfaces as a priority.