zmap / zlint

X.509 Certificate Linter focused on Web PKI standards and requirements.
https://zmap.io
Apache License 2.0
361 stars 110 forks source link

Revert "lints: remove w_serial_number_low_entropy lint." #857

Closed aaomidi closed 2 months ago

aaomidi commented 5 months ago

There are effectively only three CAs using exactly 64 bit length serial numbers. (Searched with (labels="trusted" and labels="precert" and validation.nss.has_trusted_path=true and not labels="revoked") and parsed.serial_number_hex="????????????????").

Let's considering bringing this length check back, and actually make it trigger a warning on <= 8. There's practically no reason to be riding the "entropy" limit this closely and I think a warning is perfect for that type of lint.

CBonnell commented 5 months ago

My two cents... any type of "entropy" checks against a single sample will ultimately be error-prone. A high-volume issuer that includes several octets of CSPRNG output beyond the minimum of 8 octets will occasionally generate serial numbers that fall below the minimal bound specified in this lint. Furthermore, any CA that implements controls to manipulate the generated serial number in an attempt to suppress any warnings from the lint will actually have decreased the number of unpredictable bits in the serial number.

Conversely, any CA that includes fixed values in the serial number but otherwise includes exactly 64 bits (or less) of CSPRNG output would not have their certificates flagged by this lint.

The only way to reliably implement this check is to examine a representative sample of a CA's issued corpus of certificates and perform a statistical analysis. As such, I believe this lint would be of negative value as the noise it will generate will be more than any sort of useful insight.

Edit: I finally remember where this was discussed before https://github.com/amazon-archives/certlint/issues/56

aaomidi commented 5 months ago

The reason I'm thinking of introducing this lint isn't really to penalize CAs or whatever. It's definitely nothing more than a warning. I'm also planning on making it warn on exactly 64 bit lengths too. I am aware that a single lint can not check for entropy.

My reasoning is:

  1. This acts as a signal for new CAs to make sure they're not right on the line where it can turn into an incident for them.
  2. There's practically no CA that's using 64 bit or less serials. If they are, they can choose to ignore this lint.

From the three that are, GoDaddy has made a claim in the past that they want to go to 128 bit serials: https://groups.google.com/g/mozilla.dev.security.policy/c/S2KNbJSJ-hs/m/Wnt4QSO9BwAJ

Is there a major reason why we shouldn't have this lint? Maybe we can introduce an "Informational" level result and have this lint use it?

mcpherrinm commented 5 months ago

My personal opinion is that the entropy requirements on serial numbers are a defense against bad hash functions, and we've more directly solved that already by removing SHA-1 and MD-5 from the WebPKI ecosystem, and so are not high on my priority list of things to worry about.

There is a strong argument that you can't determine the "entropy" of a serial by examining a single one, for several reasons: Having some number of leading zeros could legitimately happen, so we can't just require it to be greater than 2^64. Bad RNGs can also generate large-looking serials. Thus I don't think a lint like this is really going to be effective. Statistical analysis over a large set of serials could be effective, but zlint doesn't seem like the right place for that.

aaomidi commented 5 months ago

My personal opinion is that the entropy requirements on serial numbers are a defense against bad hash functions, and we've more directly solved that already by removing SHA-1 and MD-5 from the WebPKI ecosystem, and so are not high on my priority list of things to worry about.

Yes, that is why the requirement was introduced. The thing is, that requirement is now in the BRs, and new CAs are much more likely to mess this up when this lint can help inform them, preventing a mistake there.

There is a strong argument that you can't determine the "entropy" of a serial by examining a single one, for several reasons:

As mentioned in my comment above, I'm proposing adding this lint back with the understanding that this won't actually measure entropy but act as an "FYI new CA, you're on the line here and there's no reason to be"

Does re-introducing this lint cause any issues?

CBonnell commented 5 months ago

Does re-introducing this lint cause any issues?

CONTRIBUTING.md says:

Choosing Result Level. Lints return a single type of status: ... Warning: Warn can only be used for violations of SHOULD or SHOULD NOT requirements and again should include strong citations. Many certificate authorities block on both Error and Warning lints, and Warning lints should not be used for non-deterministic errors (e.g., calculating whether a serial number has sufficient entropy based on high-order bits.)

Considering that the guidance explicitly calls out what this PR is attempting to do as undesired, I think it shouldn't be added. If folks feel that this lint is valuable, it should probably be an INFO- or NOTICE-level lint.

aaomidi commented 5 months ago

it should probably be an INFO- or NOTICE-level lint.

I am in agreement here.

mcpherrinm commented 5 months ago

One more thing we see repeatedly in some WebPKI incidents is CAs not understanding the requirements, and assuming the linters will tell them if they're doing something wrong.

Besides the fact that I think this lint is fundamentally not doing what it claims to be doing (measuring entropy), it may give CAs a false sense that they don't need to worry about this requirement. An incrementing serial that starts at 2^64 would pass this lint but provide none of the desired entropy requirements.

I would be more supportive of a version of this lint which examines the number of 0 and 1 bits in the serial and fires if there aren't enough of either to be plausibly randomly generated. That's closer to something that's measuring the number of bits of entropy in a serial, and not just looking if it's "big enough" in a way that's prone to mis-firing.

If there's 64 bits of randomness, I'd expect (off the top of my head) at least 24 1 bits to be set, and a similar (but less, due to the omitted leading 0s) number of 0 bits. I haven't done the math to figure what the exact numbers are to have a sufficiently low probability of mis-firing on the webpki scale of certificate issuance.

aaomidi commented 5 months ago

One more thing we see repeatedly in some WebPKI incidents is CAs not understanding the requirements, and assuming the linters will tell them if they're doing something wrong.

This doesn't change with the inclusion, or not, of this lint. We're not close to being able to get to the point where linters can actually enforce every single requirement - however they can act as signals and I still find that valuable for WebPKI. I'm not interested in CAs having incidents to show that they don't understand the requirements. My goal is to make it easier to run a CA with notices of "hey you might be doing something wrong here".

--

Beyond that I do not want to turn this lint into a lint trying to measure entropy. Doing so with a single cert is not useful in my opinion. I'm fine with changing the name to make it even clearer that its not measuring entropy.

@mcpherrinm, if the name of this lint changes - would you be onboard with getting this onto zlint? (Also if you have any name suggestions, I'd appreciate it! Naming things is hard :P)

timfromdigicert commented 5 months ago

I oppose this. It is an unnecessary warning that will just increase confusion instead of decreasing it.

There's no reason to consider it risky to generate only the exact amount of randomness that is required, if it comes from a sufficiently secure random number generator. The bar already includes a safety margin. If anyone thinks there is a reason that only 64 bits is insufficient, we should raise the bar. But we should not be producing warnings for people that meet or exceed the bar.

aaomidi commented 5 months ago

But we should not be producing warnings for people that meet or exceed the bar.

I've switched this to a notice level lint.

My justification for adding this is as follows:

@timfromdigicert: Given this clarification, is there a risk you see for adding this lint - as a notice level lint - just to act as a "hey you may be doing something wrong here" signal?

timfromdigicert commented 5 months ago

Well, so the problem is that the actual requirement is:

"MUST be a non‐sequential number greater than zero (0) and less than 2¹⁵⁹ containing at least 64 bits of output from a CSPRNG."

Where it was very carefully written that way to avoid any requirements on the encoded value, because a compliant serial number can be extremely small, like for example 42, as long as that happens with probability 2^-64. That's the danger of trying to technically enforce this particular requirement based on examination of a single certificate. You can't REALLY say whether a CA complies with this or not without seeing the implementation, though a population of certificates can prove with high probability that the implementation is likely flawed.

I think I can get behind an INFO level lint explaining some of the complexity and danger here, as long as it clearly explains the situation. I want to meet you in the middle here somewhere. What I don't want is a lot of CAs having fire drills because they get a confusing message from a linter, despite doing nothing wrong. And I'm sensitive to that because we see a lot of CAs ignoring messages from linters and writing them off as false positives, even though they aren't. That's why I'm a bit sensitive to linters generating extraneous warnings and information, because it can desensitize users to real issues.

mcpherrinm commented 5 months ago

There's no reason to be generating serials this close to the limit anyway

I don't really buy this argument. I think it is valid to use exactly 64-bit random serials, and having a lint that implies otherwise is more likely to lead to workarounds (like the EJBCA 63-bit one) than to fix real problems.

The 63-bit problem is a real one, but it cannot be reliably detected from a single serial, and wouldn't be detected by this lint for the case where the top bit is force-set.

I'm going to open another PR with the idea I'd proposed earlier for counting bits with some analysis showing it has a very low probability of false-positives.

CBonnell commented 5 months ago

The 3 CAs that are currently in 64 bit mode

I am aware of at least one major CA that generated large serial numbers but contained only 64 bits of CSPRNG output. The non-random octets were seemingly metadata. This was a few years ago and the CA may have since changed its issuance practices, but there are likely more than 3 CAs right at the edge of the 64 bit requirement -- but it wouldn't appear that way by examining just one certificate.

aaomidi commented 5 months ago

I think I can get behind an INFO level lint explaining some of the complexity and danger here, as long as it clearly explains the situation. I want to meet you in the middle here somewhere. What I don't want is a lot of CAs having fire drills because they get a confusing message from a linter, despite doing nothing wrong. And I'm sensitive to that because we see a lot of CAs ignoring messages from linters and writing them off as false positives, even though they aren't. That's why I'm a bit sensitive to linters generating extraneous warnings and information, because it can desensitize users to real issues.

@timfromdigicert I agree with what you're saying here. I do not want to cause unnecessary stress & anguish for CAs. I've been on the other side of that coin and I am in complete agreement with you that we don't want this to happen.

To be clear, the vast majority of CAs are using 128 bit sized serial numbers for their certificates. With this change, the chance of this notice happening for a CA that uses 128 bits would mean the first 64 bits of the serial number are 0, and the 65th bit has to be 0 to get a serial number that gets encoded as a 64 bit value. That is a 2^-65 chance of happening. Effectively there is a lower chance of this happening, than getting a collision in a UID space.

I an spend some more time writing godocs inside this linter to ensure that if someone does stumble upon this, it may very well not be an issue.

For CAs that are between 64 and 128 bit serials, the chance of this notice is higher, but with their issuance volume - it is extremely unlikely to hit this condition (For example: the current number of active certificates with serials smaller than 128 bits doesn't add up to half a million)

I am aware of at least one major CA that generated large serial numbers but contained only 64 bits of CSPRNG output.

@CBonnell I'm basing my information off of doing research on censys.io, which may or may not be a complete data set.

I've been using this query (not the best, but works): (labels="trusted" and labels="precert" and validation.nss.has_trusted_path=true and not labels="revoked") and parsed.serial_number_hex="????????????????"

I'm going to open another PR with the idea I'd proposed earlier for counting bits with some analysis showing it has a very low probability of false-positives.

@mcpherrinm This would be a great contribution to this project!

zakird commented 2 months ago

Based on conversation, it doesn't look like there is new support to add this lint compared to past conversations, and I'm going to close.

aaomidi commented 2 months ago

IMO I think this is still generally valuable, and the response here was mostly stuck around the semantics of "entropy".

I'll be honest, I don't think any of the arguments against this lint actually holds water, and is just disagreeing for the sake of disagreeing. The main thing I think that people dislike is the use of the word entropy, which is fair, and I think this lint can be a warning or even informational level lint on hmm your serial number is potentially too small in size.

Per the investigations I had when writing this lint, this lint shouldn't cause errors for existing CAs, and really was designed for new CAs spinning up new infrastructure.