transparency-dev / static-ct

An implementation of the Static CT API based on Tessera.
Apache License 2.0

Different `LeafHash` and `Signature` returned when submitting the same certificate #38

Open roger2hk opened 1 day ago

roger2hk commented 1 day ago

Existing Behaviour

Dedup was enabled. The checkpoint increased by 1. However, a different LeafHash and Signature are returned each time the same certificate is submitted.

Expected Behaviour

LeafHash and Signature returned should be the same when the same certificate is submitted.


Logs

static-ct:~/static-ct$ go run github.com/google/certificate-transparency-go/client/ctclient@master upload --log_uri=http://127.0.0.1:6962/staticct --cert_chain=/tmp/httpschain/chain.pem
Uploaded chain of 2 certs to V1 log at http://127.0.0.1:6962/staticct, timestamp: 1730493992119 (2024-11-01 20:46:32.119 +0000 UTC)
LogID: d999a553737848b8cce90b1dacae987368095edb1ee7a642da05bce978e2eaed
LeafHash: 7f6b9e7d13c51d6251b2b47f78da521bad593bf7c7ecbc88d3f465fc6f05cd85
Signature: Signature: Hash=SHA256 Sign=ECDSA Value=304402205a23b16223db3a9b8df988245f3ea3e870980da5a809dce2660c442418d86cc802202212d3b91cfb3bad9f946a70ec3559cca525070505df1902d37960a2c2c5d884

static-ct:~/static-ct$ go run github.com/google/certificate-transparency-go/client/ctclient@master upload --log_uri=http://127.0.0.1:6962/staticct --cert_chain=/tmp/httpschain/chain.pem
Uploaded chain of 2 certs to V1 log at http://127.0.0.1:6962/staticct, timestamp: 1730493998206 (2024-11-01 20:46:38.206 +0000 UTC)
LogID: d999a553737848b8cce90b1dacae987368095edb1ee7a642da05bce978e2eaed
LeafHash: 579f104781b917299771d26e0dc92e35c2e214fb9918112c6c08d7588970f497
Signature: Signature: Hash=SHA256 Sign=ECDSA Value=3046022100ca79892eae32e8815049d7e29690ff1b31ec63d599adf37c3cd7ea86a85fa9b10221009c06c7f1e3df894b2f1b5a46e67607ef9db02b8e1ff2b500ec6c4dfc34959683

static-ct:~/static-ct$ go run github.com/google/certificate-transparency-go/client/ctclient@master upload --log_uri=http://127.0.0.1:6962/staticct --cert_chain=/tmp/httpschain/chain.pem
Uploaded chain of 2 certs to V1 log at http://127.0.0.1:6962/staticct, timestamp: 1730494000004 (2024-11-01 20:46:40.004 +0000 UTC)
LogID: d999a553737848b8cce90b1dacae987368095edb1ee7a642da05bce978e2eaed
LeafHash: 0d5c238955fcfe6d4fc97eee37b8d0e5b396cc7f0c27fb36ac42ddef688024f4
Signature: Signature: Hash=SHA256 Sign=ECDSA Value=304502205057581135c9913ea0b017d8dd6a4cd39dd045365c1e3adce75d8851b9a40a8a022100acc48fc9c98209496d25c3e869d5f4b7a0fd045ea1173223be1e8005095c174e

static-ct:~/static-ct$ go run github.com/google/certificate-transparency-go/client/ctclient@master upload --log_uri=http://127.0.0.1:6962/staticct --cert_chain=/tmp/httpschain/chain.pem
Uploaded chain of 2 certs to V1 log at http://127.0.0.1:6962/staticct, timestamp: 1730494001638 (2024-11-01 20:46:41.638 +0000 UTC)
LogID: d999a553737848b8cce90b1dacae987368095edb1ee7a642da05bce978e2eaed
LeafHash: c103511fa30b66ef3d054b177f6b3e55c0922573222c38b8fd57237a7415f216
Signature: Signature: Hash=SHA256 Sign=ECDSA Value=3046022100dc7e10f8f8360775e28e13915ee80e6b5b07439c2b7d52721cb427a589c3223a022100d156723c4505db735386c6623f08bd83407eb262c153664b327726798171981d

static-ct:~/static-ct$ go run github.com/google/certificate-transparency-go/client/ctclient@master upload --log_uri=http://127.0.0.1:6962/staticct --cert_chain=/tmp/httpschain/chain.pem
Uploaded chain of 2 certs to V1 log at http://127.0.0.1:6962/staticct, timestamp: 1730494027102 (2024-11-01 20:47:07.102 +0000 UTC)
LogID: d999a553737848b8cce90b1dacae987368095edb1ee7a642da05bce978e2eaed
LeafHash: a696b362a273a62558592577fee102344951aed8acfea9ab0edfbf9d5b622061
Signature: Signature: Hash=SHA256 Sign=ECDSA Value=304502200d24ac1dfe0b6fd2fee167f572067d2cee566c62d09222b3c72fd825e220d10b0221009e1a60267c13f68c04dc5ecf167b2f0ec321c2482886f67f442ba81e11352824

phbnf commented 1 day ago

Spot on, thanks! tl;dr: the SCT is recreated for every returned entry with a different timestamp, even if the index doesn't change: https://github.com/transparency-dev/static-ct/blob/4fb2c2ef8d3e8c1b36d34c26cf5c52557404593c/handlers.go#L316

Right now, Tessera deduplication implementations store an index, not an SCT, so we can't support this use-case without changing that. It turns out that all these implementations live in the static-ct repo for now, so all good. This will have an impact on the size of the deduplication database though.


Longer answer

The index of the entry, though, should be deduplicated, and the log should not grow further, as you've noticed. The current behaviour is not great, and goes against Chrome's policy:

When Logs receive a logging submission for an already-incorporated certificate, Logs must either return an existing SCT or, if creating a new one, add another certificate entry within the MMD such that the new SCT can be verified using the APIs specified in RFC 6962.

In the scenario you've observed, the SCTFE does not add a new certificate entry to the log. But also, this Chrome log policy statement, is not 100% clear, and might not apply to static-ct-api log. The goal here is to make sure that Certificates can be verified for inclusion re-using the LeafHash. But with static-ct-api... there's no such get-by-hash API, what matters is the index. It sounds very reasonable to keep this behaviour for backwards compatibility though.

As surprising as it sounds, I don't think that it goes against either RFC6962 or https://c2sp.org/static-ct-api. But maybe we should tighten https://c2sp.org/static-ct-api around this.

RFC6962 says that you MUST integrate the certificate that was mentioned in the SCT. Note that this is about the certificate, which itself does not contain the SCT. In other words, an SCT means that you MUST integrate this certificate within the MMD, but you MAY use a different timestamp in the entry. What's interesting with static-ct, is that the index in the SCT is not binding, you could return a fake index, and it would still implement the specs. Obviously, the following behaviours, even if I don't believe they violate any standards, aren't very helpful:

  • an SCT that won't be integrated, leading to a leafHash that will never appear in the log even if the cert is already there
  • an index where the entry underneath doesn't match the SCT
  • an index where the entry underneath doesn't match the Certificate

AlCutter commented 1 day ago

... [the] Chrome log policy statement, is not 100% clear, and might not apply to static-ct-api log. The goal here is to make sure that Certificates can be verified for inclusion re-using the LeafHash. But with static-ct-api... there's no such get-by-hash API, what matters is the index

I do not agree with this conclusion :) I think this is a critical bug which undermines the transparency properties of the log.

It will be impossible to prove inclusion for any entry with an SCT whose timestamp doesn't match the one the corresponding pre-certificate was logged with: the timestamp is part of the MerkleTreeLeaf structure, which in turn is hashed to build the Merkle tree - if you don't have the original timestamp, you do not have the preimage which will allow you to recompute the root hash using the proof.

RFC6962 says that you MUST integrate the certificate that was mentioned in the SCT. Note that this is about the certificate, which itself does not contain the SCT. In other words, an SCT means that you MUST integrate this certificate within the MMD, but you MAY use a different timestamp in the entry.

I think 6962 is pretty clear about it, and I'm fairly sure it does not suggest anywhere that you may use different timestamps in returned SCTs from those you write to the log. In fact, Section 3.4 explicitly says it's the same timestamp:

  Structure of the Merkle Tree input:

       enum { timestamped_entry(0), (255) }
         MerkleLeafType;

       struct {
           uint64 timestamp;
           LogEntryType entry_type;
           select(entry_type) {
               case x509_entry: ASN.1Cert;
               case precert_entry: PreCert;
           } signed_entry;
           CtExtensions extensions;
       } TimestampedEntry;

       struct {
           Version version;
           MerkleLeafType leaf_type;
           select (leaf_type) {
               case timestamped_entry: TimestampedEntry;
           }
       } MerkleTreeLeaf;

  ...

  "timestamp" is the timestamp of the corresponding SCT issued for this certificate. 
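
To make the dependency on the timestamp concrete, here is a minimal Go sketch that serialises the MerkleTreeLeaf structure quoted above for an x509_entry and computes the RFC 6962 leaf hash, SHA-256(0x00 || leaf). The function names and the placeholder certificate bytes are assumptions made for illustration; this is not the static-ct code. The two timestamps are the ones from the first two uploads in the logs.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// merkleTreeLeaf serialises an RFC 6962 MerkleTreeLeaf for an x509_entry.
// cert is the DER certificate; extensions is the raw CtExtensions payload.
// (Hypothetical helper, written for this example only.)
func merkleTreeLeaf(timestamp uint64, cert, extensions []byte) []byte {
	b := []byte{0, 0} // version v1(0), leaf_type timestamped_entry(0)
	b = binary.BigEndian.AppendUint64(b, timestamp)
	b = binary.BigEndian.AppendUint16(b, 0) // entry_type x509_entry(0)
	// ASN.1Cert is opaque<1..2^24-1>: 3-byte length prefix.
	n := len(cert)
	b = append(b, byte(n>>16), byte(n>>8), byte(n))
	b = append(b, cert...)
	// CtExtensions is opaque<0..2^16-1>: 2-byte length prefix.
	b = binary.BigEndian.AppendUint16(b, uint16(len(extensions)))
	b = append(b, extensions...)
	return b
}

// leafHash is the RFC 6962 Merkle leaf hash: SHA-256(0x00 || leaf).
func leafHash(leaf []byte) [32]byte {
	return sha256.Sum256(append([]byte{0x00}, leaf...))
}

func main() {
	cert := []byte("placeholder DER bytes") // stand-in for a real certificate
	first := leafHash(merkleTreeLeaf(1730493992119, cert, nil))
	second := leafHash(merkleTreeLeaf(1730493998206, cert, nil))
	// Same certificate, different timestamps: the leaf hashes differ.
	fmt.Printf("%x\n%x\nequal: %v\n", first, second, first == second)
}
```

Only the leaf built with the timestamp that was actually written to the log hashes to a value present in the tree, so an SCT carrying any other timestamp cannot be used to recompute the root.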

What's interesting with static-ct, is that the index in the SCT is not binding, you could return a fake index, and it would still implement the specs

I would argue this is, in fact, binding: the log has issued a signed statement that says "[the precert corresponding to] this cert was logged at time T, and can be found at index I within MMD", and that statement would be false if returning incorrect T or I.

There would be no need to sign this statement if it weren't binding.

The Static CT spec says:

...LeafIndex value, which is a big-endian unsigned 40-bit integer specifying the 0-based index of the included entry in the log.

This extension makes it possible for auditors to verify inclusion of an SCT in the log by fetching the entry by index, rather than by hash.

i.e. leaf N must contain an entry with a LeafIndex value of N.

Otherwise, auditors hold a signed promise yet are not able to construct a proof which shows that the SCT was honoured.
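
As a rough illustration of the "leaf N must contain LeafIndex N" check, the Go sketch below packs and unpacks a big-endian 40-bit value and compares it with the position an entry was fetched from. It covers only the 5-byte integer described in the quote; how those bytes are wrapped inside the extensions field is left out, and all function names are assumptions for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// maxLeafIndex is the largest value that fits in 40 bits.
const maxLeafIndex = 1<<40 - 1

// encodeLeafIndex packs idx into 5 big-endian bytes.
func encodeLeafIndex(idx uint64) ([5]byte, error) {
	var out [5]byte
	if idx > maxLeafIndex {
		return out, errors.New("leaf index does not fit in 40 bits")
	}
	for i := 4; i >= 0; i-- {
		out[i] = byte(idx)
		idx >>= 8
	}
	return out, nil
}

// decodeLeafIndex unpacks 5 big-endian bytes into an integer.
func decodeLeafIndex(b [5]byte) uint64 {
	var idx uint64
	for _, v := range b {
		idx = idx<<8 | uint64(v)
	}
	return idx
}

// checkEntryAt is the auditor-side check: the entry fetched from position
// pos must carry a LeafIndex equal to pos.
func checkEntryAt(pos uint64, recorded [5]byte) error {
	if got := decodeLeafIndex(recorded); got != pos {
		return fmt.Errorf("entry at position %d carries LeafIndex %d", pos, got)
	}
	return nil
}

func main() {
	enc, err := encodeLeafIndex(258)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", enc) // 0000000102
	fmt.Println(checkEntryAt(258, enc)) // matching position: no error
	fmt.Println(checkEntryAt(259, enc)) // mismatched position: error
}
```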

Obviously, the following behaviours, even if I don't believe they violate any standards, aren't very helpful:

  • an SCT that won't be integrated, leading to a leafHash that will never appear in the log even if the cert is already there
  • an index where the entry underneath doesn't match the SCT
  • an index where the entry underneath doesn't match the Certificate

The first is clearly a violation of 6962 - Section 7.3:

A log can misbehave ... by failing to incorporate a certificate with an SCT in the Merkle Tree within the MMD

The 2nd and 3rd violate the Static CT spec lines I quoted above.

Fortunately, I imagine it should be a fairly easy fix - either adding timestamp to be stored alongside index, or (if that's too expensive) we could have the SCTFE fetch the entry at index when there's a dupe submission and use the data there to reconstruct the original SCT.

phbnf commented 1 day ago

It will be impossible to prove inclusion for any entry with an SCT whose timestamp doesn't match the one the corresponding pre-certificate was logged with

I agree with you, which is why I think it's very important that both the timestamp and index are set properly, and that specs / policy are explicit about this. To avoid any misunderstanding, I don't think the way the SCTFE works today is ok, and this bug should therefore be fixed. As I mentioned and you highlighted, we can fix this with better deduplication, all good.

I think we agree on the spirit: there MUST be a matching entry for every returned timestamp in RFC6962, and for every {index, timestamp} tuple in static-ct-api.

either adding timestamp to be stored alongside index, or (if that's too expensive) we could have the SCTFE fetch the entry at index when there's a dupe submission and use the data there to reconstruct the original SCT.

Agreed! Fetching the entry at index would have different cost implications, because one would need to fetch the full entry bundle and pay (in terms of resources and $) for the corresponding read, and maybe for bytes over the wire. So I think I'd prefer to store the timestamp only. One could also store the full SCT, but that would be even larger. The only difference would be that it would allow serving the exact same SCT again instead of re-generating a signature, which might differ because signatures are non-deterministic. I don't think this buys anything.
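
A minimal sketch of the "store the timestamp alongside the index" option, using an in-memory map. The type and function names are hypothetical and do not reflect the actual Tessera or static-ct deduplication interfaces; a real implementation would use persistent storage and would only record an entry once its index has been durably assigned.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// dedupEntry is what this sketch stores per certificate: the index and the
// timestamp of the original entry, so the same SCT contents can be rebuilt.
type dedupEntry struct {
	Index     uint64
	Timestamp uint64 // milliseconds since the epoch, as in the SCT
}

// dedupStore is a hypothetical in-memory stand-in for the dedup database.
type dedupStore struct {
	mu      sync.Mutex
	entries map[[32]byte]dedupEntry
	next    uint64
}

func newDedupStore() *dedupStore {
	return &dedupStore{entries: make(map[[32]byte]dedupEntry)}
}

// add returns the stored {index, timestamp} if cert was seen before;
// otherwise it assigns the next index and records now as the timestamp.
func (s *dedupStore) add(cert []byte, now uint64) (dedupEntry, bool) {
	key := sha256.Sum256(cert)
	s.mu.Lock()
	defer s.mu.Unlock()
	if prev, ok := s.entries[key]; ok {
		return prev, true
	}
	e := dedupEntry{Index: s.next, Timestamp: now}
	s.entries[key] = e
	s.next++
	return e, false
}

func main() {
	s := newDedupStore()
	cert := []byte("placeholder DER bytes") // stand-in for a real certificate
	first, _ := s.add(cert, 1730493992119)
	again, dup := s.add(cert, 1730493998206)
	// The duplicate submission gets the original index and timestamp back.
	fmt.Println(first, again, dup)
}
```

With this shape, each dedup record grows by one 8-byte timestamp, and the signature is still regenerated on every duplicate submission, so the signature bytes may differ even though the signed timestamp and index no longer do.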


The rest of this conversation is just about being very precise around wording in the policy, RFC, and static-ct-api specs to understand what space they leave. I believe that today, they leave room for misinterpretation, and that's what I meant to highlight in the longer answer part of my comment. I double-checked them when this issue was filed, and I was surprised to find that there is room for such interpretation. A lot of it has to do with semantics around "certificates" and "entries".

Chrome Policy

When Logs receive a logging submission for an already-incorporated certificate, Logs must either return an existing SCT or, if creating a new one, add another certificate entry within the MMD such that the new SCT can be verified using the APIs specified in RFC 6962.


This one is explicit about adding a new entry in the log for every distinct SCT. The goal is to allow for SCT inclusion checking with RFC6962 APIs. It's not 100% clear what "SCT can be verified" means, and note also that it was added to the policy recently, but let's set that aside. This statement will probably be reworded to make space for static-ct-api logs? Given that the API for static-ct-api logs is different, I would expect a new entry to be added to the log with a corresponding {index, timestamp} tuple, and not only one of the two.

RFC6962

As opposed to the Chrome Policy, I cannot find anything in RFC6962 that says that the timestamp of the SCT returned by add-pre-chain must be integrated in the log, i.e. that there MUST be an entry per SCT. Do you know if it's said anywhere? I'd be interested to see this, because I was surprised not to find it in the RFC.

If anything, RFC6962 says that the certificate must be integrated in the tree, not a certificate entry [with a timestamp matching the ones in the SCTs]:

The only thing I could find that would explicitly convey the right meaning is in the 1. Informal introduction section, which as it says, is informal:

Similarly, those who have seen signed timestamps from a particular log can later demand a proof of inclusion from that log. If the log is unable to provide this (or, indeed, if the corresponding certificate is absent from monitors' copies of that log), that is evidence of the incorrect operation of the log.

It conveys the right spirit and I hope everybody gets that this is the spirit... but I couldn't find any formal specification of this spirit.

As you point out, the only way to integrate a certificate is to put it in a MerkleTreeLeaf, which itself needs to include a timestamp, which must match the timestamp in the SCT returned for this certificate. But, I couldn't convince myself that the RFC explicitly forbids the following behaviour:

  1. RFC6962 log receives a precert via add-pre-chain
  2. an SCT is issued for it
  3. the log integrates a MerkleTreeLeaf for that certificate, with the timestamp matching the one in the SCT
  4. RFC6962 log receives the same precert again, and issues a new SCT with a different timestamp
  5. No entry with the new timestamp is added in the log

static-ct-api

I would argue this is, in fact, binding: the log has issued a signed statement that says "[the precert corresponding to] this cert was logged at time T, and can be found at index I within MMD", and that statement would be false if returning incorrect T or I.

This statement holds under the workflow above. One can find the precert corresponding to this cert at index I within T+MMD: it can be found immediately. T in the SCT and T in the entry will not match, and that's pretty much what this issue is about. They should match, and the specs should not leave space for confusion around this.

Going back to the quotes in your comment:

I think static-ct-api could be explicit about this. Right now, I don't believe it is. Everybody has the same understanding, so it should not be a problem to add this.