Open roger2hk opened 1 day ago
Spot on, thanks! tl;dr: the SCT is recreated for every returned entry with a different timestamp, even if the index doesn't change: https://github.com/transparency-dev/static-ct/blob/4fb2c2ef8d3e8c1b36d34c26cf5c52557404593c/handlers.go#L316
Right now, Tessera deduplication implementations store an index, not an SCT, so we can't support this use-case without changing that. It turns out that all these implementations live in the static-ct repo for now, so all good. This will have an impact on the size of the deduplication database though.
Longer answer
The index of the entry though, should be deduplicated, and the log should not grow further, as you've noticed. It's not great, and goes against Chrome's policy:
When Logs receive a logging submission for an already-incorporated certificate, Logs must either return an existing SCT or, if creating a new one, add another certificate entry within the MMD such that the new SCT can be verified using the APIs specified in RFC 6962.
In the scenario you've observed, the SCTFE does not add a new certificate entry to the log. But this Chrome log policy statement is also not 100% clear, and might not apply to static-ct-api logs. The goal here is to make sure that Certificates can be verified for inclusion re-using the LeafHash. But with static-ct-api... there's no such get-by-hash API, what matters is the index. Sounds very reasonable to keep this behaviour for backwards compatibility though.
As surprising as it sounds, I don't think that it goes against either RFC6962 or https://c2sp.org/static-ct-api. But maybe we should tighten https://c2sp.org/static-ct-api around this.
RFC6962 says that you MUST integrate the certificate that was mentioned in the SCT. Note that this is about the certificate, which itself does not contain the SCT. In other words, an SCT means that you MUST integrate this certificate within the MMD, but you MAY use a different timestamp in the entry. What's interesting with static-ct is that the index in the SCT is not binding: you could return a fake index, and it would still implement the specs. Obviously, the following behaviours, even if I don't believe they violate any standards, aren't very helpful:
- an SCT that won't be integrated, leading to a leafHash that will never appear in the log even if the cert is already there
- an index where the entry underneath doesn't match the SCT
- an index where the entry underneath doesn't match the Certificate
We should probably make specs explicit about these behaviours. Or maybe policies are better suited for this.
... [the] Chrome log policy statement, is not 100% clear, and might not apply to static-ct-api log. The goal here is to make sure that Certificates can be verified for inclusion re-using the LeafHash. But with static-ct-api... there's no such get-by-hash API, what matters is the index
I do not agree with this conclusion :) I think this is a critical bug which undermines the transparency properties of the log.
It will be impossible to prove inclusion for any entry with an SCT whose timestamp doesn't match the one the corresponding pre-certificate was logged with: the timestamp is part of the MerkleTreeLeaf structure, which in turn is hashed to build the Merkle tree. If you don't have the original timestamp, you do not have the preimage which will allow you to recompute the root hash using the proof.
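To make the preimage point concrete, here is a minimal sketch that serializes a MerkleTreeLeaf as laid out in RFC 6962 Section 3.4 and hashes it with the 0x00 leaf prefix. It assumes a v1 leaf with an x509_entry and empty extensions; the certificate bytes and both timestamps are made-up placeholders, and the helper names are mine, not anything from the SCTFE code.

```python
import hashlib
import struct


def merkle_tree_leaf(timestamp_ms: int, cert_der: bytes, extensions: bytes = b"") -> bytes:
    """Serialize a v1 MerkleTreeLeaf holding an x509_entry TimestampedEntry."""
    return b"".join([
        b"\x00",                             # Version v1(0)
        b"\x00",                             # MerkleLeafType timestamped_entry(0)
        struct.pack(">Q", timestamp_ms),     # uint64 timestamp
        struct.pack(">H", 0),                # LogEntryType x509_entry(0)
        len(cert_der).to_bytes(3, "big"),    # ASN.1Cert is opaque<1..2^24-1>
        cert_der,
        struct.pack(">H", len(extensions)),  # CtExtensions is opaque<0..2^16-1>
        extensions,
    ])


def leaf_hash(leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix followed by the leaf."""
    return hashlib.sha256(b"\x00" + leaf).digest()


cert = b"placeholder DER bytes"  # assumption: stands in for a real certificate
logged = leaf_hash(merkle_tree_leaf(1_700_000_000_000, cert))    # timestamp written to the log
reissued = leaf_hash(merkle_tree_leaf(1_700_000_005_000, cert))  # timestamp in a regenerated SCT
print(logged != reissued)  # True: same certificate, different leaf hash
```

An auditor holding only the regenerated SCT would compute the second hash, which no inclusion proof from the log can ever lead to.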
RFC6962 says that you MUST integrate the certificate that was mentioned in the SCT. Note that this is about the certificate, which itself does not contain the SCT. In other words, an SCT means that you MUST integrate this certificate within the MMD, but you MAY use a different timestamp in the entry.
I think 6962 is pretty clear about it, and I'm fairly sure it does not suggest anywhere that you may use different timestamps in returned SCTs from those you write to the log. In fact, Section 3.4 explicitly says it's the same timestamp:
Structure of the Merkle Tree input:
    enum { timestamped_entry(0), (255) }
      MerkleLeafType;

    struct {
        uint64 timestamp;
        LogEntryType entry_type;
        select(entry_type) {
            case x509_entry: ASN.1Cert;
            case precert_entry: PreCert;
        } signed_entry;
        CtExtensions extensions;
    } TimestampedEntry;

    struct {
        Version version;
        MerkleLeafType leaf_type;
        select (leaf_type) {
            case timestamped_entry: TimestampedEntry;
        }
    } MerkleTreeLeaf;
...
"timestamp" is the timestamp of the corresponding SCT issued for this certificate.
What's interesting with static-ct, is that the index in the SCT is not binding, you could return a fake index, and it would still implement the specs
I would argue this is, in fact, binding: the log has issued a signed statement that says "[the precert corresponding to] this cert was logged at time T, and can be found at index I within MMD", and that statement would be false if returning incorrect T or I.
There would be no need to sign this statement if it weren't binding.
The Static CT spec says:
...LeafIndex value, which is a big-endian unsigned 40-bit integer specifying the 0-based index of the included entry in the log.
This extension makes it possible for auditors to verify inclusion of an SCT in the log by fetching the entry by index, rather than by hash.
i.e. leaf N must contain an entry with a LeafIndex value of N.
Otherwise, auditors hold a signed promise yet are not able to construct a proof which shows that the SCT was honoured.
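For reference, the 40-bit big-endian encoding quoted above can be sketched as follows. The 5-byte integer comes straight from the quoted text; the surrounding framing (a one-byte extension type of 0 followed by a two-byte length) is my reading of the static-ct-api extension format and should be checked against the spec. Function names are hypothetical.

```python
def encode_leaf_index_extension(index: int) -> bytes:
    """Encode a leaf index as a single leaf_index extension (assumed framing)."""
    if not 0 <= index < 1 << 40:
        raise ValueError("LeafIndex must fit in 40 bits")
    data = index.to_bytes(5, "big")  # big-endian unsigned 40-bit integer
    # Assumed framing: extension_type leaf_index(0), then opaque<0..2^16-1> data.
    return b"\x00" + len(data).to_bytes(2, "big") + data


def decode_leaf_index_extension(ext: bytes) -> int:
    """Inverse of the encoder above; rejects anything but one leaf_index extension."""
    if len(ext) != 8 or ext[0] != 0 or int.from_bytes(ext[1:3], "big") != 5:
        raise ValueError("not a single leaf_index extension")
    return int.from_bytes(ext[3:], "big")


print(encode_leaf_index_extension(1).hex())
```

Since these bytes sit in the SCT's extensions field, they are covered by the SCT signature, which is what makes the index part of the signed statement.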
Obviously, the following behaviours, even if I don't believe they violate any standards, aren't very helpful:
- an SCT that won't be integrated, leading to a leafHash that will never appear in the log even if the cert is already there
- an index where the entry underneath doesn't match the SCT
- an index where the entry underneath doesn't match the Certificate
The first is clearly a violation of 6962 - Section 7.3:
A log can misbehave ... by failing to incorporate a certificate with an SCT in the Merkle Tree within the MMD
The 2nd and 3rd violate the Static CT spec lines I quoted above.
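As a rough illustration of what an auditor would check for each of those three cases, here is a toy model. The `Entry` and `SCT` types and the in-memory `log` list are hypothetical stand-ins; a real auditor would fetch the entry from the log's tiles and verify an inclusion proof against a checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    cert: bytes
    timestamp: int


@dataclass(frozen=True)
class SCT:
    cert: bytes
    timestamp: int
    leaf_index: int


def audit(sct: SCT, log: list) -> str:
    """Check that the entry at sct.leaf_index matches what the SCT promised."""
    if sct.leaf_index >= len(log):
        return "entry not integrated"
    entry = log[sct.leaf_index]
    if entry.cert != sct.cert:
        return "entry does not match the certificate"
    if entry.timestamp != sct.timestamp:
        return "entry does not match the SCT timestamp"
    return "ok"


log = [Entry(cert=b"cert-a", timestamp=1000)]
print(audit(SCT(cert=b"cert-a", timestamp=1000, leaf_index=0), log))  # ok
print(audit(SCT(cert=b"cert-a", timestamp=2000, leaf_index=0), log))  # timestamp mismatch
```

The second call models the bug in this issue: the index points at the right certificate, but the timestamp in the regenerated SCT no longer matches the entry.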
Fortunately, I imagine it should be a fairly easy fix: either adding timestamp to be stored alongside index, or (if that's too expensive) we could have the SCTFE fetch the entry at index when there's a dupe submission and use the data there to reconstruct the original SCT.
It will be impossible to prove inclusion for any entry with an SCT whose timestamp doesn't match the one the corresponding pre-certificate was logged with
I agree with you, which is why I think it's very important that both the timestamp and index are set properly, and that specs / policy are explicit about this. To avoid any misunderstanding: I don't think the way the SCTFE works today is ok, and this bug should therefore be fixed. As I mentioned and you highlighted, we can fix this with better deduplication, all good.
I think we agree on the spirit: there MUST be a matching entry for every returned timestamp in RFC6962, and {index, timestamp} tuple in static-ct-api.
either adding timestamp to be stored alongside index, or (if that's too expensive) we could have the SCTFE fetch the entry at index when there's a dupe submission and use the data there to reconstruct the original SCT.
Agreed! Fetching the entry at index would have different cost implications because one would need to fetch the full entry bundle, and pay (in terms of resources and $) for the corresponding read, and maybe bytes over the wire. So I think I'd prefer to store the timestamp only. One could also store the full SCT, but that would be even larger. The only difference would be that it would allow serving the exact same SCT again without re-generating a signature, which might differ because signatures can be non-deterministic. I don't think this buys anything.
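A minimal sketch of the "store the timestamp alongside the index" option, assuming an in-memory map keyed by the SHA-256 of the submitted certificate. The `ToyLog` and `DedupValue` names are hypothetical, and this is not how the Tessera or SCTFE deduplication storage is actually laid out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DedupValue:
    index: int
    timestamp_ms: int


class ToyLog:
    """Toy log whose dedup map stores {index, timestamp}, not just the index."""

    def __init__(self) -> None:
        self.entries = []  # (timestamp_ms, cert) pairs, in log order
        self.dedup = {}    # sha256(cert) -> DedupValue

    def add_chain(self, cert: bytes, now_ms: int) -> DedupValue:
        key = hashlib.sha256(cert).digest()
        hit = self.dedup.get(key)
        if hit is not None:
            # Duplicate submission: reuse the original index and timestamp,
            # so the SCT built from them matches the entry already in the log.
            return hit
        value = DedupValue(index=len(self.entries), timestamp_ms=now_ms)
        self.entries.append((now_ms, cert))
        self.dedup[key] = value
        return value


toy = ToyLog()
first = toy.add_chain(b"cert-a", now_ms=1000)
second = toy.add_chain(b"cert-a", now_ms=2000)
print(first == second, len(toy.entries))
```

The SCT signature would still be regenerated on each duplicate submission, but over the same timestamp and index, so the leaf hash an auditor computes stays the same.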
The rest of this conversation is just about being very precise around wording in the policy, RFC, and static-ct-api specs to understand what space they leave. I believe that today, they leave room for misinterpretation, and that's what I meant to highlight in the longer answer part of my comment. I double-checked them when this issue was filed, and I was surprised to find that there is room for such interpretation. A lot of it has to do with semantics around "certificates" and "entries".
When Logs receive a logging submission for an already-incorporated certificate, Logs must either return an existing SCT or, if creating a new one, add another certificate entry within the MMD such that the new SCT can be verified using the APIs specified in RFC 6962.
This one is explicit about adding a new entry in the log for every distinct SCT. The goal is to allow for SCT inclusion checking for RFC6962 APIs. It's not 100% clear about what "SCT can be verified" means, and also note that it was added to the policy recently, but let's set that aside. This statement will probably be reworded to make space for static-ct-api logs? Given that the API for static-ct-api logs is different, I would expect a new entry to be added to the log with a corresponding {index, timestamp} tuple, and not only one of the two.
As opposed to the Chrome Policy, I cannot find anything in RFC6962 that says that the timestamp of the SCT returned by add-pre-chain must be integrated in the log, i.e. that there MUST be an entry per SCT. Do you know if it's said anywhere? I'd be interested to see this, because I was surprised not to find it in the RFC.
If anything, RFC6962 says that the certificate must be integrated in the tree, not a certificate entry [with a timestamp matching the ones in the SCTs]:
The log MUST incorporate a certificate in its Merkle Tree within the Maximum Merge Delay period after the issuance of the SCT.
The SCT is the log's promise to incorporate the certificate in the Merkle Tree within a fixed amount of time known as the Maximum Merge Delay (MMD).
A log can misbehave in two ways: (1) by failing to incorporate a certificate with an SCT in the Merkle Tree within the MMD and..
The only thing I could find that would explicitly convey the right meaning is in the "1. Informal introduction" section, which, as it says, is informal:
Similarly, those who have seen signed timestamps from a particular log can later demand a proof of inclusion from that log. If the log is unable to provide this (or, indeed, if the corresponding certificate is absent from monitors' copies of that log), that is evidence of the incorrect operation of the log.
It conveys the right spirit and I hope everybody gets that this is the spirit... but I couldn't find any formal specification of this spirit.
As you point out, the only way to integrate a certificate is to put it in a MerkleTreeLeaf, which itself needs to include a timestamp, which must match the timestamp in the SCT returned for this certificate. But, I couldn't convince myself that the RFC explicitly forbids the following behaviour:
MerkleTreeLeaf for that certificate, with the timestamp matching the one in the SCT
The log MUST incorporate a certificate in its Merkle Tree within the Maximum Merge Delay period after the issuance of the SCT.
This is still true: the certificate was added before it was even submitted a second time, in step 3. One could argue that to comply with "after", a new one must be added. But that can get into the weeds of batch integration.
The SCT is the log's promise to incorporate the certificate in the Merkle Tree within a fixed amount of time known as the Maximum Merge Delay (MMD).
This is still true: the certificate was added before it was even submitted a second time, in step 3.
A log can misbehave in two ways: (1) by failing to incorporate a certificate with an SCT in the Merkle Tree within the MMD and...
Same here: the cert is in the log already.
The first is clearly a violation of 6962:
I don't interpret this as being a violation of Section 7.3. The certificate is included in the log with an SCT within MMD. It would be a violation if it said the SCT is the log's promise to incorporate a new entry with a corresponding certificate and timestamp.
"timestamp" is the timestamp of the corresponding SCT issued for this certificate.
This is the case for the MerkleTreeLeaf integrated in step 3, and does not apply to the second submission since, in that scenario, no new entry was created.
I would argue this is, in fact, binding: the log has issued a signed statement that says "[the precert corresponding to] this cert was logged at time T, and can be found at index I within MMD", and that statement would be false if returning incorrect T or I.
This statement is validated by the workflow above. One can find the precert corresponding to this cert at index I within T+MMD: it can be found immediately. T in the SCT and the entry will not match, and that's pretty much what this issue is about. They should match, and the specs should not leave space for confusion around this.
Going back to the quotes in your comment:
...LeafIndex value, which is a big-endian unsigned 40-bit integer specifying the 0-based index of the included entry in the log.
This statement does not specify which entry; it could be a previous one. (If anything, it refers to "the included entry in the log", but that entry might not even be included in the log yet.)
This extension makes it possible for auditors to verify inclusion of an SCT in the log by fetching the entry by index, rather than by hash.
Correct, you can verify for inclusion, but it still doesn't say that it MUST be included.
I think static-ct-api could be explicit about this. Right now, I don't believe it is. Everybody has the same understanding, so it should not be a problem to add this.
Existing Behaviour
The dedup was enabled. The checkpoint increased by 1.
Expected Behaviour
LeafHash and Signature returned should be the same when the same certificate is submitted.
Logs