Open jvanasco opened 3 years ago
I agree that using a model similar to the one used for CT would be interesting (https://github.com/letsencrypt/website/blob/master/data/transparency.json)
This is a good idea! I don't expect to get to it very soon, since I'm pretty focused on both the upcoming chain switch and ECDSA issuance, but will keep this in mind (or would be happy to review if someone else tackled it!).
The numerous PRs I've made recently against the Chain of Trust page have been directed towards supplementing and structuring the information therein with an eye towards a PR to overhaul the entire page to streamline the presentation and remove redundancies. What say you all towards construction of a JSON tree structure corresponding to the diagram (that @aarongable is hopefully going to commit soon ; ) )?
The initial vision I have is that of an array of root certificate objects (including DST Root CA X3) with each having amongst its properties an array of "signed" intermediate certificate objects.
I'd recommend a structure more like this:
[
  {
    "displayname": "ISRG Root X1",
    "algorithm": "RSA 4096",
    "o": "Internet Security Research Group",
    "cn": "ISRG Root X1",
    "type": "root",
    "status": "active",
    "certificates": [
      {
        "displayname": "Self-signed by ISRG Root X1",
        "crtsh": <url>,
        "txt": <url>,
        "pem": <url>,
        "der": <url>
      },
      { <repeat for cross-sign> }
    ]
  },
  { <repeat for other certs> }
]
The types would be root and intermediate; the statuses would be active, upcoming, backup, and retired.
I suggest this format because the json files in the data directory are consumed by Hugo, and Hugo's templating isn't going to be good at arbitrary-depth descent through a tree of nested roots and intermediates. Of course, you could reflect the structure of the current page more strongly by hauling type and status up to be dictionary keys, but then updating requires moving entire sections rather than simply changing a single value.
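To illustrate the point about updates, here is a minimal Python sketch of consuming data in the shape proposed above. The second entry, the helper names `select` and `retire`, and the empty `certificates` lists are assumptions made for the example, not part of the proposal.

```python
# Hypothetical sample data in the proposed shape; per-certificate URLs are omitted.
certs = [
    {"displayname": "ISRG Root X1", "type": "root",
     "status": "active", "certificates": []},
    {"displayname": "Example Intermediate", "type": "intermediate",
     "status": "retired", "certificates": []},
]


def select(entries, type_=None, status=None):
    """Return the entries matching the given type and/or status."""
    return [
        e for e in entries
        if (type_ is None or e["type"] == type_)
        and (status is None or e["status"] == status)
    ]


def retire(entries, displayname):
    """Retire a certificate by changing one value; nothing moves between sections."""
    for e in entries:
        if e["displayname"] == displayname:
            e["status"] = "retired"


print([e["displayname"] for e in select(certs, type_="root", status="active")])
```

With type and status as plain values, grouping for display becomes a filter at read time, and a status change touches a single field.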
Just a couple random thoughts:
Just brainstorming; these may be terrible ideas.
I have been tracking this stuff manually for a while, and while I am far from a final form, I do have a small preference/learning: a flat structure appeared to work better when dealing with the cross-signs, with each intermediate listing the roots that signed it. I also list the IdenTrust/TrustID/DST root. This allows me to build the full chain, including the trusted root, for extensive tests.
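A rough sketch of how a flat list can still yield full chains, assuming each record carries a `pem` URL and a `signed_by` list of issuer `pem` URLs. The record names, the `example.invalid` URLs, and the `build_chains` helper are all hypothetical and only illustrate the idea.

```python
# Hypothetical flat records; a self-signed root lists its own pem in "signed_by".
ROOT_A = "https://example.invalid/root-a.pem"
ROOT_B = "https://example.invalid/root-b.pem"
INT_1 = "https://example.invalid/int-1.pem"

records = [
    {"name": "Root A", "pem": ROOT_A, "signed_by": [ROOT_A]},
    {"name": "Root B", "pem": ROOT_B, "signed_by": [ROOT_B]},
    # Cross-signed intermediate: two issuers, so two possible chains.
    {"name": "Intermediate 1", "pem": INT_1, "signed_by": [ROOT_A, ROOT_B]},
]


def build_chains(records, start_pem):
    """Return every chain (as lists of names) from start_pem up to a self-signed root."""
    by_pem = {r["pem"]: r for r in records}
    chains = []

    def walk(url, path):
        path = path + [url]
        for issuer in by_pem[url]["signed_by"]:
            if issuer == url:
                # Self-signed: the chain ends here at a root.
                chains.append([by_pem[u]["name"] for u in path])
            elif issuer not in path:
                # Follow the issuer, skipping anything already on the path.
                walk(issuer, path)

    walk(start_pem, [])
    return chains


print(build_chains(records, INT_1))
```

Because cross-signs are just extra entries in `signed_by`, no nesting is needed to represent them.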
IMHO, the payload should also have the notAfter/expiry date for each cert too.
This page should not feed its data into CCADB; CCADB is authoritative and only a few people have the right/ability to disclose certificates into it. Having this data file be autogenerated from CCADB would be nice, but doing so requires getting someone to create a new public report (similar to https://ccadb-public.secure.force.com/mozilla/CACertificatesInFirefoxReport) listing all certificates owned by Internet Security Research Group (ISRG), so let's save that idea for future improvements.
I generated a quick proof-of-concept here: https://github.com/letsencrypt/website/compare/master...jvanasco:feature-machine_readable_certificates
I don't expect this to work as-is, but changes are trivial, as certificates_build.py generates the certificates.json file. The bulk of the work was generating the input data of certificates.
General overview:
I split the certificate payload out from issuer, and split the algorithm into separate type and bits fields. Why? Python is generating this data, and it has it in two fields, so it makes more sense to keep it that way. This script has the same Python requirements as Certbot.
The input is a human-curated file, "_certificate_data.json". It has some basic info about the certs that cannot be derived, such as the URLs and status/type. "_name" is just for editing the input (which could be another file with a "lastmod" date).
The script checks to ensure all the urls are valid and are not duplicated within the payload. It also checks to ensure all the URLs for the certificates are online.
It derives data from the PEM version. It could check the versions against one another. "type" and "status" are copied over. The "signed_by" field is used to track the issuer and is pegged to the issuer's "pem". If there is a cross-signed version, that is tracked too. The URL of the pem is used as a UUID to link certificates together.
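The URL checks described above might look roughly like the sketch below. This is not the code from the proof-of-concept branch: the `check_urls` helper is hypothetical, requiring `https` is an assumption, and the online check is left out so the example runs without network access.

```python
from collections import Counter
from urllib.parse import urlparse

# The four URL fields used in the input file.
URL_FIELDS = ("crtsh", "txt", "pem", "der")


def check_urls(entries):
    """Return a list of problems: missing, malformed, or duplicated URLs."""
    problems = []
    seen = Counter()
    for entry in entries:
        for field in URL_FIELDS:
            url = entry.get(field)
            if url is None:
                problems.append(f"{entry['_name']}: missing {field}")
                continue
            parts = urlparse(url)
            # Assumption: every certificate URL should be an absolute https URL.
            if parts.scheme != "https" or not parts.netloc:
                problems.append(f"{entry['_name']}: malformed {field} URL {url!r}")
            seen[url] += 1
    # A URL appearing more than once in the payload is reported once.
    problems.extend(f"duplicate URL {url!r}" for url, n in seen.items() if n > 1)
    return problems


entry = {
    "_name": "ISRG Root X1",
    "crtsh": "https://crt.sh/?id=9314791",
    "txt": "https://letsencrypt.org/certs/isrgrootx1.txt",
    "pem": "https://letsencrypt.org/certs/isrgrootx1.pem",
    "der": "https://letsencrypt.org/certs/isrgrootx1.der",
}
print(check_urls([entry]))
```

Since the pem URL doubles as the identifier that links certificates, catching duplicates before generation keeps those links unambiguous.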
Why the flat, not-nested approach?
I keep thinking about how I, and others, would use this data. Keeping it flat seems easier and more database-like.
The workflow I envision is that a Let's Encrypt staff member could just alter a minimal input file and run a script, which then generates a machine-readable version whose data has been checked and tested.
In any event, I'd be happy to submit a PR for this if Let's Encrypt wants to take it over for the reformatting. Otherwise, people can feel free to fork and work on it. If keeping a flat structure, the real customization will be in the output template (lines 193+).
input:
{
    "_name": "ISRG Root X1",
    "type": "root",
    "status": "active",
    "crtsh": "https://crt.sh/?id=9314791",
    "txt": "https://letsencrypt.org/certs/isrgrootx1.txt",
    "pem": "https://letsencrypt.org/certs/isrgrootx1.pem",
    "der": "https://letsencrypt.org/certs/isrgrootx1.der",
    "signed_by": "https://letsencrypt.org/certs/isrgrootx1.pem", # self-signed
},
output:
{
    "certificate": {
        "algorithm": "RSA",
        "bits": 4096,
        "cn": "ISRG Root X1",
        "notAfter": "20350604110438Z",
        "notBefore": "20150604110438Z",
        "o": "Internet Security Research Group",
        "selfsigned": true
    },
    "issuer": {
        "cn": "ISRG Root X1",
        "o": "Internet Security Research Group",
        "url_pem": "https://letsencrypt.org/certs/isrgrootx1.pem"
    },
    "status": "active",
    "type": "root",
    "urls": {
        "crtsh": "https://crt.sh/?id=9314791",
        "der": "https://letsencrypt.org/certs/isrgrootx1.der",
        "pem": "https://letsencrypt.org/certs/isrgrootx1.pem",
        "txt": "https://letsencrypt.org/certs/isrgrootx1.txt"
    }
},
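For anyone consuming the output, the notBefore/notAfter strings can be turned into timezone-aware datetimes with the standard library alone. The sketch below assumes the timestamps always use the `YYYYMMDDHHMMSSZ` form shown in the sample; the helper names `parse_cert_time` and `days_remaining` are made up for the example.

```python
from datetime import datetime, timezone


def parse_cert_time(value):
    """Parse a 'YYYYMMDDHHMMSSZ' timestamp into an aware UTC datetime."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def days_remaining(record, now=None):
    """Whole days until the certificate in `record` expires (negative if expired)."""
    now = now or datetime.now(timezone.utc)
    return (parse_cert_time(record["certificate"]["notAfter"]) - now).days


# Only the fields needed here, copied from the sample output.
record = {
    "certificate": {
        "notAfter": "20350604110438Z",
        "notBefore": "20150604110438Z",
    }
}
print(parse_cert_time(record["certificate"]["notAfter"]).isoformat())
# prints 2035-06-04T11:04:38+00:00
```

A client could run a check like `days_remaining` on a schedule to flag roots or intermediates approaching expiry.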
That's pretty cool! A couple notes:
I believe it would be useful to have a machine readable version of the information in /certificates. That would allow for client developers and integrators to quickly check if anything has changed, and aid in automatically tracking certificate lineage. The data could be maintained in a json file, similar to those in /data and it could serve two purposes: