bwbroersma closed this issue 5 days ago.
Just for documentation purposes: scanning CT logs is a huge step forward. However, note that this way the dashboard will not discover:
Thanks @baknu, very true: this mainly benefits the web test, not the mail test. Domains that only have an MX record, or mail-only domains without a TLS certificate, will not be found.
An issue called 'Limit max domains via certificate transparency' can be merged into this issue, as it is something to keep in mind when working with this. Getting 1000 subdomains for 1000 domains in your list is fun, but it is not supported and would require several other optimizations. Knowing in advance how many subdomains might be found (or capping them), or allowing users to cherry-pick subdomains, would make this feature more 'workable'.
Extra note about crt.sh: https://crt.sh/atom is pretty nice since it returns XML instead of HTML. The only issue is rate limiting (HTTP 429 responses).
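Because the Atom output is plain XML, it can be read with a standard XML parser instead of scraping HTML. A minimal Python sketch, assuming only the standard Atom namespace: which fields crt.sh actually fills in per entry is not assumed here, fetching and 429 handling are left out, and atom_entries is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287); crt.sh-specific content is not assumed.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def atom_entries(xml_text: str) -> list[dict[str, str]]:
    """Return id, title and updated for every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # findtext() returns the default ("") when a child element is missing.
        entries.append({
            name: entry.findtext(f"atom:{name}", default="", namespaces=ATOM_NS).strip()
            for name in ("id", "title", "updated")
        })
    return entries
```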
New notes about crt.sh: they allow direct PostgreSQL read access to their database¹. The access is:
$ psql -h crt.sh -p 5432 -U guest certwatch
The schema can be found here https://github.com/crtsh/certwatch_db/
There is also a showSQL=Y query parameter that shows the SQL being executed, e.g.:
https://crt.sh/?q=internet.nl&showSQL=Y&exclude=expired
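Generating such URLs programmatically makes it easy to compare the executed SQL for different searches. A minimal sketch; the parameter names are taken from the example URL, and crtsh_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def crtsh_search_url(query: str, show_sql: bool = False, exclude_expired: bool = False) -> str:
    """Build a crt.sh search URL from the q, showSQL and exclude parameters."""
    params = {"q": query}
    if show_sql:
        params["showSQL"] = "Y"  # ask crt.sh to display the SQL it executes
    if exclude_expired:
        params["exclude"] = "expired"
    # urlencode percent-encodes values, e.g. a "%" becomes "%25".
    return "https://crt.sh/?" + urlencode(params)
```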
Some rate limits apply: access is limited to 5 connections per IP, and queries still regularly fail with:
ERROR: canceling statement due to statement timeout
Therefore it's probably best to create a daily dump of newly seen precertificates and leaf certificates (note: see some stats about the crt.sh fill ratio of known certificate serials; because of this, both should be parsed). One idea is a daily job that runs the psql command with -t -A -F"," -c "SELECT ...;" to output the data in CSV format, which a separate program can then compress into an efficient structure.
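A daily job along these lines could be sketched in Python as below. It only wraps the psql flags already mentioned; the actual SELECT is elided in these notes and stays a caller-supplied placeholder, the scheduling and compression steps are not shown, and all function names are hypothetical.

```python
import csv
import io
import subprocess


def psql_csv_command(query: str) -> list[str]:
    """psql invocation from the notes: tuples only (-t), unaligned (-A), comma separator (-F)."""
    return [
        "psql", "-h", "crt.sh", "-p", "5432", "-U", "guest", "-d", "certwatch",
        "-t", "-A", "-F", ",", "-c", query,
    ]


def parse_dump(output: str) -> list[list[str]]:
    """Split psql's unaligned output into rows, skipping empty lines.

    -A -F"," does not quote values, so this is only safe while no field
    contains a comma or double quote (true for domain names).
    """
    return [row for row in csv.reader(io.StringIO(output)) if row]


def run_daily_dump(query: str) -> list[list[str]]:
    """Run one dump; query is the caller's SELECT (elided in the notes)."""
    result = subprocess.run(
        psql_csv_command(query), capture_output=True, text=True, check=True
    )
    return parse_dump(result.stdout)
```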
¹ It seems to be a hot standby, judging by errors like these (see Stack Overflow):
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
Why didn't I know about this? It has been available practically forever (at least more than 5 years). Maybe I would have discovered it earlier if I port-scanned every hostname I visit by default ;)
Added a first version to the dashboard. Pending infrastructure changes to get this running on the server.
See: https://github.com/internetstandards/Internet.nl-ct-log-subdomain-suggestions-api
Currently crt.sh is unreliable (HTTP 500 errors), which means we have to move the lookup to a background job and cannot immediately show the effect of adding the CT log subdomains.
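Since the lookups run in the background anyway, failed crt.sh requests (these 500 errors, and the 429 rate limiting mentioned earlier) can simply be retried. A minimal standard-library sketch with exponential backoff and jitter; the retry policy, the set of retryable status codes and the function names are assumptions, not what the suggestions API actually does. Network errors other than HTTP status codes are not retried here.

```python
import random
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 500, 502, 503, 504}  # assumption: rate limiting and server errors


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential backoff ceilings: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def fetch_with_retry(url: str, attempts: int = 5, timeout: float = 30.0) -> bytes:
    """Fetch url, retrying on retryable HTTP status codes."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
            # Honour a numeric Retry-After header; otherwise sleep a jittered backoff.
            retry_after = (err.headers or {}).get("Retry-After", "")
            wait = float(retry_after) if retry_after.isdigit() else random.uniform(0, delays[attempt])
            time.sleep(wait)
    raise AssertionError("unreachable")
```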
Are there solutions that monitor a CT log server and only log the domain names (not all the crypto, etc.)?
Cloning a CT log server would take 1+ TB, so that's a bit large (reference: https://letsencrypt.org/2019/11/20/how-le-runs-ct-logs.html#database).
Some links for how the CT log API works: https://security.stackexchange.com/a/167373
e.g.:
https://oak.ct.letsencrypt.org/2023/ct/v1/get-sth
https://oak.ct.letsencrypt.org/2023/ct/v1/get-entries?start=1000&end=1014
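The two calls can be combined into a simple reader (RFC 6962, section 4): get-sth reports tree_size, and get-entries returns a JSON object whose entries each hold a base64 leaf_input and extra_data. Logs may return fewer entries than requested, so the reader keeps asking from where the last page ended. A minimal standard-library sketch; the function names are hypothetical and the batch size is an assumption to tune per log (the run below used 256 entries per request).

```python
import json
import urllib.request


def get_entries_url(log_url: str, start: int, end: int) -> str:
    """RFC 6962 get-entries URL; start and end are 0-based and inclusive."""
    return f"{log_url.rstrip('/')}/ct/v1/get-entries?start={start}&end={end}"


def batches(start: int, end: int, batch_size: int):
    """Yield inclusive (start, end) ranges of at most batch_size entries covering start..end."""
    while start <= end:
        stop = min(start + batch_size - 1, end)
        yield start, stop
        start = stop + 1


def tree_size(log_url: str) -> int:
    """Current number of entries, from the get-sth response."""
    with urllib.request.urlopen(f"{log_url.rstrip('/')}/ct/v1/get-sth", timeout=30) as resp:
        return json.load(resp)["tree_size"]


def fetch_range(log_url: str, start: int, end: int) -> list[dict]:
    """Fetch entries start..end, re-requesting when the log returns a partial page."""
    entries = []
    while start <= end:
        with urllib.request.urlopen(get_entries_url(log_url, start, end), timeout=30) as resp:
            page = json.load(resp)["entries"]
        if not page:
            break  # avoid looping forever if the log returns nothing
        entries.extend(page)  # each item has base64 leaf_input and extra_data
        start += len(page)
    return entries
```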
Some CT log servers:
E.g. something like:
I queried from 256000000 to 256256000, so 1000 requests and 256000 entries (256 entries per request). This resulted in 541100 domains (13.28MiB/3.32MiB) and 402560 unique entries (9.84MiB/2.33MiB). My main bottleneck was CPU usage in jq. 'It seems to work' for some sample cases, although this should not be used in production*.
Other tools:
Todo:
[x] Find out the max records get-entries supports per CT log (certificate-transparency groups 2020 discussion). Also need to align: https://community.letsencrypt.org/t/enabling-coerced-get-entries/114436
ct-woodpecker
* Note that this is quite hacky code, since jq is not the best tool for binary data (chars != bytes, since jq has Unicode support). The leaf_input is a MerkleTreeLeaf structure: byte 0 is the version, byte 1 the MerkleLeafType, bytes 2..9 the timestamp, and bytes 10..11 the LogEntryType, which should be \x00\x00 for an x509_entry. Both leaf_input and extra_data then have a 3-byte length field that can be skipped. Because these offsets (15 and 3 bytes) are multiples of 3 bytes, and 3 bytes × 8 bits / 6 bits per base64 char = 4 chars, they correspond to exactly 15×8/6 = 20 and 3×8/6 = 4 base64 chars, so we can skip them by operating directly on the base64 string. One can also use dd bs=4096 skip=15 iflag=skip_bytes status=none for the X509 entries and dd bs=4096 skip=3 iflag=skip_bytes status=none for the PreCertificates. For debugging: openssl asn1parse -inform der -i.
See https://datatracker.ietf.org/doc/html/rfc6962#section-3.4, Structure of the Merkle Tree input:
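Outside jq, the byte layout described in this footnote can be decoded on raw bytes, which avoids the chars-versus-bytes problem. A minimal Python sketch, assuming the RFC 6962 layout as described; unlike the fixed dd offsets it reads the 3-byte length fields, so trailing extensions or chain certificates are not passed on. The helper names are hypothetical, and turning the DER bytes into DNS names still needs an X.509 parser (or openssl asn1parse for debugging).

```python
import base64
import struct
from typing import NamedTuple

X509_ENTRY, PRECERT_ENTRY = 0, 1  # LogEntryType values from RFC 6962


class LeafHeader(NamedTuple):
    version: int
    leaf_type: int
    timestamp_ms: int
    entry_type: int


def parse_leaf_header(leaf: bytes) -> LeafHeader:
    """Decode bytes 0..11: version, MerkleLeafType, timestamp, LogEntryType (big-endian)."""
    return LeafHeader(*struct.unpack(">BBQH", leaf[:12]))


def skip_bytes_b64(b64: str, n: int) -> str:
    """Drop n bytes by slicing the base64 text; only valid when n is a multiple of 3."""
    if n % 3:
        raise ValueError("n must be a multiple of 3 to stay on a base64 boundary")
    return b64[n // 3 * 4:]


def der_certificate(leaf_b64: str, extra_b64: str) -> bytes:
    """Return the DER (pre)certificate of one get-entries item."""
    leaf = base64.b64decode(leaf_b64)
    entry_type = parse_leaf_header(leaf).entry_type
    if entry_type == X509_ENTRY:
        data, offset = leaf, 12  # ASN.1Cert follows the 12-byte header
    elif entry_type == PRECERT_ENTRY:
        data, offset = base64.b64decode(extra_b64), 0  # pre_certificate comes first
    else:
        raise ValueError(f"unknown LogEntryType {entry_type}")
    length = int.from_bytes(data[offset:offset + 3], "big")  # 3-byte length prefix
    return data[offset + 3:offset + 3 + length]
```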
Ideally we would just have a compressed / suffix trie data structure keyed on the (reversed) fully qualified domain name (FQDN):
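Storing names by reversed labels (nl, then internet, then www) turns all subdomains of a domain into one subtree, so looking them up is a single walk plus a subtree traversal. A minimal sketch of a plain, uncompressed trie; a compressed (radix) trie would additionally merge single-child chains to save memory, which is not shown. Class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    terminal: bool = False  # True if the path from the root spells a stored FQDN


class ReversedFqdnTrie:
    def __init__(self) -> None:
        self.root = Node()

    @staticmethod
    def _labels(name: str) -> list[str]:
        # "www.Internet.nl." -> ["nl", "internet", "www"]
        return list(reversed(name.lower().rstrip(".").split(".")))

    def add(self, fqdn: str) -> None:
        node = self.root
        for label in self._labels(fqdn):
            node = node.children.setdefault(label, Node())
        node.terminal = True

    def subdomains(self, domain: str) -> list[str]:
        """All stored names equal to or below domain, sorted."""
        path = self._labels(domain)
        node = self.root
        for label in path:
            node = node.children.get(label)
            if node is None:
                return []
        results = []
        stack = [(node, path)]
        while stack:  # depth-first walk of the subtree
            current, labels = stack.pop()
            if current.terminal:
                results.append(".".join(reversed(labels)))
            for label, child in current.children.items():
                stack.append((child, labels + [label]))
        return sorted(results)
```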