zkemail / archive.prove.email

A repository to store historical, timestamped DKIM keys; and for anyone to upload their own. Basically https://archive.org for public key registries.
https://archive.prove.email
MIT License

Statistics: Estimate how common key rotation within the same selector is #100

Open foolo opened 4 months ago

foolo commented 4 months ago

Using a large set of emails, try to DKIM-verify every email for each domain-selector pair (DSP) against the current DNS record, going back in time, and check whether there is a pattern where older emails before some date cannot be verified while newer emails can. Such a pattern would indicate that the DKIM key has been rotated for that selector.

In any specific case there may of course be other reasons for a failure, so the data will be noisy, but with a large enough set of emails it should be possible to obtain some useful statistics.

Implementation idea: Loop through mbox file(s) and try to verify each email with https://pypi.org/project/dkimpy/
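A minimal sketch of that loop, under some assumptions: the function names (`parse_dsp`, `dkimpy_verify`, `collect_results`) are made up for illustration, only the first DKIM-Signature header of each email is considered, and bare LF line endings are normalized to CRLF before verification because mbox files often store LF only. The verifier is passed in as a parameter so the loop can be tried without network access; the default one calls `dkim.verify` from dkimpy, which checks against the current DNS record.

```python
import email
import mailbox
import re
from collections import defaultdict
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_dsp(sig_header):
    """Extract (domain, selector) from a DKIM-Signature header value."""
    tags = {}
    for part in str(sig_header).split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # Tag values may be folded across lines; drop all whitespace.
            tags[name.strip()] = re.sub(r"\s+", "", value)
    if "d" in tags and "s" in tags:
        return tags["d"].lower(), tags["s"]
    return None


def dkimpy_verify(raw):
    """Verify raw message bytes against the current DNS record (needs network)."""
    import dkim  # third-party package: pip install dkimpy

    # Assumption: normalize bare LF to CRLF before canonicalization.
    raw = re.sub(rb"(?<!\r)\n", b"\r\n", raw)
    try:
        return bool(dkim.verify(raw))
    except Exception:
        # DNS errors and malformed messages are counted as failures (noise).
        return False


def collect_results(mbox_path, verify=dkimpy_verify):
    """Return {(domain, selector): [(date, verified), ...]} sorted by date."""
    results = defaultdict(list)
    box = mailbox.mbox(mbox_path)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # raw bytes, not a re-serialized message
            msg = email.message_from_bytes(raw)
            sig, date = msg.get("DKIM-Signature"), msg.get("Date")
            dsp = parse_dsp(sig) if sig else None
            if dsp is None or not date:
                continue  # unsigned or undated email
            try:
                when = parsedate_to_datetime(str(date))
            except (TypeError, ValueError):
                continue  # unparseable Date header
            if when.tzinfo is None:
                # Assumption: treat naive dates as UTC so they sort together.
                when = when.replace(tzinfo=timezone.utc)
            results[dsp].append((when, verify(raw)))
    finally:
        box.close()
    for rows in results.values():
        rows.sort()
    return dict(results)
```

Calling `collect_results("combined_emails.mbox")` would then give, per DSP, a date-ordered list of pass/fail results to look for patterns in. The mbox format itself (for example `>From ` escaping in bodies) can break signatures, which adds to the noise.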

Divide-By-0 commented 4 months ago

We can also backdate DKIM keys in our database to the date of the earliest email that verifies with that key, domain, and selector! Right now we just add the current date, right?

foolo commented 4 months ago

> We can also backdate DKIM keys in our database to the date of the earliest email that verifies with that key, domain, and selector! Right now we just add the current date, right?

It's a good idea! It should be quick to implement, but I created an issue for it anyhow: https://github.com/zkemail/archive.prove.email/issues/101
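A rough sketch of the backdating step, assuming verification results are already available as a mapping from (domain, selector) to a list of (date, verified) pairs. That input shape and the name `earliest_verified` are hypothetical, not taken from the project's code. Since emails are verified against the current DNS record, the resulting date is a lower bound on when the current key was first in use.

```python
from datetime import date


def earliest_verified(results):
    """Map each (domain, selector) to the date of its earliest verified email."""
    earliest = {}
    for dsp, rows in results.items():
        passing = [when for when, ok in rows if ok]
        if passing:  # skip pairs where no email verified at all
            earliest[dsp] = min(passing)
    return earliest


# Hypothetical example data.
example = {
    ("example.com", "sel1"): [
        (date(2021, 3, 1), False),
        (date(2022, 6, 1), True),
        (date(2023, 1, 1), True),
    ],
    ("example.org", "k1"): [(date(2020, 1, 1), False)],
}
print(earliest_verified(example))
```

A pair with no verifying email is left out of the result, so its stored date would stay unchanged.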

foolo commented 4 months ago

I have collected some stats, but it's not entirely obvious how to interpret them. I have handcrafted a measure of how likely it is that a certain domain-selector pair (DSP) has been rotated, based on the verification status of emails from different dates.

The crazy image below shows the DSPs sorted by this measure. For example, the 1st line shows a DSP which started failing in 2022. I'd be grateful for any additional ideas or clarifying questions.

output2
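The measure used for the image is not spelled out here, so the following is only one possible way to score a DSP, not the one used above. It looks for the split date that best separates "fails before, verifies after" and reports the fraction of emails consistent with that split. The name `rotation_score` and the input shape, a list of (date, verified) pairs, are assumptions for illustration.

```python
from datetime import date


def rotation_score(rows):
    """Return (score, split_date) for one DSP.

    score is the fraction of emails that agree with the best
    'fails before split_date, verifies from split_date on' split.
    Returns (0.0, None) if all emails pass or all fail.
    """
    rows = sorted(rows)
    n = len(rows)
    total_pass = sum(1 for _, ok in rows if ok)
    if total_pass == 0 or total_pass == n:
        return 0.0, None
    best_agree, best_k = -1, None
    fails_before = 0
    for k in range(1, n):  # split between rows[k-1] and rows[k]
        if not rows[k - 1][1]:
            fails_before += 1
        passes_before = k - fails_before
        passes_after = total_pass - passes_before
        agree = fails_before + passes_after
        if agree > best_agree:
            best_agree, best_k = agree, k
    return best_agree / n, rows[best_k][0]


clean = [
    (date(2020, 1, 1), False),
    (date(2021, 1, 1), False),
    (date(2022, 1, 1), True),
    (date(2023, 1, 1), True),
]
print(rotation_score(clean))
```

A score near 1 with a split date in the middle of the range would suggest a rotation; scores closer to 0.5 are more likely noise. Sorting DSPs by this score would give an ordering similar in spirit to the image, though not necessarily the same one.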

foolo commented 4 months ago

@Divide-By-0 This is for combined_emails.mbox : output