Open foolo opened 5 months ago
Statistics about which percentage of selectors are still available on DNS for emails of different ages:
Note that this does not account for the fact that old selectors may still be on DNS but with an updated public key, so the DNS record no longer corresponds to the email in question.
I will add more statistics where we only account for "probably key-bound" selectors, such as "202306" or "zj6feok33gleqrx3lyj6wcf777va63fa".
src/util/statistics.py --dkimDnsStatsMbox ~/Documents/dkim/mbox/yahoo.mbox ~/Documents/dkim/mbox/gmail_priv.mbox ~/Documents/dkim/mbox/oa146.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/yahoo.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/gmail_priv.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/oa146.mbox
INFO:root:checking DNS for domainkeys
2007_Q1Q2: 0 of 1 active domainkeys (0.00%)
2007_Q3Q4: 0 of 4 active domainkeys (0.00%)
2008_Q1Q2: 0 of 4 active domainkeys (0.00%)
2008_Q3Q4: 0 of 4 active domainkeys (0.00%)
2009_Q1Q2: 3 of 5 active domainkeys (60.00%)
2009_Q3Q4: 2 of 11 active domainkeys (18.18%)
2010_Q1Q2: 3 of 11 active domainkeys (27.27%)
2010_Q3Q4: 8 of 21 active domainkeys (38.10%)
2011_Q1Q2: 6 of 16 active domainkeys (37.50%)
2011_Q3Q4: 12 of 29 active domainkeys (41.38%)
2012_Q1Q2: 10 of 20 active domainkeys (50.00%)
2012_Q3Q4: 8 of 26 active domainkeys (30.77%)
2013_Q1Q2: 14 of 30 active domainkeys (46.67%)
2013_Q3Q4: 16 of 38 active domainkeys (42.11%)
2014_Q1Q2: 22 of 34 active domainkeys (64.71%)
2014_Q3Q4: 15 of 25 active domainkeys (60.00%)
2015_Q1Q2: 23 of 41 active domainkeys (56.10%)
2015_Q3Q4: 18 of 35 active domainkeys (51.43%)
2016_Q1Q2: 20 of 36 active domainkeys (55.56%)
2016_Q3Q4: 19 of 45 active domainkeys (42.22%)
2017_Q1Q2: 32 of 50 active domainkeys (64.00%)
2017_Q3Q4: 32 of 44 active domainkeys (72.73%)
2018_Q1Q2: 57 of 79 active domainkeys (72.15%)
2018_Q3Q4: 89 of 126 active domainkeys (70.63%)
2019_Q1Q2: 53 of 81 active domainkeys (65.43%)
2019_Q3Q4: 42 of 66 active domainkeys (63.64%)
2020_Q1Q2: 78 of 114 active domainkeys (68.42%)
2020_Q3Q4: 91 of 126 active domainkeys (72.22%)
2021_Q1Q2: 117 of 138 active domainkeys (84.78%)
2021_Q3Q4: 143 of 174 active domainkeys (82.18%)
2022_Q1Q2: 111 of 141 active domainkeys (78.72%)
2022_Q3Q4: 82 of 111 active domainkeys (73.87%)
2023_Q1Q2: 115 of 148 active domainkeys (77.70%)
2023_Q3Q4: 141 of 170 active domainkeys (82.94%)
2024_Q1Q2: 248 of 266 active domainkeys (93.23%)
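For readers who want to reproduce the "active" check, here is a minimal sketch of the pieces involved. It is not the code in `src/util/statistics.py` (which is not shown in this thread); the function names are hypothetical. It relies only on the DKIM convention from RFC 6376 that a key lives in a TXT record at `<selector>._domainkey.<domain>` and that an empty `p=` tag means the key was revoked. The actual DNS lookup is left out, so the record text is passed in as a string.

```python
from datetime import datetime
from typing import Optional


def half_year_bucket(sent: datetime) -> str:
    """Label a date the way the log above does, e.g. 2019_Q3Q4."""
    half = "Q1Q2" if sent.month <= 6 else "Q3Q4"
    return f"{sent.year}_{half}"


def dkim_query_name(selector: str, domain: str) -> str:
    """DNS name under which a DKIM key record is published (RFC 6376)."""
    return f"{selector}._domainkey.{domain}"


def parse_dkim_record(txt: str) -> dict:
    """Split a record such as 'v=DKIM1; k=rsa; p=MIGf...' into its tags."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            # Split on the first '=' only: base64 values may end in '='.
            name, value = part.split("=", 1)
            tags[name.strip()] = "".join(value.split())
    return tags


def is_active(txt: Optional[str]) -> bool:
    """A selector counts as active if a record exists and p= is non-empty."""
    if txt is None:
        return False
    return bool(parse_dkim_record(txt).get("p"))


def same_key(txt: Optional[str], known_p: str) -> bool:
    """Stricter check: the published key equals a key we already know."""
    return txt is not None and parse_dkim_record(txt).get("p") == known_p
```

`same_key` illustrates the caveat mentioned at the top: a selector can be active while serving a different key than the one that signed the old email, and only a comparison against a known key (or a successful signature verification) can tell the two cases apart.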
Very interesting! So seems to hover around 50%, validating the idea that we could get quite a few old keys.
Yep! And here are new statistics where we only count "probably key-bound" selectors (for example, selectors that contain something that looks like a year, or that look like "zj6feok33gleqrx3lyj6wcf777va63fa").
src/util/statistics.py --dkimDnsStatsMbox ~/Documents/dkim/mbox/yahoo.mbox ~/Documents/dkim/mbox/gmail_priv.mbox ~/Documents/dkim/mbox/oa146.mbox --includeOnlyKeyboundSelectors
INFO:root:loading /home/olof/Documents/dkim/mbox/yahoo.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/gmail_priv.mbox
INFO:root:loading /home/olof/Documents/dkim/mbox/oa146.mbox
INFO:root:processing messages
2009_Q1Q2: 0 active domainkeys of total 1 (0.00%)
2009_Q3Q4: 0 active domainkeys of total 1 (0.00%)
2010_Q1Q2: 0 active domainkeys of total 2 (0.00%)
2010_Q3Q4: 1 active domainkeys of total 3 (33.33%)
2011_Q1Q2: 0 active domainkeys of total 3 (0.00%)
2011_Q3Q4: 3 active domainkeys of total 4 (75.00%)
2012_Q1Q2: 0 active domainkeys of total 3 (0.00%)
2012_Q3Q4: 1 active domainkeys of total 6 (16.67%)
2013_Q1Q2: 3 active domainkeys of total 9 (33.33%)
2013_Q3Q4: 4 active domainkeys of total 12 (33.33%)
2014_Q1Q2: 5 active domainkeys of total 11 (45.45%)
2014_Q3Q4: 1 active domainkeys of total 5 (20.00%)
2015_Q1Q2: 5 active domainkeys of total 14 (35.71%)
2015_Q3Q4: 5 active domainkeys of total 10 (50.00%)
2016_Q1Q2: 6 active domainkeys of total 18 (33.33%)
2016_Q3Q4: 5 active domainkeys of total 20 (25.00%)
2017_Q1Q2: 10 active domainkeys of total 16 (62.50%)
2017_Q3Q4: 10 active domainkeys of total 15 (66.67%)
2018_Q1Q2: 13 active domainkeys of total 26 (50.00%)
2018_Q3Q4: 19 active domainkeys of total 42 (45.24%)
2019_Q1Q2: 16 active domainkeys of total 31 (51.61%)
2019_Q3Q4: 8 active domainkeys of total 24 (33.33%)
2020_Q1Q2: 12 active domainkeys of total 39 (30.77%)
2020_Q3Q4: 14 active domainkeys of total 37 (37.84%)
2021_Q1Q2: 21 active domainkeys of total 37 (56.76%)
2021_Q3Q4: 25 active domainkeys of total 44 (56.82%)
2022_Q1Q2: 10 active domainkeys of total 30 (33.33%)
2022_Q3Q4: 12 active domainkeys of total 37 (32.43%)
2023_Q1Q2: 19 active domainkeys of total 47 (40.43%)
2023_Q3Q4: 31 active domainkeys of total 54 (57.41%)
2024_Q1Q2: 45 active domainkeys of total 53 (84.91%)
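The "probably key-bound" filter is only described informally above, so the following is a guess at what such a heuristic could look like, not the logic behind `--includeOnlyKeyboundSelectors`. The regular expression, the length threshold, and the function names are all assumptions chosen for illustration.

```python
import re

# Assumption: a "year" is 2000-2039 and must not be preceded by another digit,
# so "202306" and "20161025" match while key-size names like "s2048" do not.
YEAR_RE = re.compile(r"(?<!\d)20[0-3]\d")

# Assumption: selectors this long that mix letters and digits are generated.
MIN_RANDOM_LENGTH = 16


def looks_random(selector: str) -> bool:
    """Long alphanumeric string containing both letters and digits."""
    return (
        len(selector) >= MIN_RANDOM_LENGTH
        and selector.isalnum()
        and any(c.isdigit() for c in selector)
        and any(c.isalpha() for c in selector)
    )


def is_probably_key_bound(selector: str) -> bool:
    """True if the selector name suggests it was created for one specific key."""
    return bool(YEAR_RE.search(selector)) or looks_random(selector)
```

A heuristic like this will misclassify some selectors in both directions, which may explain part of the noise in the smaller buckets above.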
Created new issue from "Part B" here: https://github.com/zkemail/archive.prove.email/issues/70#issuecomment-2132751659
When a user uploads emails via Gmail upload and a mail has a domain/selector for which we don't know the key, pass it to a "GCD solver server", which brute-forces the public key and streams it back, plus frontend feedback from the GCD public key finder on emails during Gmail upload.
Steps needed:
Cost: > 4 weeks of work.
Cost/benefit: Personal take: Similar reasoning as with https://github.com/zkemail/archive.prove.email/issues/90, i.e. high cost and maintenance burden but a low probability of finding anything. It would be cool when there is actually a result, but I would say the chance here is even smaller than with issue #90.
(*) These steps are the ones that are least clear at the moment.
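For context on what a "GCD solver" would compute: the thread does not spell out the algorithm, so this sketch assumes the usual approach for recovering an RSA modulus from signatures alone. For a valid signature `s` over an encoded message `m`, the modulus `n` divides `s^e - m`, so the GCD of that value across two or more signatures from the same key is `n` times a (usually small) cofactor. The public exponent `e` is assumed known, the numbers below are toy-sized, and all names are hypothetical. In real DKIM, `m` would be the PKCS#1 v1.5 encoding of the header hash and the integers would be far larger, which is where the compute cost comes from.

```python
from math import gcd


def strip_small_factors(value: int, bound: int = 1000) -> int:
    """Divide out every prime factor below `bound` (the stray cofactor)."""
    for p in range(2, bound):
        while value > 1 and value % p == 0:
            value //= p
    return value


def candidate_modulus(pairs, e: int) -> int:
    """GCD of s**e - m over all (m, s) pairs; a multiple of the modulus."""
    g = 0
    for m, s in pairs:
        g = gcd(g, pow(s, e) - m)
    return strip_small_factors(g)


# Toy key, for illustration only (real keys are 1024 or 2048 bits).
P, Q = 1000003, 1000033
N = P * Q
E = 17
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+

# Stand-ins for encoded message hashes; each must be smaller than N.
messages = [123456789012, 987654321098, 555555555555]
pairs = [(m, pow(m, D, N)) for m in messages]

recovered = candidate_modulus(pairs, E)
# Every signature verifies under the recovered value.
verified = all(pow(s, E, recovered) == m for m, s in pairs)
```

A candidate is only useful if signatures verify against it, so a solver would run that check before streaming a key back. More signatures from the same key make a leftover cofactor less likely.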