letsencrypt / boulder

An ACME-based certificate authority, written in Go.
Mozilla Public License 2.0
5.14k stars 601 forks source link

crl-updater: CRL number not strictly increasing sometimes in integration test #7590

Open pgporada opened 1 month ago

pgporada commented 1 month ago

Seen here

    crl_test.go:82: 17:15:47.907207 6 crl-updater wY39mAE Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":1,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.908621 6 crl-updater jtiu-gM Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":1,"crlNumber":1720804547693645707}] expiresAfter=[2024-10-04 11:04:38 +0000 UTC m=+7235330.321910629] expiresBefore=[2024-10-14 11:04:38 +0000 UTC m=+8099330.321910629] numEntries=[5]
    crl_test.go:82: 17:15:47.910370 6 crl-updater 3b7T5Qw Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":4,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.913398 6 crl-updater uo7WSwA Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":6,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.913586 6 crl-updater pr34tg0 Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":3,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.913767 6 crl-updater 1omZKgA Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":4,"crlNumber":1720804547693645707}] expiresAfter=[2024-07-26 11:04:38 +0000 UTC m=+1187330.321910629] expiresBefore=[2024-08-05 11:04:38 +0000 UTC m=+2051330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.915780 6 crl-updater p_DiyA0 Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":6,"crlNumber":1720804547693645707}] expiresAfter=[2024-08-15 11:04:38 +0000 UTC m=+2915330.321910629] expiresBefore=[2024-08-25 11:04:38 +0000 UTC m=+3779330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.915931 6 crl-updater qdC2sQg Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":7,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.917254 6 crl-updater teOYzAU Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":2,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.917617 6 crl-updater 39fglws Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.918202 6 crl-updater xJWe9gU Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":7,"crlNumber":1720804547693645707}] expiresAfter=[2024-08-25 11:04:38 +0000 UTC m=+3779330.321910629] expiresBefore=[2024-09-04 11:04:38 +0000 UTC m=+4643330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.918573 6 crl-updater nKOA4Ac Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":3,"crlNumber":1720804547693645707}] expiresAfter=[2024-07-16 11:04:38 +0000 UTC m=+323330.321910629] expiresBefore=[2024-07-26 11:04:38 +0000 UTC m=+1187330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.918607 6 crl-updater zImA7QM Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":8,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.918732 6 crl-updater zuCznwQ Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":5,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.918932 6 crl-updater itLqyQE Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":2,"crlNumber":1720804547693645707}] expiresAfter=[2024-07-06 11:04:38 +0000 UTC m=-540669.678089371] expiresBefore=[2024-07-16 11:04:38 +0000 UTC m=+323330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.919027 6 crl-updater 27qrkAk Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":1,"crlNumber":1720804547693645707}] size=[734] hash=[d532239ed08b51442db69243f1516019cd96a83eb1efa5018fa910d2f00de068]
    crl_test.go:82: 17:15:47.919586 6 crl-updater _8ebiAk Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":8,"crlNumber":1720804547693645707}] expiresAfter=[2024-09-04 11:04:38 +0000 UTC m=+4643330.321910629] expiresBefore=[2024-09-14 11:04:38 +0000 UTC m=+5507330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.920905 6 crl-updater ivu07QU Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":5,"crlNumber":1720804547693645707}] expiresAfter=[2024-08-05 11:04:38 +0000 UTC m=+2051330.321910629] expiresBefore=[2024-08-15 11:04:38 +0000 UTC m=+2915330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.920974 6 crl-updater womEmQQ Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] expiresAfter=[2024-09-14 11:04:38 +0000 UTC m=+5507330.321910629] expiresBefore=[2024-09-24 11:04:38 +0000 UTC m=+6371330.321910629] numEntries=[1]
    crl_test.go:82: 17:15:47.923938 6 crl-updater -am8lAI Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":4,"crlNumber":1720804547693645707}] size=[503] hash=[4932ba019b030354668724adfa2cb190cbcb326234459043f7135ec3c4214834]
    crl_test.go:82: 17:15:47.926270 6 crl-updater j4_jqg8 Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":10,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.926406 6 crl-updater 3vCtzgc Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":3,"crlNumber":1720804547693645707}] size=[503] hash=[19ae6e130fc89753ae239aabab6a33452e8da7be14a695356524bc59cc9665e0]
    crl_test.go:82: 17:15:47.927305 6 crl-updater wu-yCwA Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":10,"crlNumber":1720804547693645707}] expiresAfter=[2024-09-24 11:04:38 +0000 UTC m=+6371330.321910629] expiresBefore=[2024-10-04 11:04:38 +0000 UTC m=+7235330.321910629] numEntries=[0]
    crl_test.go:82: 17:15:47.928551 6 crl-updater oIjOpgo Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":8,"crlNumber":1720804547693645707}] size=[503] hash=[d40879d3d03b701aa9867fc7d0c99077c4de2d19c979faff0131a70b1e5fd83f]
    crl_test.go:82: 17:15:47.930073 6 crl-updater hNvklQM Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":6,"crlNumber":1720804547693645707}] size=[503] hash=[83c7ead6419360de0777ec3402a038fb0ee43bb8f0890f590b2921dad969da3d]
    crl_test.go:82: 17:15:47.933638 6 crl-updater idbHmg0 Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":10,"crlNumber":1720804547693645707}] size=[504] hash=[d12c419c71d00cdfabfeab8a9bf10a4071f65a82bb068033055b5364a91db530]
    crl_test.go:82: 17:15:47.933910 3 crl-updater sa64nwY [AUDIT] Generating CRL failed, will retry in 1.051591171s: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] err=[closing CRLStorer upload stream: rpc error: code = Unknown desc = crlNumber not strictly increasing: 1720804547693645707 <= 1720804547713446378]
    crl_test.go:82: 17:15:47.936886 6 crl-updater oIet_wI Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":2,"crlNumber":1720804547693645707}] size=[503] hash=[6d015868af0d3a34d587971114478e065d710eee1071e18070c11e007e408425]
    crl_test.go:82: 17:15:47.940046 6 crl-updater wOz-pwU Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":5,"crlNumber":1720804547693645707}] size=[503] hash=[a266f01d72a0ecc93c3879976953776418d7790bdfe11dbca44811aa0203a3e1]
    crl_test.go:82: 17:15:47.942772 6 crl-updater 8fnwkwc Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":7,"crlNumber":1720804547693645707}] size=[503] hash=[541c554eb842021acb388c8fde4d520d167c89ee79fc945d830bdeb68bc2ea6a]
    crl_test.go:82: 17:15:48.986287 6 crl-updater 39fglws Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:48.987623 6 crl-updater womEmQQ Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] expiresAfter=[2024-09-14 11:04:38 +0000 UTC m=+5507330.321910629] expiresBefore=[2024-09-24 11:04:38 +0000 UTC m=+6371330.321910629] numEntries=[1]
    crl_test.go:82: 17:15:48.991455 3 crl-updater pfimgwI [AUDIT] Generating CRL failed: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] err=[closing CRLStorer upload stream: rpc error: code = Unknown desc = crlNumber not strictly increasing: 1720804547693645707 <= 1720804547713446378]
    crl_test.go:82: 17:15:48.997658 3 crl-updater parBngE [AUDIT] one or more errors encountered, see logs
    crl_test.go:82: 
    crl_test.go:82: crl-updater failed: exit status 1
aarongable commented 1 month ago

From the full test log, but extracting just the lines relevant to the failed shard:

    crl_test.go:67: 17:15:47.544863 6 crl-updater 36fbtw8 Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547385500723}] numChunks=[1]
    crl_test.go:67: 17:15:47.548081 6 crl-updater -Lf9oA8 Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547385500723}] expiresAfter=[2024-09-14 11:04:38 +0000 UTC m=+5507330.628751669] expiresBefore=[2024-09-24 11:04:38 +0000 UTC m=+6371330.628751669] numEntries=[1]
    crl_test.go:67: 17:15:47.569982 6 crl-updater uPHblAk Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547385500723}] size=[557] hash=[101abd6116f9849e3a3157977bb0aedf92175afe8eae3225e742c6f52dfd670d]
    crl_test.go:82: 17:15:47.917617 6 crl-updater 39fglws Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:47.920974 6 crl-updater womEmQQ Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] expiresAfter=[2024-09-14 11:04:38 +0000 UTC m=+5507330.321910629] expiresBefore=[2024-09-24 11:04:38 +0000 UTC m=+6371330.321910629] numEntries=[1]
    crl_test.go:82: 17:15:47.933910 3 crl-updater sa64nwY [AUDIT] Generating CRL failed, will retry in 1.051591171s: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] err=[closing CRLStorer upload stream: rpc error: code = Unknown desc = crlNumber not strictly increasing: 1720804547693645707 <= 1720804547713446378]
    crl_test.go:82: 17:15:48.986287 6 crl-updater 39fglws Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] numChunks=[1]
    crl_test.go:82: 17:15:48.987623 6 crl-updater womEmQQ Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] expiresAfter=[2024-09-14 11:04:38 +0000 UTC m=+5507330.321910629] expiresBefore=[2024-09-24 11:04:38 +0000 UTC m=+6371330.321910629] numEntries=[1]
    crl_test.go:82: 17:15:48.991455 3 crl-updater pfimgwI [AUDIT] Generating CRL failed: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547693645707}] err=[closing CRLStorer upload stream: rpc error: code = Unknown desc = crlNumber not strictly increasing: 1720804547693645707 <= 1720804547713446378]

This shows the following sequence of events: 1) Successfully generates a CRL with number 1720804547385500723 2) Fails to generate a CRL with number 1720804547693645707 , due to the existence of a CRL with number 1720804547713446378 3) Retries, fails again for the same reason

The obvious question is: where did CRL 77 come from?

The crl_test.go logs show no reference to it, except for the two failure messages. We can conclude that it wasn't generated by crl_test.go.

However, if we look earlier in the integration test output, we can see this collection of log lines:

17:15:47.645084 6 crl-storer ot_S9wI [AUDIT] CRL uploaded: id=[{"issuerID":56183656833365902,"shardIdx":10,"crlNumber":1720804547385500723}] issuerCN=[int rsa c] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]
17:15:47.648832 6 crl-storer n6ir3gY [AUDIT] CRL uploaded: id=[{"issuerID":56183656833365902,"shardIdx":6,"crlNumber":1720804547385500723}] issuerCN=[int rsa c] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]
17:15:47.655829 6 crl-storer sark-w0 [AUDIT] CRL uploaded: id=[{"issuerID":56183656833365902,"shardIdx":9,"crlNumber":1720804547385500723}] issuerCN=[int rsa c] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]
17:15:47.674235 6 boulder-ra zYKfGQA [AUDIT] Revocation request: JSON={"ID":"Do3KclSVzDAwnlkaJ3y1EZV4tT_p8qWipRm3TIBASp8","SerialNumber":"7f74024064d1a002a6a5049d8225bea17c8e","Reason":5,"Method":"subscriber","RequesterID":78}
17:15:47.680739 6 akamai-purger r_LGxg4 [AUDIT] Purge request sent successfully (ID welcome-to-the-purge) (body {"objects":["http://ca.example.org:4002/?body-md5=0968f15b52e71226","http://ca.example.org:4002/MFMwUTBPME0wSzAJBgUrDgMCGgUABBSZIxhd2TJe07rNYa+OT1Fb9KLytgQUjy9tm6pAtmuhgdKoBL4pcwOH35ACEn90AkBk0aACpqUEnYIlvqF8jg==","http://ca.example.org:4002/MFMwUTBPME0wSzAJBgUrDgMCGgUABBSZIxhd2TJe07rNYa%2BOT1Fb9KLytgQUjy9tm6pAtmuhgdKoBL4pcwOH35ACEn90AkBk0aACpqUEnYIlvqF8jg%3D%3D"]}). Purge expected in 153s
17:15:47.727828 6 crl-updater 9u6M0gg Generating CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547713446378}] numChunks=[1]
17:15:47.728552 6 crl-storer nuCwiAw [AUDIT] CRL uploaded: id=[{"issuerID":6762885421992935,"shardIdx":6,"crlNumber":1720804547693645707}] issuerCN=[int rsa b] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]
17:15:47.729154 6 crl-updater 3tP_xQs Queried SA for CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547713446378}] expiresAfter=[2024-09-14 11:04:38 +0000 UTC m=+5507527.655558313] expiresBefore=[2024-09-24 11:04:38 +0000 UTC m=+6371527.655558313] numEntries=[1]
17:15:47.734704 6 crl-storer ltOwHQA [AUDIT] CRL uploaded: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547713446378}] issuerCN=[int rsa a] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[1]
17:15:47.734962 6 crl-updater -NT92Q4 Generated CRL shard: id=[{"issuerID":29947985078257530,"shardIdx":9,"crlNumber":1720804547713446378}] size=[557] hash=[2f861738708e2885c24643f2eb1d1217dcb8e9b55e410ae81dd65e5f5bb687cb]
17:15:47.737068 6 crl-storer 2dCDhQs [AUDIT] CRL uploaded: id=[{"issuerID":6762885421992935,"shardIdx":7,"crlNumber":1720804547693645707}] issuerCN=[int rsa b] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]
17:15:47.741680 6 crl-storer kIHWkgI [AUDIT] CRL uploaded: id=[{"issuerID":6762885421992935,"shardIdx":4,"crlNumber":1720804547693645707}] issuerCN=[int rsa b] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]
17:15:47.742767 6 crl-storer 17HlnwU [AUDIT] CRL uploaded: id=[{"issuerID":6762885421992935,"shardIdx":5,"crlNumber":1720804547693645707}] issuerCN=[int rsa b] thisUpdate=[2024-07-12 17:15:47 +0000 UTC] nextUpdate=[2024-07-21 17:15:46 +0000 UTC] numEntries=[0]

What we see here is the following sequence of events: 1) Three log lines (actually many more, but I only kept the last three) from crl-storer, showing it storing some of the CRLs generated by crl_test.go's first run of crl-updater 2) Two log lines from the RA and akamai-purger corresponding to the revocation that crl_test.go does between its two CRL generation cycles 3) Five log lines from crl-updater and crl-storer showing the creation of this single CRL with shardIdx 9 and crlNumber ...77... 4) Three log lines (again many more, but I kept only the first three) from crl-storer, showing it storing some of the CRLs generated by crl_test.go's second run of crl-updater

We know that the log lines in (3) came from the normal instance of crl-updater started by the integration test environment's start.py, because the log lines show up at all. All of the crl-updater log lines created by crl_test.go get captured by the test runner and only printed if something goes wrong (this is why the crl-storer log lines in (1) and (4) aren't interleaved with corresponding crl-updater logs).

So this is just an instance of the integration tests getting unlucky. The crl_test.go integration test raced against the normal semi-random execution of the normal crl-updater, and lost the race.

This does beg the question: why does the CRL uploaded by the normal continuous instance of crl-updater have a slightly larger (i.e. later, since our CRL numbers are just timestamps) number than the one created by crl_test.go's batch process? I think this is simply because there's some delay between the batch process selecting its time and actually doing the work, and there's much less delay in the continuous case. So the continuous process updated one shard in the gap between the batch process picking its time/number and getting around to updating that same shard.

The only real solution I've thought of so far is "don't run crl-updater in continuous mode in the integration test environment". I don't love this idea, because it means that we lose coverage of crl-updater in ./t.sh --start-py, which is legitimately useful. But aside from that, including crl-updater in startservers isn't getting us much, since the only consequence of it going awry is a few log lines, no test failures.

Does anyone have other ideas for how to prevent crl-updater continuous from racing against crl-updater batch in this test?

jsha commented 1 month ago

Your analysis makes sense. The difference between those two timestamps is 20ms, which is on the order of thread scheduling overhead.

The unwritten(?) assumption of crl-updater is that it is never competing with another instance of crl-updater, and we're violating that assumption in our integration tests.

One way to start meeting that assumption would be to split the config into crl-updater-continuous.json and crl-updater-batch.json, and have them be responsible for disjoint sets of issuers.

aarongable commented 1 month ago

We could do that if we had one covering RSA and the other covering ECDSA. Otherwise we don't have a way to guarantee that the cert which crl_test.go issues-and-revokes would be issued by the issuer which crl-updater-batch.json is responsible for.