letsencrypt / boulder

An ACME-based certificate authority, written in Go.
Mozilla Public License 2.0
5.22k stars 607 forks source link

Race between TestAdminClearEmail and TestIssuanceCertStorageFailed #7838

Open pgporada opened 4 days ago

pgporada commented 4 days ago

Randomly the following error will be thrown in CI.

--- FAIL: TestAdminClearEmail (0.04s)
    admin_test.go:43: clearing email via admin tool (./bin/admin -config test/config/admin.json -dry-run=false update-email -address example@mail.example.letsencrypt.org -clear): 01:37:56.655147 6 admin 8MWbnQQ Debug server listening on :8014
        01:37:56.655258 6 admin p6yD7gM Versions: admin=(Unspecified Unspecified) Golang=(go1.23.1) BuildHost=(Unspecified)
        01:37:56.655406 3 admin 9siVpw4 [AUDIT] unable to boot debug server on :8014: listen tcp :8014: bind: address already in use
        : exit status 1
FAIL

TestAdminClearEmail and TestIssuanceCertStorageFailed both exist in package integration and are run in parallel. I think golang runs all t.Parallel() at the package level up to the maximum number of tests allowed to run simultaneously.

Each test is using the same admin.json which sets its debug port to 8014/tcp.

user at work in ~/work/boulder/test/integration on main*
$ grep -ri admin
...
admin_test.go:  config := fmt.Sprintf("%s/%s", os.Getenv("BOULDER_CONFIG_DIR"), "admin.json")
...
cert_storage_failed_test.go:    config := fmt.Sprintf("%s/%s", os.Getenv("BOULDER_CONFIG_DIR"), "admin.json")
...

There's a race between these tests starting up the process and attempting to claim the port.

We could potentially pass --debug-addr ${IP}:${RANDOM_PORT} to each startup to bypass this race.