hashicorp / boundary

Boundary enables identity-based access management for dynamic infrastructure.
https://boundaryproject.io

internal/commands/server: prevent test panic on timeout #5249

Closed johanbrandhorst closed 23 hours ago

johanbrandhorst commented 1 week ago

Previously, if one of these tests failed to start its server before the timeout, the goroutine started to run the server could invoke t.Errorf after the test itself had concluded, which causes a panic. Replace t.Errorf with fmt.Printf to avoid the panic, and include the test name in the output to help with debugging.

These panics seem to have started happening recently - I recorded two instances in just the last two days. You can see the panic summary at the end of the test log, but also note the "timeout" message printed in the tests:

  1. https://github.com/hashicorp/boundary/actions/runs/11846826652/job/33015283733?pr=5242#step:9:29792
  2. https://github.com/hashicorp/boundary/actions/runs/11847490310/job/33017292501?pr=5245#step:9:12181

johanbrandhorst commented 1 week ago

Looks like the test got stuck instead of panicking 😂. I'll see what I can do about it.

johanbrandhorst commented 1 week ago

Yikes, the cmd.Run function doesn't seem to return in some instances, and it doesn't seem to respect context cancellation, so there seems to be no great way to ensure the function exits while the test is still running.

johanbrandhorst commented 4 days ago

The new logging has gotten us a step closer to figuring out what's going on with these tests. It looks like most of them failed with a timeout this time. The errors all look like this:

TestReloadControllerDatabase: got a non-zero exit status: Error starting controller: error registering jobs: vault.RegisterJobs: token renewal job: scheduler.(Scheduler).RegisterJob: job.(Repository).UpsertJob: db.DoTx: unknown, unknown: error #0: dbw.Begin: failed to connect to `user=boundary database=boundary_test_qltqrubshvmfdnux`: 127.0.0.1:5432 (127.0.0.1): server error: FATAL: database "boundary_test_qltqrubshvmfdnux" does not exist (SQLSTATE 3D000)

TestReloadControllerRateLimits: got a non-zero exit status: Error initializing controller: error registering gcp host plugin: error looking up plugin by name: plugin.(Repository).LookupPluginByName: failed for: gcp: db.LookupWhere: unknown, unknown: error #0: dbw.LookupWhere: failed to connect to `user=boundary database=boundary_test_unncgapurajovkiw`: 127.0.0.1:5432 (127.0.0.1): server error: FATAL: database "boundary_test_unncgapurajovkiw" does not exist (SQLSTATE 3D000)

TestReloadControllerRateLimitsDisable: got a non-zero exit status: Error initializing controller: error registering gcp host plugin: error looking up plugin by name: plugin.(Repository).LookupPluginByName: failed for: gcp: db.LookupWhere: unknown, unknown: error #0: dbw.LookupWhere: failed to connect to `user=boundary database=boundary_test_vmokqfhgfaokjgut`: 127.0.0.1:5432 (127.0.0.1): server error: FATAL: database "boundary_test_vmokqfhgfaokjgut" does not exist (SQLSTATE 3D000)

Seems like something went wrong with the database startup?

johanbrandhorst commented 23 hours ago

Changing these to fmt.Printf will stop the "panic: error after test returned", but it won't actually fix the underlying issues. We'll come back to these tests during the test week we have planned.