johanbrandhorst closed 23 hours ago
Looks like the test got stuck instead of panicking 😂. I'll see what I can do about it.
Yikes, the cmd.Run function doesn't seem to return in some instances, and it doesn't seem to respect context cancellation. As far as I can tell, there's no great way to ensure the function exits while the test is running.
The new logging has gotten us a step closer to figuring out what's going on with these tests. It looks like most of them failed with a timeout this time. Errors are all looking like this:
TestReloadControllerDatabase: got a non-zero exit status: Error starting controller: error registering jobs: vault.RegisterJobs: token renewal job: scheduler.(Scheduler).RegisterJob: job.(Repository).UpsertJob: db.DoTx: unknown, unknown: error #0: dbw.Begin: failed to connect to `user=boundary database=boundary_test_qltqrubshvmfdnux`: 127.0.0.1:5432 (127.0.0.1): server error: FATAL: database "boundary_test_qltqrubshvmfdnux" does not exist (SQLSTATE 3D000)
TestReloadControllerRateLimits: got a non-zero exit status: Error initializing controller: error registering gcp host plugin: error looking up plugin by name: plugin.(Repository).LookupPluginByName: failed for: gcp: db.LookupWhere: unknown, unknown: error #0: dbw.LookupWhere: failed to connect to `user=boundary database=boundary_test_unncgapurajovkiw`: 127.0.0.1:5432 (127.0.0.1): server error: FATAL: database "boundary_test_unncgapurajovkiw" does not exist (SQLSTATE 3D000)
TestReloadControllerRateLimitsDisable: got a non-zero exit status: Error initializing controller: error registering gcp host plugin: error looking up plugin by name: plugin.(Repository).LookupPluginByName: failed for: gcp: db.LookupWhere: unknown, unknown: error #0: dbw.LookupWhere: failed to connect to `user=boundary database=boundary_test_vmokqfhgfaokjgut`: 127.0.0.1:5432 (127.0.0.1): server error: FATAL: database "boundary_test_vmokqfhgfaokjgut" does not exist (SQLSTATE 3D000)
Seems like something went wrong with the database startup?
Changing these to fmt.Printf will stop the "panic: error after test returned", but it won't actually fix the underlying issues. We'll come back to these tests during the test week we have planned.
Previously, if one of these tests failed to start its server before the timeout, the goroutine started to run the server could invoke t.Errorf after the test itself had concluded, which causes a panic. Replace t.Errorf with fmt.Printf to avoid the panic, and include the test name to help debugging.
These panics seem to have started happening recently - I recorded two instances in just the last two days. You can see the panic summary at the end of the test log, but also note the "timeout" message printed in the tests: