jcsp opened 4 months ago
Is there a chance you could link an Allure report to a run that failed?
(I slacked a link to a wiki page that lets you directly fetch recent failures)
This has failed 14 times in the past 48h -- @save-buffer any progress on stabilizing it?
Should be fixed by #6976
The test is still failing frequently over the 3 days since #6688 merged.
Another failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7079/8233321973/index.html#/testresult/8c4d8d2b95cdcb4a
In the compute logs I see:
```
2024-03-11T13:22:56.193741Z ERROR error while post_apply_config: handle_neon_extension_upgrade: connection closed
PG:2024-03-11 13:22:56.174 GMT [7830] LOG: [NEON_SMGR] [shard 0] libpagestore: connected to 'postgresql://no_user@localhost:30544'
PG:2024-03-11 13:22:56.304 GMT [7816] LOG: server process (PID 7830) was terminated by signal 11: Segmentation fault
PG:2024-03-11 13:22:56.304 GMT [7816] DETAIL: Failed process was running: ALTER EXTENSION neon UPDATE
PG:2024-03-11 13:22:56.304 GMT [7816] LOG: terminating any other active server processes
PG:2024-03-11 13:22:56.305 GMT [7816] LOG: shutting down because restart_after_crash is off
```
So it looks like this test is good at finding bugs, but our postgres code is not yet solid enough to survive unusual compute<->pageserver connection breaks.
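For the record, a minimal repro sketch of what the log suggests is happening: the pageserver connection drops while the backend is running `ALTER EXTENSION neon UPDATE`, and the backend segfaults instead of erroring out. All harness names here (`env`, `create_start`, `configure_failpoints`, the failpoint name) are assumptions about our test fixtures, not the exact code of this test:

```python
# Hypothetical repro sketch, NOT the actual failing test.
# Assumes a neon test-harness `env` object; fixture names and the
# failpoint name are guesses for illustration purposes.
import psycopg2

def upgrade_extension_under_connection_failure(env):
    endpoint = env.endpoints.create_start("main")

    # Assumed API: ask the pageserver to start failing page_service
    # requests, which should make compute's libpagestore connection
    # drop mid-statement, as seen in the log above.
    env.pageserver.http_client().configure_failpoints(
        [("page_service-handle-request", "return")]
    )

    conn = psycopg2.connect(endpoint.connstr())
    conn.autocommit = True
    with conn.cursor() as cur:
        # In the failing runs this statement kills the backend with
        # signal 11 instead of surfacing a clean connection error.
        cur.execute("ALTER EXTENSION neon UPDATE")
```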
Here's another bug I discovered: https://github.com/neondatabase/neon/pull/7095. Maybe fixing it will help stabilize the test.
OK, https://github.com/neondatabase/neon/pull/7095 is merged. Let's keep an eye on the test; if it stops failing as often, we can close this issue again.
4 failures in the last 3 days, so there's still work to do here.
Opened #7281
@save-buffer still flaky
This is a test that injects page_service request failures.
Occasionally it fails during compute startup, with the compute unable to fetch a basebackup.
I thought https://github.com/neondatabase/neon/pull/6537 would fix this, but apparently it hasn't. So the question is: are the new retries not working, or is this test somehow failing differently?
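For context on what "the retries are working" should look like, here's a rough sketch of the retry-with-backoff behavior I'd expect around the basebackup fetch. This is an illustration only; `get_basebackup`, the attempt count, and the backoff parameters are stand-ins I made up, not the actual compute_ctl code or the change in #6537:

```python
import time

def fetch_basebackup_with_retries(get_basebackup, attempts=5, base_delay=0.5):
    """Illustrative retry loop (hypothetical, not neon's API).

    If retries like these are active, a transient page_service failure
    should show up in the compute logs as a few failed attempts followed
    by a success, not as an immediate startup abort.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            return get_basebackup()
        except ConnectionError as err:
            last_err = err
            # Exponential backoff between attempts: 0.5s, 1s, 2s, ...
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"basebackup failed after {attempts} attempts") from last_err
```

So one way to split the question above: if the compute logs from a failed run show only a single basebackup attempt, the new retries aren't firing at all; if they show several attempts that all fail, the test is failing differently (e.g., the injected failure window outlives the retry budget).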