neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

test_crafted_wal_end flakiness #4691

Closed (koivunej closed this 6 months ago)

koivunej commented 1 year ago

test_crafted_wal_end has been flaky in a number of ways. This issue tracks all findings.

test_crafted_wal_end[release-pg15-last_wal_record_xlog_switch_ends_on_page_boundary]: release

koivunej commented 1 year ago

test_crafted_wal_end[debug-pg15-last_wal_record_crossing_segment]: debug

koivunej commented 1 year ago

test_crafted_wal_end[debug-pg15-wal_record_crossing_segment_followed_by_small_one]: debug

jcsp commented 1 year ago

https://github.com/neondatabase/neon/actions/runs/5752601432/job/15594169701?pr=4890

 FAILED test_runner/regress/test_crafted_wal_end.py::test_crafted_wal_end[release-pg15-last_wal_record_xlog_switch_ends_on_page_boundary] - RuntimeError:             Run ['/tmp/neon/bin/wal_craft', 'in-existing', 'last_wal_record_xlog_switch_ends_on_page_boundary', "host=localhost port=29015 user=cloud_admin dbname=postgres options='-cstatement_timeout=120s '"] failed:
              stdout: 
              stderr: [2023-08-03T15:27:26Z INFO  wal_craft] current_wal_insert_lsn=0/14F51F8, remaining_lsn=11784, base_wal_advance=8368, repeats=3426
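
The numbers in that stderr line can be cross-checked with a little arithmetic. The sketch below is only an inference from the logged values, not taken from the wal_craft source: it assumes the default 8192-byte WAL page size, and the rules "add one page when the distance to the page end is smaller than base_wal_advance" and "repeats = 10 + remaining_lsn - base_wal_advance" are guesses that happen to reproduce this one log line. The helper names are hypothetical.

```python
# Cross-check of the values logged by wal_craft above.
# ASSUMPTIONS: XLOG_BLCKSZ is the PostgreSQL default (8192); the two
# formulas below are inferred from this single log line, not from source.

XLOG_BLCKSZ = 8192  # default WAL page size in bytes


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hex notation into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def bytes_to_page_end(lsn: int) -> int:
    """Bytes left between the LSN and the end of its WAL page."""
    return XLOG_BLCKSZ - lsn % XLOG_BLCKSZ


current_wal_insert_lsn = parse_lsn("0/14F51F8")
base_wal_advance = 8368  # value taken from the log line

to_page_end = bytes_to_page_end(current_wal_insert_lsn)

# Guess: if the current page cannot absorb one base advance,
# the target moves to the end of the following page.
remaining_lsn = to_page_end
if remaining_lsn < base_wal_advance:
    remaining_lsn += XLOG_BLCKSZ

# Guess: this relation matches the logged repeats value.
repeats = 10 + remaining_lsn - base_wal_advance

print(to_page_end, remaining_lsn, repeats)
```

Under these assumptions the computed remaining_lsn and repeats agree with the logged remaining_lsn=11784 and repeats=3426, which suggests the tool was aiming at the page boundary after the next one when the run failed.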

awestover commented 1 year ago

Could this be the same xlog-not-flushed problem as in https://github.com/neondatabase/neon/issues/559 ?

awestover commented 1 year ago

John's Allure report had this in it:

2023-08-03T17:41:47.279783Z  INFO http request{otel.name=/extension_server/neon_test_utils http.method=POST}: serving /extension_server POST request, filename: "neon_test_utils" is_library: false
2023-08-03T17:41:47.279892Z ERROR http request{otel.name=/extension_server/neon_test_utils http.method=POST}: extension download failed: No remote extension storage

I really hope it's unrelated to the test failure, but if it is related, I think I wrote a patch to fix it.

jcsp commented 5 months ago

@arssher can you look at recent failures of this test and see if it is the same issue as this ticket?

arssher commented 5 months ago

Some are https://neondb.slack.com/archives/C033RQ5SPDH/p1714652474499819 and some are https://github.com/neondatabase/neon/pull/7588

arssher commented 5 months ago

https://github.com/neondatabase/neon/pull/7592 is for the slack thread in the previous comment