neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

`test_wal_restore_initdb` is flaky #6263

Open petuhovskiy opened 10 months ago

petuhovskiy commented 10 months ago
+ PG_BIN=/tmp/neon/pg_install/v16/bin
+ WAL_PATH='/tmp/test_output/test_wal_restore_initdb[release-pg16]/repo/safekeepers/sk1/956d95578c264511d8d0eada890c9991/cf5ea991f6ff2aeec1a9d4911b3aa9fb'
+ DATA_DIR='/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored'
+ PORT=30759
+ echo port=30759
+ echo 'shared_preload_libraries='\''$libdir/neon_rmgr.so'\'''
++ /tmp/neon/pg_install/v16/bin/pg_controldata -D '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored'
++ grep -F 'REDO location'
++ cut -c 42-
+ REDO_POS=0x4EE9E8
+ declare -i WAL_SIZE=0x4EE9E8+114
+ /tmp/neon/pg_install/v16/bin/pg_ctl -D '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored' -l '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/logfile.log' start
+ /tmp/neon/pg_install/v16/bin/pg_ctl -D '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored' -l '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/logfile.log' stop -m immediate
+ cp '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/pg_wal/000000010000000000000001' .
+ cp '/tmp/test_output/test_wal_restore_initdb[release-pg16]/repo/safekeepers/sk1/956d95578c264511d8d0eada890c9991/cf5ea991f6ff2aeec1a9d4911b3aa9fb/000000010000000000000001' '/tmp/test_output/test_wal_restore_initdb[release-pg16]/repo/safekeepers/sk1/956d95578c264511d8d0eada890c9991/cf5ea991f6ff2aeec1a9d4911b3aa9fb/000000010000000000000002.partial' '/tmp/test_output/test_wal_restore_initdb[release-pg16]/repo/safekeepers/sk1/956d95578c264511d8d0eada890c9991/cf5ea991f6ff2aeec1a9d4911b3aa9fb/safekeeper.control' '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/pg_wal/'
+ for partial in "$DATA_DIR"/pg_wal/*.partial
+ mv '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/pg_wal/000000010000000000000002.partial' '/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/pg_wal/000000010000000000000002'
+ dd if=000000010000000000000001 'of=/tmp/test_output/test_wal_restore_initdb[release-pg16]/pgsql.restored/pg_wal/000000010000000000000001' bs=5171802 count=1 conv=notrunc
dd: failed to open '000000010000000000000001': No such file or directory
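
For reference, the `bs=5171802` passed to `dd` follows from the `declare -i` line in the trace. A minimal sketch reproducing that arithmetic, with the values copied from the trace (the source does not say what the added 114 bytes represent):

```shell
#!/usr/bin/env bash
# Value copied from the trace above (REDO location reported by pg_controldata).
REDO_POS=0x4EE9E8

# declare -i makes bash evaluate the assignment arithmetically;
# the 0x prefix is parsed as hexadecimal, so 0x4EE9E8 is 5171688.
declare -i WAL_SIZE=$REDO_POS+114

echo "$WAL_SIZE"   # prints 5171802, the bs= value in the dd call
```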

The test probably assumes that 000000010000000000000001 is never deleted from disk, and that assumption may no longer hold since https://github.com/neondatabase/neon/pull/5948
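
One way to make this failure easier to diagnose would be to check for the segment before calling `dd`. The helper below is only a sketch: `require_segment` is a hypothetical name, not something that exists in the neon repository, and it does not fix the underlying cause. It reports the missing file and lists the directory contents so that the test log shows which segments were actually present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not part of neon).
# Returns 0 if the given WAL segment exists as a regular file; otherwise
# prints a diagnostic plus a directory listing to stderr and returns 1.
require_segment() {
    local path=$1
    if [[ ! -f "$path" ]]; then
        echo "WAL segment missing: $path" >&2
        # Show what is present to help tell "deleted" from "never copied".
        ls -l "$(dirname "$path")" >&2 || true
        return 1
    fi
}
```

It could be used as `require_segment 000000010000000000000001 && dd if=000000010000000000000001 ...`, keeping the original `dd` arguments unchanged.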

Logs, links

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6091/7397691118/index.html#suites/7ac407ba57d143e0d2d17655a1a207c1/39a8abc179f43fb6

arpad-m commented 10 months ago

I added the test in #5390, mostly as a copy of test_wal_restore. I wonder whether test_wal_restore is also flaky due to https://github.com/neondatabase/neon/pull/5948.

alexanderlaw commented 1 day ago

Please take a look at https://github.com/neondatabase/neon/issues/7750#issuecomment-2466706297. I suppose this is the same issue.