We're seeing errors on safekeeper-4 in us-east-2, and they are localized to this safekeeper only.
2024-01-23T15:12:42.309924Z ERROR {cid=12138 ttid=XXX/YYY}:WAL sender: terminated: Other(Failed to open WAL segment download stream for remote path RemotePath("XXX/YYY/000000010000000000000003")
Caused by:
No file found for the remote object id given
Stack backtrace:
safekeeper-4 is the only safekeeper that has timelines with local_start_lsn != timeline_start_lsn. This is quite possibly the cause of the issue: the error can happen if a client (the pageserver) requests WAL from the same segment that contains local_start_lsn, but from a position before local_start_lsn itself. Safekeeper logic prevents reading uninitialized WAL, so the safekeeper tries to read the WAL from remote storage instead, but it can be unavailable there if the segment hasn't been uploaded yet.
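To make the suspected condition concrete, here is a minimal sketch of it in Rust. It is an illustrative model only, not the safekeeper's actual code: the function names, the `WalSource` enum, and the use of plain `u64` values for LSNs are all hypothetical, and the 16 MiB segment size is the PostgreSQL default, assumed here.

```rust
// Illustrative model only. All names below are hypothetical and are not
// taken from the safekeeper code base.

/// Default PostgreSQL WAL segment size (16 MiB). Assumed, not confirmed
/// for the affected timelines.
const WAL_SEGMENT_SIZE: u64 = 16 * 1024 * 1024;

/// Where a read starting at a given LSN would be served from in this model.
#[derive(Debug, PartialEq)]
enum WalSource {
    LocalDisk,
    RemoteStorage,
}

/// Index of the WAL segment that contains `lsn`.
fn segment_number(lsn: u64) -> u64 {
    lsn / WAL_SEGMENT_SIZE
}

/// Segment file name in the PostgreSQL layout: timeline id, then the
/// segment number split into a high and a low part, each as 8 hex digits.
fn segment_file_name(timeline_id: u32, segno: u64) -> String {
    let segments_per_xlog_id = 0x1_0000_0000u64 / WAL_SEGMENT_SIZE;
    format!(
        "{:08X}{:08X}{:08X}",
        timeline_id,
        segno / segments_per_xlog_id,
        segno % segments_per_xlog_id
    )
}

/// Models the behavior described above: WAL before `local_start_lsn` is
/// treated as uninitialized locally, so the read goes to remote storage.
fn wal_source(request_lsn: u64, local_start_lsn: u64) -> WalSource {
    if request_lsn < local_start_lsn {
        WalSource::RemoteStorage
    } else {
        WalSource::LocalDisk
    }
}

/// The suspicious case: the request starts before `local_start_lsn` but in
/// the same segment, which may not have been uploaded to remote storage yet.
fn is_problematic_request(request_lsn: u64, local_start_lsn: u64) -> bool {
    request_lsn < local_start_lsn
        && segment_number(request_lsn) == segment_number(local_start_lsn)
}
```

Under these assumptions, `segment_file_name(1, 3)` yields `000000010000000000000003`, the segment named in the log above. A timeline whose local_start_lsn falls in the middle of that segment would hit the problematic case for any request that starts earlier in the same segment.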
The plan is:
local_start_lsn != timeline_start_lsn
local_start_lsn
Related slack threads: