neondatabase / neon

Partial backup doesn't work when LSN is exactly at WAL segment start #7987

Open petuhovskiy opened 4 months ago

petuhovskiy commented 4 months ago

Found errors like this in the logs:

starting upload PartialRemoteSegment { status: InProgress, name: "000000010000000000000002_103_0000000002000000_0000000002000000_sk347.partial", commit_lsn: 0/2000000, flush_lsn: 0/2000000, term: 103 }
failed to upload 000000010000000000000002_103_0000000002000000_0000000002000000_sk347.partial: Failed to open file "/storage/safekeeper/data/XXX/YYY/000000010000000000000002.partial" for wal backup: No such file or directory (os error 2)

The fix is to skip the upload when the LSN offset within the segment is exactly zero. Partial backup doesn't work in this case because WAL removal deletes the old segment before the uploader can read it.
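A minimal sketch of the proposed check, not the actual safekeeper code. The function names `segment_offset` and `should_upload_partial` are hypothetical, and the 16 MiB segment size is an assumption (the Postgres default), which is consistent with `0/2000000` being the start of segment `...02` in the log above:

```rust
/// Assumed WAL segment size: 16 MiB, the Postgres default.
const WAL_SEGMENT_SIZE: u64 = 16 * 1024 * 1024;

/// Byte offset of `lsn` inside the WAL segment that contains it.
fn segment_offset(lsn: u64, wal_seg_size: u64) -> u64 {
    lsn % wal_seg_size
}

/// Hypothetical guard: skip the partial upload when the LSN sits exactly
/// at a segment start, since the partial segment would hold no data yet.
fn should_upload_partial(lsn: u64, wal_seg_size: u64) -> bool {
    segment_offset(lsn, wal_seg_size) != 0
}

fn main() {
    // 0/2000000 from the log: 0x2000000 = 2 * 16 MiB, a segment boundary.
    let at_boundary: u64 = 0x0200_0000;
    assert!(!should_upload_partial(at_boundary, WAL_SEGMENT_SIZE));

    // A few bytes into the segment there is something to upload.
    assert!(should_upload_partial(at_boundary + 8, WAL_SEGMENT_SIZE));
}
```

With a guard like this, the uploader would never try to open the `.partial` file for a segment that has no bytes written yet.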

Context: https://neondb.slack.com/archives/C0706FMFRJ7/p1717698384241649?thread_ts=1717549440.269089&cid=C0706FMFRJ7

jcsp commented 3 months ago

Impact: this generates errors in the logs but has no other effect, because what we would have uploaded would have been empty.