neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Layer deletion operation can be lost in case of a pageserver crash/restart #4326

Closed: LizardWizzard closed this issue 5 months ago

LizardWizzard commented 1 year ago

Steps to reproduce

  1. Schedule a layer deletion
  2. Upload the index_file.json change that removes the layer
  3. Crash before the actual deletion is done

Expected result

Layer should be deleted

Actual result

The layer won't be deleted. It is no longer listed in the index file, and nothing detects such orphaned layers. One option to solve this would be to persist the delete intention in the index file first; then, after a restart, pending deletions can be found in the index file and retried.
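
To make the proposal concrete, here is a minimal sketch of the "persist intent first, retry after restart" ordering. It is not the pageserver's code: `RemoteStorage`, `pending_deletions`, `schedule_deletion`, and `retry_pending_deletions` are hypothetical names, remote storage is simulated in memory, and the index layout is an assumption made only for illustration.

```python
class SimulatedCrash(Exception):
    """Stands in for a pageserver crash between two remote operations."""


def copy_index(index):
    # Copy the lists so local edits never alias the "remote" state.
    return {key: list(value) for key, value in index.items()}


class RemoteStorage:
    """In-memory stand-in for remote storage: layer objects plus one index file."""

    def __init__(self, layers):
        self.objects = set(layers)
        # Hypothetical index layout: live layers plus recorded delete intents.
        self.index = {"layers": sorted(layers), "pending_deletions": []}

    def upload_index(self, index):
        self.index = copy_index(index)

    def delete_object(self, name):
        # Idempotent: deleting an already-missing object is not an error.
        self.objects.discard(name)


def finish_deletion(remote, layer):
    # Delete the object first, then clear the intent from the index.
    remote.delete_object(layer)
    index = copy_index(remote.index)
    index["pending_deletions"].remove(layer)
    remote.upload_index(index)


def schedule_deletion(remote, layer, crash_before_delete=False):
    # Step 1: persist the intent in the index before touching the object.
    index = copy_index(remote.index)
    index["layers"].remove(layer)
    index["pending_deletions"].append(layer)
    remote.upload_index(index)
    if crash_before_delete:
        raise SimulatedCrash(layer)
    # Step 2: perform the actual deletion and clear the intent.
    finish_deletion(remote, layer)


def retry_pending_deletions(remote):
    # Run on startup: every intent still recorded in the index is retried.
    for layer in list(remote.index["pending_deletions"]):
        finish_deletion(remote, layer)


remote = RemoteStorage(["layer-a", "layer-b"])
try:
    schedule_deletion(remote, "layer-a", crash_before_delete=True)
except SimulatedCrash:
    pass  # the object still exists, but the intent survived in the index

retry_pending_deletions(remote)  # the restart path finishes the deletion
```

Because the object delete is idempotent and the intent is cleared only after it succeeds, a crash at any point in this sketch leaves either a recorded intent that gets retried or a fully completed deletion, never an untracked layer.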

LizardWizzard commented 1 year ago

The same can happen during upload, if we uploaded the layer successfully but the index file wasn't updated. The upload operation will likely be retried after restart, so in this case there is no actual leakage.

shanyp commented 1 year ago

https://github.com/neondatabase/neon/issues/4378