neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

pageserver: abort process on fsync errors #8140

Closed: jcsp closed this 1 day ago

jcsp commented 3 months ago

We added the infrastructure for this a while back (maybe_fatal_err etc.), but a lot of code still doesn't use it.

We hit these kinds of errors when an EC2 instance is dying.

We should be aborting the process on I/O errors when writing to layer files.
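A minimal sketch of the idea, not the pageserver's actual code: the trait `FatalIo`, its method `fatal_err`, and the function `write_layer_durably` are hypothetical names made up for illustration. It assumes the policy the thread converges on: ordinary write errors are returned to the caller, while an fsync failure aborts the process, because the state of the file on disk is unknown after a failed fsync.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical extension trait: turn an I/O error into a process abort.
trait FatalIo<T> {
    fn fatal_err(self, context: &str) -> T;
}

impl<T> FatalIo<T> for io::Result<T> {
    fn fatal_err(self, context: &str) -> T {
        match self {
            Ok(value) => value,
            Err(e) => {
                // Log and abort rather than unwind: nothing above us can
                // meaningfully recover from a failed fsync.
                eprintln!("fatal I/O error: {context}: {e}");
                std::process::abort();
            }
        }
    }
}

/// Hypothetical write path: write errors propagate, fsync errors abort.
fn write_layer_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    // Assumption: a failed write (e.g. out of space) is left to the caller.
    file.write_all(data)?;
    // A failed fsync is treated as unrecoverable.
    file.sync_all().fatal_err("fsync layer file");
    Ok(())
}
```

Calling `write_layer_durably(path, bytes)` on a healthy disk returns `Ok(())`. On a dying instance where fsync fails, the process exits immediately instead of continuing with a layer file of unknown durability.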

skyzh commented 1 month ago

Write path done, need to investigate read path.

skyzh commented 5 days ago

christian: panic when we run out of space; disk-based eviction? tenant config (?); fsync after rename

final thoughts: panic on fsync failure, discuss the out-of-space issue later
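The notes mention "fsync after rename" without further detail. One common reading, sketched here as an assumption rather than as what the pageserver does, is the usual POSIX pattern: write a temporary file, fsync it, rename it over the destination, then fsync the parent directory so the rename itself is durable. The function name `durable_replace` and the `.tmp` suffix are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: atomically replace `dir/name` with `data`.
fn durable_replace(dir: &Path, name: &str, data: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let dst = dir.join(name);

    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    // Make the file contents durable before it becomes visible under `dst`.
    file.sync_all()?;

    // On POSIX filesystems, rename replaces the destination atomically.
    fs::rename(&tmp, &dst)?;

    // fsync the directory so the new directory entry survives a crash.
    // Assumption: Unix-like OS; opening a directory this way fails on Windows.
    File::open(dir)?.sync_all()?;
    Ok(())
}
```

Under the policy in the final thoughts, a caller would treat an error from either `sync_all` call as fatal and abort, while deciding separately how to handle an out-of-space error from `write_all`.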