neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Epic: cancellation in long running pageserver tasks #5585

Open jcsp opened 10 months ago

jcsp commented 10 months ago

This is an umbrella ticket for all the places that may not be properly respecting cancellation in long-running tasks.

### Tasks
- [ ] Drop out of waiting on the remote storage semaphore when cancelled
- [ ] Check for cancellation in any loops inside GC/compaction
- [ ] https://github.com/neondatabase/neon/issues/5066
- [ ] Automated test that stresses tenant detach/deletion while background work is going on, asserting that operations complete promptly even if background work is slow.
- [ ] https://github.com/neondatabase/neon/issues/6096

koivunej commented 10 months ago

Related investigation: https://app.incident.io/neondb/incidents/73

Investigation related to reopening #5341 and then re-closing it with #5696.

koivunej commented 6 months ago

#7051 fixed on-demand download working off the wrong cancellation token.

koivunej commented 4 months ago

Lots of cancellation token checks have been added throughout; now our problem is not so much the cancellation as the bad errors.

jcsp commented 4 months ago

What's next for this issue? Let's make it specific enough to take action on, or close it.