Closed by sharnoff.
Related PRs and discussions:
## Tasks

- [ ] https://github.com/neondatabase/autoscaling/pull/773
- [ ] https://github.com/neondatabase/autoscaling/pull/779
- [ ] https://github.com/neondatabase/autoscaling/pull/783
- [ ] ~~Alerting for reconcile workers saturation~~
- [ ] ~~Alerting for reconcile error rate~~
- [x] Alerting for (a) many objects failing to reconcile, or (b) an extended period of object(s) failing to reconcile
- [ ] Alerting for p90 workqueue wait duration
- [ ] Investigate why increasing max reconcile workers [decreases p50-p90 reconcile durations](https://neondb.slack.com/archives/C03TN5G758R/p1706652954423079?thread_ts=1706160071.213319&cid=C03TN5G758R)
- [ ] Consistent baseline of reconcile operations taking 1s (probably related to sleeps during memory unplug?)
Some of these items have already been done. The remaining ones are mostly covered by the alerting described here: https://github.com/neondatabase/cloud/issues/9629#issuecomment-1938070183