neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0

neonvm: metrics for failing reconcile #920

Status: Closed (closed by Omrigan 2 months ago)

Omrigan commented 2 months ago

The need for this came up while investigating an elevated rate of reconciliation failures after the neonvm-controllers were moved to another node group.

Adds a single metric, reconcile_duration_seconds, with an outcome label, plus relevant log messages.

Part of #916 and #918. Easier to do together because of conflicts.

sharnoff commented 2 months ago

I don't think this would resolve #918 — AFAIK most of the conflicts we get aren't actually at the final "update status" step, but earlier on in the reconcile operation. We'd need to inspect the error value we get.

Omrigan commented 2 months ago

> I don't think this would resolve #918; AFAIK most of the conflicts we get aren't actually at the final "update status" step, but earlier on in the reconcile operation. We'd need to inspect the error value we get.

Would 00e658d work?