save-buffer closed this issue 3 weeks ago.
I feel a quick fix would be to let the compute startup process emit a warning in this case instead of erroring, so that users can at least start their compute.
Yes, in principle I agree, but since we have some synchronization between cplane and compute, ignoring these errors could leave us in an inconsistent state. We should probably make the compute the source of truth and have some reconciliation back into the cplane if something fails. That seems like a fairly large-scope project.
There's also the risk that if it generates warnings instead of causing an incident, we'll be lazy and never fix it. Or, put more diplomatically, it won't be "high priority" and there will always be other, higher-priority things to fix. So I'm not sure what the right call is.
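As a rough illustration of the warn-instead-of-error idea and the reconciliation concern, here is a minimal Python sketch. It assumes a hypothetical `drop_role` callback that raises on failure; role-drop failures are downgraded to warnings so startup continues, but they are also recorded so they could be reported back to the cplane rather than silently ignored. All names (`apply_role_drops`, `StartupReport`, `RoleDropFailure`, `fake_drop`) are hypothetical and not taken from the actual compute code.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

log = logging.getLogger("compute_startup")


@dataclass
class RoleDropFailure:
    """A role drop that failed and would still need reconciling with the cplane."""
    role: str
    reason: str


@dataclass
class StartupReport:
    dropped: list[str] = field(default_factory=list)
    pending_reconciliation: list[RoleDropFailure] = field(default_factory=list)


def apply_role_drops(roles: Iterable[str], drop_role: Callable[[str], None]) -> StartupReport:
    """Try to drop each role, logging failures as warnings instead of aborting startup.

    Failures are collected rather than discarded, so a (hypothetical) later step
    could send them back to the cplane for reconciliation.
    """
    report = StartupReport()
    for role in roles:
        try:
            drop_role(role)
        except Exception as exc:  # a real implementation would catch the DB driver's error type
            log.warning("failed to drop role %r: %s", role, exc)
            report.pending_reconciliation.append(RoleDropFailure(role, str(exc)))
        else:
            report.dropped.append(role)
    return report


def fake_drop(role: str) -> None:
    # Simulated drop: "bob" still owns objects, e.g. because reassignment did not happen first.
    if role == "bob":
        raise RuntimeError('role "bob" cannot be dropped because some objects depend on it')


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    report = apply_role_drops(["alice", "bob"], fake_drop)
    print(report.dropped)  # ['alice']
    print([f.role for f in report.pending_reconciliation])  # ['bob']
```

Keeping failures in `pending_reconciliation` instead of only logging them is one possible answer to the worry that warnings would never get fixed. Whether that is enough to keep cplane and compute consistent is the open question in this thread.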
Without the context it's hard to tell, but this is likely a duplicate of https://github.com/neondatabase/cloud/issues/13582
Ah, nice. Yes, it seems like a duplicate.
Closing as a duplicate of https://github.com/neondatabase/cloud/issues/13582
Steps to reproduce
Needs investigation, but reassignment somehow didn't work in this incident: https://neondb.slack.com/archives/C07BB3NHUUX/p1720584945416209
Expected result
Compute startup should never fail due to a bad role drop.
Actual result
Sometimes it does fail.
Environment
Logs, links