I think this bug is being exposed now because the control plane used to call /attach in this case, which would have returned an error (attach is not idempotent) because the tenant was already attached. Now the location_conf API correctly tries to shut down the original Tenant and create a new one, so we're hitting a bug in the shutdown path.
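A minimal sketch of why the API change surfaces the bug, assuming a heavily simplified in-memory model. TenantManager, attach, location_conf, config_version and the shutdowns counter are hypothetical names for illustration, not Neon's actual types. The non-idempotent attach fails fast on an already-attached tenant, while the declarative location_conf replaces it and therefore goes through shutdown.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified model; not Neon's code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Tenant {
    config_version: u32, // hypothetical stand-in for the tenant's configuration
}

#[derive(Debug, PartialEq, Eq)]
enum AttachError {
    AlreadyAttached,
}

#[derive(Default)]
struct TenantManager {
    tenants: HashMap<String, Tenant>,
    shutdowns: u32, // how many times an existing tenant was shut down
}

impl TenantManager {
    // Non-idempotent, like the old /attach call: a repeat is rejected.
    fn attach(&mut self, id: &str, config_version: u32) -> Result<(), AttachError> {
        if self.tenants.contains_key(id) {
            return Err(AttachError::AlreadyAttached);
        }
        self.tenants.insert(id.to_string(), Tenant { config_version });
        Ok(())
    }

    // Declarative, like location_conf: replace whatever is there.
    fn location_conf(&mut self, id: &str, config_version: u32) {
        if self.tenants.remove(id).is_some() {
            // In the real system this is where the old Tenant is shut down,
            // i.e. the code path that hung.
            self.shutdowns += 1;
        }
        self.tenants.insert(id.to_string(), Tenant { config_version });
    }
}

fn main() {
    let mut mgr = TenantManager::default();
    mgr.attach("t1", 1).unwrap();
    // Old behaviour: the repeat fails fast and never reaches shutdown.
    assert_eq!(mgr.attach("t1", 2), Err(AttachError::AlreadyAttached));
    assert_eq!(mgr.shutdowns, 0);
    // New behaviour: the same situation exercises the shutdown path.
    mgr.location_conf("t1", 2);
    println!("shutdowns = {}", mgr.shutdowns);
}
```

Under this model the old error path masked the shutdown bug rather than avoiding it.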
Diagnosed the hang: Tenant::shutdown calls set_stopping with allow_transition_from_attaching=false, but Tenant::spawn leaves the tenant in the Attaching state when it sees the cancellation token while waiting for concurrent_tenant_warmup. Shutdown therefore waits for a transition out of Attaching that never happens.
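A sketch of the hang under stated assumptions: the state machine is reduced to a Mutex plus Condvar, and this set_stopping takes a timeout purely so the example terminates. TenantState, Tenant and the timeout parameter are hypothetical simplifications, not Neon's async implementation. A tenant left in Attaching with nobody to advance it never satisfies the wait when allow_transition_from_attaching is false.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

// Hypothetical, simplified tenant state machine; not Neon's code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TenantState {
    Attaching,
    Active,
    Stopping,
}

struct Tenant {
    state: Mutex<TenantState>,
    // Whatever legitimately advances the state would notify_all() here.
    // In the hang, spawn has already returned, so nothing ever does.
    changed: Condvar,
}

impl Tenant {
    fn new(state: TenantState) -> Self {
        Tenant { state: Mutex::new(state), changed: Condvar::new() }
    }

    // Stand-in for set_stopping: wait until the state may be left, then move
    // to Stopping. Returns false on timeout; the real code has no timeout,
    // which is exactly the hang.
    fn set_stopping(&self, allow_transition_from_attaching: bool, timeout: Duration) -> bool {
        let guard = self.state.lock().unwrap();
        let (mut state, result) = self
            .changed
            .wait_timeout_while(guard, timeout, |s| {
                *s == TenantState::Attaching && !allow_transition_from_attaching
            })
            .unwrap();
        if result.timed_out() {
            return false;
        }
        *state = TenantState::Stopping;
        true
    }
}

fn main() {
    // spawn saw cancellation while waiting for warmup and returned, leaving
    // the state at Attaching with nobody left to advance it.
    let stuck = Tenant::new(TenantState::Attaching);
    let ok = stuck.set_stopping(false, Duration::from_millis(50));
    println!("shutdown completed: {}", ok); // false: the wait never succeeds

    // For contrast only (not a claim about the actual fix).
    let allowed = stuck.set_stopping(true, Duration::from_millis(50));
    println!("with allow_transition_from_attaching=true: {}", allowed); // true
}
```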
In quick succession:
So: some piece of code that holds a TenantSlotGuard is getting stuck.
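Why one stuck holder blocks everyone else can be sketched with a simplified, hypothetical slot map: while a guard is alive its slot reads as InProgress, and it is only restored (and waiters woken) when the guard is dropped. SlotMap, Slot, SlotGuard and acquire are illustrative names, not the real TenantSlotGuard API, and the timeout exists only so the example terminates.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

// Hypothetical, simplified model of a tenant slot map; not Neon's code.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Slot {
    Attached,
    InProgress,
}

#[derive(Default)]
struct SlotMap {
    slots: Mutex<HashMap<String, Slot>>,
    released: Condvar,
}

// While alive, the guard's slot reads as InProgress.
struct SlotGuard {
    map: Arc<SlotMap>,
    id: String,
    old: Option<Slot>,
}

impl Drop for SlotGuard {
    // Restore the previous slot value and wake anyone waiting for it.
    fn drop(&mut self) {
        let mut slots = self.map.slots.lock().unwrap();
        match self.old.take() {
            Some(old) => { slots.insert(self.id.clone(), old); }
            None => { slots.remove(&self.id); }
        }
        self.map.released.notify_all();
    }
}

impl SlotMap {
    // Wait up to `timeout` for the slot to stop being InProgress, then claim
    // it. Returns None if the current holder never lets go.
    fn acquire(map: &Arc<SlotMap>, id: &str, timeout: Duration) -> Option<SlotGuard> {
        let slots = map.slots.lock().unwrap();
        let (mut slots, result) = map
            .released
            .wait_timeout_while(slots, timeout, |s| s.get(id) == Some(&Slot::InProgress))
            .unwrap();
        if result.timed_out() {
            return None;
        }
        let old = slots.insert(id.to_string(), Slot::InProgress);
        Some(SlotGuard { map: Arc::clone(map), id: id.to_string(), old })
    }
}

fn main() {
    let map = Arc::new(SlotMap::default());
    map.slots.lock().unwrap().insert("t1".to_string(), Slot::Attached);

    // An operation takes the guard and then gets stuck, never dropping it.
    let stuck = SlotMap::acquire(&map, "t1", Duration::from_millis(50)).unwrap();
    let second = SlotMap::acquire(&map, "t1", Duration::from_millis(50));
    println!("second acquire while stuck: {}", second.is_some()); // false

    drop(stuck); // only once the holder releases the slot...
    let third = SlotMap::acquire(&map, "t1", Duration::from_millis(50));
    println!("acquire after release: {}", third.is_some()); // true
}
```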
This is likely related to one or both of:
Backref: https://neondb.slack.com/archives/C03F5SM1N02/p1705915089864759?thread_ts=1705847167.713309&cid=C03F5SM1N02