This was observed in staging, where the infra team stopped the pageservers for around 20 minutes. When the pageserver restarted, the reattach response was processed before the heartbeats marked the node active. The heartbeats did detect the node coming back online (they store node state separately), but because the reattach had already been processed, the heartbeat handler was inhibited from re-attaching the tenants (`Service::node_activate_reconcile`).
https://neondb.slack.com/archives/C060CNA47S9/p1718270673517979