https://github.com/neondatabase/neon/pull/8110 had a spooky failure in test_multi_attach where a call to Endpoint.stop() timed out waiting for a lock, even though we can see an earlier call completing and releasing the lock. I suspect something weird is going on with the way pytest runs tests across processes, or use of asyncio perhaps.
Anyway: the simplest fix is to just use a semaphore instead: if we don't lock we can't deadlock.
Summary of changes
Make Endpoint.running a semaphore, where we add a unit to its counter when starting the process and atomically decrement it when stopping.
Checklist before requesting a review
[ ] I have performed a self-review of my code.
[ ] If it is a core feature, I have added thorough tests.
[ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
[ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
Checklist before merging
[ ] Do not forget to reformat commit message to not include the above checklist
Problem
Ahem, let's try this again.
https://github.com/neondatabase/neon/pull/8110 had a spooky failure in test_multi_attach where a call to Endpoint.stop() timed out waiting for a lock, even though we can see an earlier call completing and releasing the lock. I suspect something weird is going on with the way pytest runs tests across processes, or use of asyncio perhaps.
Anyway: the simplest fix is to just use a semaphore instead: if we don't lock we can't deadlock.
Summary of changes
Checklist before requesting a review
Checklist before merging