tests: use semaphore instead of lock for Endpoint.running

neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.

Apache License 2.0


Problem

Ahem, let's try this again.

https://github.com/neondatabase/neon/pull/8110 had a spooky failure in test_multi_attach: a call to Endpoint.stop() timed out waiting for a lock, even though we can see an earlier call completing and releasing that lock. I suspect something odd is going on with the way pytest runs tests across processes, or perhaps with our use of asyncio.

Anyway, the simplest fix is to use a semaphore instead: if we don't take a lock, we can't deadlock.
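The difference can be illustrated with Python's standard `threading` primitives. This is a minimal sketch, not the fixture's actual code: it assumes the old stop path amounted to a blocking lock acquire with a timeout, and the helper names `try_stop_with_lock` and `try_stop_with_semaphore` are hypothetical. A blocking acquire can sit and time out if the lock looks held; a non-blocking semaphore acquire always returns immediately.

```python
import threading


def try_stop_with_lock(lock: threading.Lock, timeout: float) -> bool:
    # Blocking acquire: waits up to `timeout` seconds for whoever holds the lock.
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


def try_stop_with_semaphore(sem: threading.Semaphore) -> bool:
    # Non-blocking acquire: returns at once, True only if a unit was available.
    return sem.acquire(blocking=False)


lock = threading.Lock()
lock.acquire()  # simulate a holder that (apparently) never releases
print(try_stop_with_lock(lock, timeout=0.1))  # False, after waiting 0.1 s

sem = threading.Semaphore(0)
sem.release()  # the process has "started"
print(try_stop_with_semaphore(sem))  # True: this caller performs the stop
print(try_stop_with_semaphore(sem))  # False: already stopped, no waiting
```

With the semaphore, a redundant stop simply observes that there is nothing to stop, rather than waiting on another caller.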

Summary of changes

Make Endpoint.running a semaphore: we add one unit to its counter when the process starts, and atomically decrement it when stopping.
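A rough sketch of that pattern, using a hypothetical `SketchEndpoint` class rather than Neon's real `Endpoint` fixture. It assumes `threading.Semaphore` as the primitive: `release()` adds the unit on start, and `acquire(blocking=False)` is the atomic decrement on stop. The `shutdowns` counter is an illustration-only field.

```python
import threading


class SketchEndpoint:
    """Hypothetical stand-in for the test fixture's Endpoint; not Neon's real class."""

    def __init__(self) -> None:
        # Counter starts at zero: nothing is running yet.
        self.running = threading.Semaphore(0)
        # Illustration only: counts how many stop() calls actually did the shutdown.
        self.shutdowns = 0

    def start(self) -> None:
        # ... launch the process here ...
        # Add one unit to the counter once the process is up.
        self.running.release()

    def stop(self) -> None:
        # Atomically take the unit if it is there; never block.
        if self.running.acquire(blocking=False):
            # Only the caller that won the unit reaches this branch.
            self.shutdowns += 1
            # ... stop the process here ...


ep = SketchEndpoint()
ep.start()
threads = [threading.Thread(target=ep.stop) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ep.shutdowns)  # 1
```

Because the decrement is atomic, concurrent or repeated stop() calls need no mutual exclusion: exactly one of them sees the unit, and the rest return immediately.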

Checklist before requesting a review

[ ] I have performed a self-review of my code.
[ ] If it is a core feature, I have added thorough tests.
[ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
[ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

[ ] Do not forget to reformat the commit message to not include the above checklist
