habitat-sh / habitat

Modern applications with built-in automation
https://www.habitat.sh
Apache License 2.0
2.6k stars 314 forks source link

Normalize lock ordering to avoid deadlocks #6825

Closed baumanj closed 3 years ago

baumanj commented 5 years ago

There are many locking primitives in the habitat codebase, and though safe rust guarantees the impossibility of data races, it is possible to deadlock unless a consistent scheme is employed to ensure safe access to lock-guarded resources. See https://en.wikipedia.org/wiki/Dining_philosophers_problem for an overview of this concept.

The simplest approach to address this in habitat is likely to be lock ordering. If all the locks in the system can be partially ordered and every thread which acquires multiple locks concurrently acquires them in concordance with the partial order and releases them in the opposite order, this should be sufficient to avoid deadlock.

Note that this issue is somewhat related to but distinct from https://github.com/habitat-sh/habitat/issues/6435. That issue relates to deadlock due to recursive lock acquisition. That is, when a thread holds a lock and then acquires it again. Some locking primitives support this, but supporting recursive locking is mutually exclusive with fairness, which we want to support. Also, designs which depend on recursive locking tend to be excessively complex and hard to reason about.

This work will require locating the various locks in the system, auditing their usage to derive a partial ordering, and fixing any logic which is inconsistent with the ordering. Additionally, it may be advantageous to implement a mechanism for ordering enforcement, either at runtime or compile time to ensure against regressions, but that decision is an implementation detail.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.

stale[bot] commented 3 years ago

This issue has been automatically closed after being stale for 400 days. We still value your input and contribution. Please re-open the issue if desired and leave a comment with details.