Potential deadlock between RequestRoute and LightningChannel commitment state machine

Discussed in https://github.com/lightningnetwork/lnd/discussions/9060

^{Originally posted by **bjohnson5** September 3, 2024} I believe I have discovered a potential deadlock situation, but I am relatively new to LND and wanted to discuss it before opening an issue, to be sure that I am not missing something. This is when using the bbolt backend database. In `lnwallet/channel.go` the `LightningChannel` struct defines several methods that the comments explain as the "state machine which corresponds to the current commitment protocol wire spec". These methods are: `SignNextCommitment`, `ReceiveNewCommitment`, `RevokeCurrentCommitment`, and `ReceiveRevocation`. Each of these will first lock the `LightningChannel`: `lc.lock()` and then they will typically attempt to update the channel db. When updating the channel db, sometimes the database must be re-sized and re-mapped to memory using the `mmap` function in bbolt's `db.go` file. This function first attempts to lock the `mmaplock` mutex. This is all fine except that if one of the state machine functions is called while the node is trying to find a route a deadlock could occur. The `RequestRoute` function in `payment_session.go` will get a routing graph from the db and this will acquire the `mmaplock` on the db (for good reason, it needs to be sure the db is not re-mapped while it is using it to find a route). It will eventually call functions of the `LightningChannel` struct in order to find bandwidth, balances, etc... It is possible that these functions are locked by one of the state machine methods and that state machine method could be stuck waiting on the `mmaplock`. For example: **Thread1:** RequestRoute -> NewGraphSession() -> `mmaplock.lock()` ----------------------------------> p.pathFinder -> availableChanBandwidth -> *attempts to call LC functions, blocks waiting on `lc.lock()`* **Thread2:** ReceiveRevocation -> `lc.lock()` -----------------------------------------> AdvanceCommitChainTail -> *attempts to update db, blocks waiting on `mmaplock.lock()`* If anyone has experience in this area, please let me know if this is all correct and if I should open an issue. Thanks.

lightningnetwork / lnd

Potential deadlock between RequestRoute and LightningChannel commitment state machine #9133

Discussed in https://github.com/lightningnetwork/lnd/discussions/9060