
[feature]: Preventing cascading force closes #9128


ziggie1984 commented 1 month ago

In this issue I want to discuss potential approaches for preventing cascading force closes on the Lightning Network, and in particular in LND.

The general idea is to find a balance between the economic cost and the potential risk of each approach.

  1. https://github.com/lightningnetwork/lnd/pull/9068 starts cancelling back dust HTLCs before the commitment is confirmed. Dust cannot be enforced on-chain anyway, so this is a pure win and adds no extra risk (a rough sketch of identifying such trimmed HTLCs follows this list).
  2. My second idea is to also cancel back non-dust HTLCs. Here we definitely risk losing the amount if our peer sweeps the outgoing HTLC with the preimage after we have already cancelled back the incoming one, so we need to be more precise about how this is done. On the other hand, force closing the incoming channel imposes additional cost on an attacker and locks up their funds for the predefined CSV period.
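
For (1), a minimal illustrative sketch (simplified, with hypothetical types; this is not the PR #9068 implementation): an HTLC below the channel's dust limit is trimmed to fees and never becomes a commitment output, so failing it back upstream before the commitment confirms gives up nothing enforceable.

```go
// Minimal sketch of identifying dust HTLCs that are safe to cancel back
// early. Types are hypothetical; the real dust calculation also accounts
// for second-level HTLC transaction fees.
package main

import "fmt"

// htlc is a simplified stand-in for an HTLC on a commitment transaction.
type htlc struct {
	ID     uint64
	AmtSat int64
}

// dustHTLCs returns the HTLCs that are trimmed to fees on the commitment
// transaction and can therefore be cancelled back without on-chain risk.
func dustHTLCs(htlcs []htlc, dustLimitSat int64) []htlc {
	var dust []htlc
	for _, h := range htlcs {
		if h.AmtSat < dustLimitSat {
			dust = append(dust, h)
		}
	}
	return dust
}

func main() {
	commitment := []htlc{
		{ID: 1, AmtSat: 200},    // below dust limit, trimmed
		{ID: 2, AmtSat: 50_000}, // real output, must be enforced on-chain
	}
	for _, h := range dustHTLCs(commitment, 546) {
		fmt.Printf("HTLC %d (%d sat) is dust: safe to cancel back early\n",
			h.ID, h.AmtSat)
	}
}
```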

I think at first we should only cancel the incoming back close to the deadline of the incoming timeout (maybe +5 blocks). There is no need to act immediately.

We should also only cancel back non-dust HTLCs up to a certain amount. Here we should sort the outgoing HTLCs by their incoming channel, so that we at least cancel back everything on the same incoming channel and make it more likely to prevent force closes on the incoming links.

We should only attempt to cancel back incoming HTLCs if we can be sure we at least tried to sweep the outgoing one (enough wallet inputs available) and it is still unconfirmed in the mempool; otherwise this might be too easy to exploit?
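
To make those conditions concrete, here is a rough sketch of a cancel-back decision for a non-dust incoming HTLC: only near the incoming deadline, only while the cancelled-back total per incoming channel stays under a configurable cap, and only if the outgoing sweep was broadcast but is still unconfirmed. All types, field names, and thresholds are hypothetical, not existing lnd code.

```go
// Rough sketch of the cancel-back policy discussed above.
package main

import "fmt"

type htlcState struct {
	IncomingChanID    uint64 // channel the HTLC arrived on
	AmtSat            int64  // HTLC amount
	IncomingExpiry    uint32 // CLTV expiry of the incoming HTLC
	OutgoingSweepSent bool   // we broadcast a timeout sweep for the outgoing HTLC
	OutgoingConfirmed bool   // the outgoing resolution confirmed on-chain
}

type policy struct {
	DeadlineDelta    uint32 // blocks before incoming expiry at which we act
	MaxCancelPerChan int64  // cap on total sat cancelled back per incoming channel
}

// shouldCancelBack applies the three gates discussed above. cancelledSoFar is
// the amount already cancelled back on the HTLC's incoming channel.
func shouldCancelBack(h htlcState, p policy, height uint32, cancelledSoFar int64) bool {
	// 1. Act only near the incoming deadline, not immediately.
	if height+p.DeadlineDelta < h.IncomingExpiry {
		return false
	}
	// 2. Stay under the per-channel cancel-back cap.
	if cancelledSoFar+h.AmtSat > p.MaxCancelPerChan {
		return false
	}
	// 3. Only if we actually tried to sweep the outgoing HTLC and it is
	//    still unconfirmed in the mempool.
	return h.OutgoingSweepSent && !h.OutgoingConfirmed
}

func main() {
	p := policy{DeadlineDelta: 5, MaxCancelPerChan: 100_000}
	h := htlcState{
		IncomingChanID:    42,
		AmtSat:            30_000,
		IncomingExpiry:    820_010,
		OutgoingSweepSent: true,
	}
	// true: near deadline, under the cap, sweep pending.
	fmt.Println(shouldCancelBack(h, p, 820_006, 0))
}
```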

  3. The third idea is to stop force closing the outgoing channel because of dust HTLCs. This is only possible if the peer cancels the dust HTLCs once the timeout has passed; otherwise it would be exploitable, so we need to force close the channel as soon as our peer breaks the rules (sending the preimage even after the timeout period). Maybe we should also only do this up to a certain amount, so that it stays a balance between security and economic win.

  4. We can probably also stop force closing the outgoing channel for non-dust HTLCs up to a certain amount. Maybe this should be configurable per channel, or even per HTLC via an HTLC interceptor where the node runner can decide whether to force close because of this HTLC? (A rough sketch of such a per-HTLC decision hook follows this list.)
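
For (3) and (4), a minimal sketch of what a per-HTLC force-close decision hook could look like. The callback and types are hypothetical and unrelated to lnd's existing HTLC interceptor API; the idea is only that, when an outgoing HTLC expires unresolved, a policy (or the operator) decides between going on-chain and writing the HTLC off up to a configured amount.

```go
// Hypothetical per-HTLC force-close decision hook; not an existing lnd API.
package main

import "fmt"

// expiredHTLC describes an outgoing HTLC whose timeout has passed but which
// the peer has not yet failed back.
type expiredHTLC struct {
	ChanID uint64
	AmtSat int64
	Dust   bool
}

// forceCloseDecider lets a node runner (or a default policy) decide whether
// an expired HTLC justifies force closing the outgoing channel.
type forceCloseDecider func(h expiredHTLC) bool

// defaultDecider writes off dust and anything below maxWriteOffSat, and goes
// on-chain for everything larger.
func defaultDecider(maxWriteOffSat int64) forceCloseDecider {
	return func(h expiredHTLC) bool {
		if h.Dust {
			return false // nothing enforceable on-chain anyway
		}
		return h.AmtSat > maxWriteOffSat
	}
}

func main() {
	decide := defaultDecider(10_000)
	fmt.Println(decide(expiredHTLC{ChanID: 7, AmtSat: 500, Dust: true})) // false: dust
	fmt.Println(decide(expiredHTLC{ChanID: 7, AmtSat: 8_000}))           // false: below write-off cap
	fmt.Println(decide(expiredHTLC{ChanID: 7, AmtSat: 250_000}))         // true: force close
}
```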

In the end these are all trade-offs, but I want to kick off the discussion because the next mempool fee spike might be around the corner, and then these topics matter even more.

ziggie1984 commented 1 month ago

Related to 2-4:

I think another way of approaching this issue is to give the node runner the option to cancel a particular incoming HTLC, but only if the outgoing HTLC is already in the timeout period, so that the user does not have too much power. We could also give the user the ability to block force closing an outgoing channel; this would quarantine the outgoing HTLC and prevent it from resolving via the preimage path. So in case we are already past the timeout and the peer suddenly tries to resolve it with a preimage, we force close the outgoing channel.
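
A rough sketch of that quarantine rule (hypothetical types, not lnd code): while the force close is blocked by the operator we hold off on going on-chain, but the moment the peer presents a preimage for an already-expired outgoing HTLC we force close anyway.

```go
// Hypothetical sketch of the "quarantine" rule described above.
package main

import "fmt"

type outgoingHTLC struct {
	Expiry           uint32 // CLTV expiry of the outgoing HTLC
	PeerSentPreimage bool   // peer tried to settle after the timeout
}

// quarantineAction decides what to do with an expired, operator-quarantined
// outgoing HTLC at the given block height.
func quarantineAction(h outgoingHTLC, height uint32, forceCloseBlocked bool) string {
	expired := height >= h.Expiry
	switch {
	case expired && h.PeerSentPreimage:
		// The peer is breaking the rules: enforce on-chain regardless of
		// the operator's block.
		return "force close outgoing"
	case expired && forceCloseBlocked:
		return "hold: quarantined by operator"
	case expired:
		return "force close outgoing"
	default:
		return "wait: not yet expired"
	}
}

func main() {
	h := outgoingHTLC{Expiry: 820_000}
	fmt.Println(quarantineAction(h, 820_010, true)) // hold: quarantined by operator
	h.PeerSentPreimage = true
	fmt.Println(quarantineAction(h, 820_010, true)) // force close outgoing
}
```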

To sum it up, I think the user can make much better economic decisions (losing inbound, current chain fees) than LND deciding it for every user?

thanks @feelancer21 for the discussion.

feelancer21 commented 1 month ago

> To sum it up, I think the user can make much better economic decisions (losing inbound, current chain fees) than LND deciding it for every user?

Small addition: what also speaks in favor of leaving it to the user is that, in the context of a cancel-back, they can also decide how to deal with this channel and possibly with other channels of the same peer for the time being, e.g. blocking further HTLCs, not accepting new channels, etc.

morehouse commented 4 weeks ago

  1. SGTM.

  2. Seems reasonable. The user could specify a cancel-back threshold in their config, beneath which upstream HTLCs would be cancelled on downstream force close. Waiting until the deadline approaches seems best to me (which we could also do for (1) if we're writing the logic anyway).

3 & 4 seem sketchy. The implementation sounds complex, and I'm not sure we should be jumping through hoops to keep a channel open when the peer has been unreliable lately.

yyforyongyu commented 4 weeks ago

1 and 2 yes, with a new config option and a sane default which limits how much we can cancel back. Not in favor of 3 and 4 though, as they violate the protocol and add complexity, plus I think we should push in the direction of making channels more stable rather than enabling them to keep functioning even when they are faulty.

Re 3 & 4, we could add a new whitelist config such that channels with the listed peers are never force closed, for the case where both nodes are managed by the same party.
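
A minimal sketch of what such config could look like, combining the cancel-back cap mentioned for (1)/(2) with the no-force-close whitelist; the option names and defaults are hypothetical, not existing lnd settings.

```go
// Hypothetical config sketch; these are not existing lnd options.
package main

import "fmt"

type htlcPolicyConfig struct {
	// MaxCancelBackSat caps the total non-dust value we are willing to
	// cancel back upstream without on-chain enforcement.
	MaxCancelBackSat int64

	// NoForceClosePeers lists peer pubkeys (e.g. our own nodes) whose
	// channels are never force closed by the HTLC timeout logic.
	NoForceClosePeers map[string]bool
}

func defaultHTLCPolicyConfig() htlcPolicyConfig {
	return htlcPolicyConfig{
		MaxCancelBackSat:  50_000, // hypothetical sane default
		NoForceClosePeers: make(map[string]bool),
	}
}

func main() {
	cfg := defaultHTLCPolicyConfig()
	cfg.NoForceClosePeers["02abc..."] = true // placeholder pubkey
	fmt.Printf("%+v\n", cfg)
}
```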

feelancer21 commented 3 weeks ago

> Re 3 & 4, we could add a new whitelist config such that channels with the listed peers are never force closed, for the case where both nodes are managed by the same party.

Not sure. But economically, in my opinion, such a decision will always be made per HTLC and not for an entire node. If in doubt, we should leave features 3 and 4 alone for the time being.