lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.56k stars 2.06k forks source link

[bug]: stale commitment fee for rarely active channels #8790

Open rkfg opened 1 month ago

rkfg commented 1 month ago

Background

Commitment fee (CFee) may get very stale after a while for the private (unpublished) channels if the peer isn't online for a while. I have a few such private channels to my own various wallets and also a channel to my friend. He only uses it rarely to receive payments and exits the app right after that. I initiated this channel so only I can update the CFee. However, it was once set to 1 sat/vB and never updated since. This issue applies mostly to the private channels because they're assumed to be inactive most of the time. Public channels are used for routing and must always be active so the update will happen sooner or later.

I did a little research and it seems that CFee only updates (links point to the relevant source line) when a channel timer elapses but never for any other reason. The timer has a random period between 10 and 60 minutes, it can not be changed by the user I assume, couldn't find any relevant code and config parameters. So the clients that don't stay online for quite a long while (up to 1 hour) will never receive a fee update. They may get stuck with a very low fee until they have to FC (due to their LSP losing the state/shutting down/the LSP decides to FC instead) only to find out that their FC tx can't even enter the mempool because it's below the purge level.

AFAIK no apps show the current CFee so the problem would be quite hidden. I see it in lntop and I can also check it with lncli listchannels of course. But the mobile clients either have it buried in the dev settings or don't expose it at all. Asking the peer to keep their phone on with the app active for up to 1 hour (I don't know what period the timer has, it's random) so that their balance can actually be resolved on chain if things go south, is an extremely bad UX. Also CFee can not be forcibly updated by the lnd user. The only way is that single timer, I didn't find any other lines that call this function.

Your environment

Steps to reproduce

1) Open a private channel, can be public as well, it shouldn't matter; for mobile apps I usually open a private one 2) Wait for the fees to be low enough, keep the app running so the CFee is updated 3) Exit the app, wait for the fees to rise 4) Open the app, do any payments, close it in under 10 minutes 5) Keep doing it from time to time and your CFee will stay low

Expected behaviour

I have two ideas: 1) When a channel becomes active check if more than DefaultMaxLinkFeeUpdateTimeout passed since the last fee update. If yes, do a fee update. 2) Also do it after every settled HTLC.

Now, both ideas assume that we can (we're the initiator) and should (fees changed by a lot) do a fee update. Otherwise it's a NOP.

yyforyongyu commented 1 month ago
  1. Exit the app, wait for the fees to rise
  2. Open the app, do any payments, close it in under 10 minutes

I think what we should do is perform a fee update at step 4 - so whenever a channel becomes active we send a fee update.

it can not be changed by the user I assume, couldn't find any relevant code and config parameters.

Yea this cannot be changed.

Roasbeef commented 1 month ago

I think what we should do is perform a fee update at step 4 - so whenever a channel becomes active we send a fee update.

+1 for this. Similar to the changes where the sweeper always tries to sweep on start up.

rkfg commented 1 month ago

Would this patch do? I'm a bit scared to roll it out on my main instance even though it looks trivial enough.

diff --git a/htlcswitch/link.go b/htlcswitch/link.go
index c0c828f83..1d3b84994 100644
--- a/htlcswitch/link.go
+++ b/htlcswitch/link.go
@@ -1072,6 +1072,50 @@ func (l *channelLink) loadAndRemove() error {
    return l.channel.RemoveFwdPkgs(removeHeights...)
 }

+func (l *channelLink) maybeUpdateFee() {
+   if !l.channel.IsInitiator() {
+       return
+   }
+
+   // If we are the initiator, then we'll sample the
+   // current fee rate to get into the chain within 3
+   // blocks.
+   netFee, err := l.sampleNetworkFee()
+   if err != nil {
+       l.log.Errorf("unable to sample network fee: %v",
+           err)
+       return
+   }
+
+   minRelayFee := l.cfg.FeeEstimator.RelayFeePerKW()
+
+   newCommitFee := l.channel.IdealCommitFeeRate(
+       netFee, minRelayFee,
+       l.cfg.MaxAnchorsCommitFeeRate,
+       l.cfg.MaxFeeAllocation,
+   )
+
+   // We determine if we should adjust the commitment fee
+   // based on the current commitment fee, the suggested
+   // new commitment fee and the current minimum relay fee
+   // rate.
+   commitFee := l.channel.CommitFeeRate()
+   if !shouldAdjustCommitFee(
+       newCommitFee, commitFee, minRelayFee,
+   ) {
+
+       return
+   }
+
+   // If we do, then we'll send a new UpdateFee message to
+   // the remote party, to be locked in with a new update.
+   if err := l.updateChannelFee(newCommitFee); err != nil {
+       l.log.Errorf("unable to update fee rate: %v",
+           err)
+       return
+   }
+}
+
 // htlcManager is the primary goroutine which drives a channel's commitment
 // update state-machine in response to messages received via several channels.
 // This goroutine reads messages from the upstream (remote) peer, and also from
@@ -1263,7 +1307,7 @@ func (l *channelLink) htlcManager() {
        l.wg.Add(1)
        go l.fwdPkgGarbager()
    }
-
+   l.maybeUpdateFee()
    for {
        // We must always check if we failed at some point processing
        // the last update before processing the next.
@@ -1322,47 +1366,7 @@ func (l *channelLink) htlcManager() {

            // If we're not the initiator of the channel, don't we
            // don't control the fees, so we can ignore this.
-           if !l.channel.IsInitiator() {
-               continue
-           }
-
-           // If we are the initiator, then we'll sample the
-           // current fee rate to get into the chain within 3
-           // blocks.
-           netFee, err := l.sampleNetworkFee()
-           if err != nil {
-               l.log.Errorf("unable to sample network fee: %v",
-                   err)
-               continue
-           }
-
-           minRelayFee := l.cfg.FeeEstimator.RelayFeePerKW()
-
-           newCommitFee := l.channel.IdealCommitFeeRate(
-               netFee, minRelayFee,
-               l.cfg.MaxAnchorsCommitFeeRate,
-               l.cfg.MaxFeeAllocation,
-           )
-
-           // We determine if we should adjust the commitment fee
-           // based on the current commitment fee, the suggested
-           // new commitment fee and the current minimum relay fee
-           // rate.
-           commitFee := l.channel.CommitFeeRate()
-           if !shouldAdjustCommitFee(
-               newCommitFee, commitFee, minRelayFee,
-           ) {
-
-               continue
-           }
-
-           // If we do, then we'll send a new UpdateFee message to
-           // the remote party, to be locked in with a new update.
-           if err := l.updateChannelFee(newCommitFee); err != nil {
-               l.log.Errorf("unable to update fee rate: %v",
-                   err)
-               continue
-           }
+           l.maybeUpdateFee()

        // The underlying channel has notified us of a unilateral close
        // carried out by the remote peer. In the case of such an

Just want to make sure I wouldn't cause mass FC with this because I don't know the internals.

yyforyongyu commented 4 weeks ago

@rkfg Could you open a PR based on this patch? It's easier to tell that way as the CI will tell.

rkfg commented 3 weeks ago

PR opened. I ran tests myself but they were inconclusive and failed on one machine and simply froze indefinitely on another. Let's see if the CI works.