[bug]: On restart LND attempts to broadcast different FC tx for already closing channel, shuts down

kaloudis commented 2 weeks ago

Background

I experienced a peculiar issue with LND v0.18 where after I restarted the node, LND attempts to rebroadcast a force close TX for a channel that is already in the process of being force closed, using a different tx.

Channel point in question is redacted.

I was able to resume operations with use of abandonchannel. chantools assisted in recovery of funds in question.

Potentially related to v0.18 sweeper changes?


2024-06-15 21:33:31.953 [ERR] BTCN: Broadcast attempt failed: insufficient fee
2024-06-15 21:33:31.954 [WRN] BTWL: Transaction d2c98b92e873338eabd024056e39ec40980973cdefa12052ea62a5700551e136 not accepted by mempool: insufficient fee

....

2024-06-15 21:33:31.475 [ERR] CNCT: ChannelArbitrator(xxxxxxxxxxxxxxxxxx:1): unable to broadcast close tx: insufficient fee
2024-06-15 21:33:31.475 [ERR] CNCT: ChannelArbitrator(xxxxxxxxxxxxxxxxxx:1): unable to advance state: insufficient fee
2024-06-15 21:33:31.475 [INF] CNCT: ChainArbitrator shutting down...
2024-06-15 21:33:31.479 [INF] NTFN: Cancelling epoch notification, epoch_id=10
2024-06-15 21:33:31.726 [INF] NTFN: Cancelling epoch notification, epoch_id=9
2024-06-15 21:33:31.726 [INF] HSWC: HTLC Switch shutting down...
2024-06-15 21:33:31.726 [INF] NTFN: Cancelling epoch notification, epoch_id=6
2024-06-15 21:33:31.726 [DBG] HSWC: HTLC Switch shutdown complete
2024-06-15 21:33:31.726 [INF] FNDG: Funding manager shutting down...

....

2024-06-15 21:33:31.962 [INF] BRAR: Breach arbiter shutting down...
2024-06-15 21:33:31.962 [INF] UTXN: UTXO nursery shutting down...
2024-06-15 21:33:31.962 [INF] NTFN: Cancelling epoch notification, epoch_id=5
2024-06-15 21:33:31.962 [INF] SWPR: Sweeper shutting down...
2024-06-15 21:33:31.962 [INF] NTFN: Cancelling epoch notification, epoch_id=4
2024-06-15 21:33:31.962 [INF] SWPR: TxPublisher stopping...
2024-06-15 21:33:31.962 [INF] NTFN: Cancelling epoch notification, epoch_id=3
2024-06-15 21:33:31.962 [INF] HSWC: HtlcNotifier shutting down...
2024-06-15 21:33:31.962 [DBG] HSWC: HtlcNotifier shutdown complete
2024-06-15 21:33:31.962 [INF] PRNF: PeerNotifier shutting down...
2024-06-15 21:33:31.962 [INF] CHNF: ChannelNotifier shutting down...
2024-06-15 21:33:31.962 [INF] NTFN: Cancelling epoch notification, epoch_id=2
2024-06-15 21:33:31.962 [INF] NTFN: bitcoind notifier shutting down...
2024-06-15 21:33:31.964 [INF] NTFN: Stopping mempool notifier
2024-06-15 21:33:31.964 [INF] HLCK: Health monitor shutting down...
2024-06-15 21:33:31.964 [ERR] LTND: Shutting down because error in main method: unable to start server: insufficient fee
2024-06-15 21:33:31.964 [INF] RPCS: Stopping RPC Server
2024-06-15 21:33:31.964 [INF] RPCS: Stopping InvoicesRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping PeersRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping RouterRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping SignRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping VersionRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping WalletKitRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping WatchtowerClientRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] RPCS: Stopping ChainRPC Sub-RPC Server
2024-06-15 21:33:31.965 [INF] TORC: Stopping tor controller
2024-06-15 21:33:31.965 [ERR] TORC: DEL_ONION got error: invalid arguments: unexpected code
2024-06-15 21:33:31.965 [ERR] LTND: error stopping tor controller: invalid arguments: unexpected code
2024-06-15 21:33:32.038 [INF] LTND: Shutdown complete

Your environment

lnd v0.18.0-beta
bitcoind v24.0.1

bitromortac commented 2 weeks ago

:+1: Perhaps it could be helpful to see what other force close/sweeping actions were done before for that channel. Did the force close initiate while running v0.18 (i.e. without an upgrade)? Could you find out who force closed in the initial force close? Any backend specifics would be good to know as well.

Looks like it is related to https://github.com/lightningnetwork/lnd/pull/8345 from the log message not accepted by mempool.

kaloudis commented 2 weeks ago

👍 Perhaps it could be helpful to see what other force close/sweeping actions were done before for that channel. Did the force close initiate while running v0.18 (i.e. without an upgrade)? Could you find out who force closed in the initial force close? Any backend specifics would be good to know as well.

Looks like it is related to #8345 from the log message not accepted by mempool.

I need to double check the timeline, but it is possible that the original FC was broadcasted while I was still on lnd v0.17.4. However this issue was not hit directly after the upgrade

bitromortac commented 2 weeks ago

The shutdown results from the error from TestMempoolAccept not being handled gracefully, by maybe wrapping with mapRpcclientError(err) here, to at least not shut down. https://github.com/lightningnetwork/lnd/blob/c0f7c2849da9dea424574651505a7f2e4494f37f/lnwallet/btcwallet/btcwallet.go#L1254 Interesting why there was a second force close with too low fees.

Roasbeef commented 2 weeks ago

How did you determine that it was using a different tx? I think the change in behavior here is that we'll now check to see if something can actually enter the mempool or not before broadcasting. In this case though, I think we want a bypass here to just attempt to publish anyway, as if we get that initial attempt, then we'll continue to rebroadcast in the background.

kaloudis commented 2 weeks ago

How did you determine that it was using a different tx? I think the change in behavior here is that we'll now check to see if something can actually enter the mempool or not before broadcasting. In this case though, I think we want a bypass here to just attempt to publish anyway, as if we get that initial attempt, then we'll continue to rebroadcast in the background.

the close TX was already broadcasted and visible on block explorers

Roasbeef commented 1 week ago

Was this a case of doing a co-op close, then also doing a force close while that was unconfirmed? Otherwise, there's no way that lnd can broadcast two different force close transactions (txid) unless an accidental breach occurred...

lightningnetwork / lnd

[bug]: On restart LND attempts to broadcast different FC tx for already closing channel, shuts down #8850

Background

Your environment