lightningdevkit / rust-lightning

A highly modular Bitcoin Lightning library written in Rust. It's rust-lightning, not Rusty's Lightning!
Other
1.16k stars 366 forks source link

force close is not always broadcasting latest TX #3155

Open zoedberg opened 4 months ago

zoedberg commented 4 months ago

I'm using RLN, which uses a forked version of rust-lightning, where I've added support for RGB. I'm running an integration test (https://github.com/RGB-Tools/rgb-lightning-node/blob/master/src/test/close_force_standard.rs) where I open a channel between 2 nodes, make 2 payments via keysend and force close the channel.

After the channel closure I sometimes see a behavior (non-deterministic, but quite frequent) where the BumpTransactionEventHandler tries to bump the fees for an HTLC TX (we are using anchor outputs). This is strange because before closing the channel, during the keysend, we wait for the payment to become successful (which happens during the PaymentSent and PaymentClaimed events), therefore we expect the channel to have no pending HTLCs and the node to broadcast the last commiment TX.

I'm writing here first of all to understand if you think this is expected behavior or not. That is, waiting for the mentioned events should guarantee that the OnchainTxHandler will broadcast the last valid commitment TX?

I've also collected some logs, hoping this could help better understand the issue:

I don't think that our changes to support RGB could cause this issue. If you think this part is well-tested on your side I'll investigate more and maybe try to reproduce it on a vanilla node.

tnull commented 4 months ago

FWIW, we saw similar flaky behavior in LDK Node's integration tests.

I might be wrong, but I suspect this could be a race with ChannelMonitor persistence that might have been introduced as a byproduct of #2957, as we no longer wait for persistence to finish before issuing the payment events? (cc @G8XSU)

TheBlueMatt commented 4 months ago

This is strange because before closing the channel, during the keysend, we wait for the payment to become successful (which happens during the PaymentSent and PaymentClaimed events), therefore we expect the channel to have no pending HTLCs and the node to broadcast the last commiment TX.

This is not a guarantee - PaymentSent indicates that we received the preimage back for our payment (i.e. might happen before we even see a new commitment transaction) and PaymentClaimed guarantee sthat the preimage is durably on disk (which might happen before we have a new commitment transaction). We try to get users these events once we're sure things are irrevocable, but that might happen on-chain rather than in-channel.

I see a call to provide_latest_holder_tx...

Hmm, this event order does somewhat surprise me, however. Are you sure you're seeing this on the same node's ChannelMonitor? Can you share the logs (and, to confirm, this is a fork based on LDK 0.0.123)?

I might be wrong, but I suspect this could be a race with ChannelMonitor persistence that might have been introduced as a byproduct of https://github.com/lightningdevkit/rust-lightning/pull/2957, as we no longer wait for persistence to finish before issuing the payment events? (cc @G8XSU)

That shouldn't impact this. That change only impacts persists based on block-connection, but here we're (I think?) just talking about persists due to channel state changes, and specifically when the ChannelManager generates events wrt those persists.

zoedberg commented 4 months ago

This is not a guarantee - PaymentSent indicates that we received the preimage back for our payment (i.e. might happen before we even see a new commitment transaction) and PaymentClaimed guarantee sthat the preimage is durably on disk (which might happen before we have a new commitment transaction). We try to get users these events once we're sure things are irrevocable, but that might happen on-chain rather than in-channel.

Got it, thanks for the explanation. So I guess in our tests we should keep a small sleep to be sure the force close operation will always broadcast the latest commitment TX.

Hmm, this event order does somewhat surprise me, however. Are you sure you're seeing this on the same node's ChannelMonitor? Can you share the logs (and, to confirm, this is a fork based on LDK 0.0.123)?

I confirm the fork is based on LDK 0.0.123, but nice catch, I wasn't checking which node was printing the change to self.holder_commitment and after rerunning the test with the node information I can now say there is no strange behavior, the node correctly broadcasts the last TX assigned to self.holder_commitment. Sorry for my incorrect reporting. I guess this issue can be closed.