Open BhaagBoseDK opened 1 year ago
The thing is that Peer B wants to make sure that they can claim the timeout path before they fail back the htlc. Otherwise there is a chance that the offline peer comes back online just in time & then claims the success path. This will mean a loss of funds for peer B if they have already failed the HTLC back to peer A.
Sure but if you're already out of time on the backwards path you run that risk anyway? We're thinking about this on the LDK end and I'm a bit torn but it does seem like the "common case" here is the cascading failure, not the attack, though it's possible that changes with package relay.
the offline peer should only be able to claim until the HTLC expiry. It should have no bearing on when the FC transaction is confirmed on-chain.
Sure but if you're already out of time on the backwards path you run that risk anyway?
Ah, that is a good point.
the offline peer should only be able to claim until the HTLC expiry.
Unfortunately that is not possible to enforce with bitcoin Script. After the HTLC expiry, the output becomes a free-for-all if the preimage is known.
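To see why, here is a minimal Go sketch using btcd's txscript builder of a simplified two-path HTLC-style output (placeholder keys and hash; this is an illustration, not the exact BOLT-3 script lnd constructs): the success branch checks only the preimage and a signature, so Script has no way to cap it at the expiry height, while only the timeout branch carries a CLTV.

```go
package main

import (
	"fmt"

	"github.com/btcsuite/btcd/txscript"
)

func main() {
	// Placeholder values -- a real script would use the actual payment
	// hash and the pubkeys negotiated for the commitment.
	paymentHash := make([]byte, 32)
	remotePubKey := make([]byte, 33) // the side that can claim with the preimage
	localPubKey := make([]byte, 33)  // the side that can claim via timeout
	htlcExpiry := int64(821335)      // absolute expiry height of the HTLC

	builder := txscript.NewScriptBuilder()

	// Success path: preimage + signature. Nothing here references a
	// height, so Script cannot stop this branch from being used after
	// the expiry -- whoever knows the preimage can always take it.
	builder.AddOp(txscript.OP_IF)
	builder.AddOp(txscript.OP_SHA256)
	builder.AddData(paymentHash)
	builder.AddOp(txscript.OP_EQUALVERIFY)
	builder.AddData(remotePubKey)
	builder.AddOp(txscript.OP_CHECKSIG)

	// Timeout path: only valid once htlcExpiry has been reached. After
	// that point BOTH branches are spendable, so the output becomes a
	// race between the preimage holder and the timeout claimant.
	builder.AddOp(txscript.OP_ELSE)
	builder.AddInt64(htlcExpiry)
	builder.AddOp(txscript.OP_CHECKLOCKTIMEVERIFY)
	builder.AddOp(txscript.OP_DROP)
	builder.AddData(localPubKey)
	builder.AddOp(txscript.OP_CHECKSIG)
	builder.AddOp(txscript.OP_ENDIF)

	script, err := builder.Script()
	if err != nil {
		panic(err)
	}
	fmt.Printf("simplified HTLC script: %x\n", script)
}
```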
another point is this specific case of "missing HTLC in remote commitment". In the FC from peer B -> offline peer, the HTLC was not in the remote commitment (therefore there is no possibility for the offline peer to come back and claim it later). The FC transaction does not even have the HTLC.
8f58a419830c62f9e708b6c47b5541c044a19a1cdc64c4eb0c903311d6282fdd
In this case the HTLC could be safely failed back to peer A.
another point is this specific case of "missing HTLC in remote commitment". In the FC from peer B -> offline peer, the HTLC was not in the remote commitment (therefore there is no possibility for the offline peer to come back and claim it later). The FC transaction does not even have the HTLC. 8f58a419830c62f9e708b6c47b5541c044a19a1cdc64c4eb0c903311d6282fdd In this case the HTLC could be safely failed back to peer A.
Not really, because you have to make sure that your Commitment (without the HTLC) is confirmed: your Peer has a valid Commitment Transaction with the HTLC included (at least you think it has one; you cannot be sure it did not receive it, because it was offline before). This means this HTLC could very much be confirmed once your Peer has the preimage and decides to go onchain.
Ok, I was having the same case with an incoming and outgoing HTLC being stuck because the outgoing HTLC was going onchain (and did not confirm before the incoming HTLC would run into the timeout). But luckily my incoming HTLC was failed back because of a positive side effect of the interceptor. Basically the interceptor will fail all expiring incoming HTLCs which are close to expiry [13 blocks away] (https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/interceptable_switch.go#L470). That's exactly what happened in my case: it canceled it exactly 13 blocks before timeout.
I think the important code part is here:
https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/interceptable_switch.go#L293
Here we cancel all incoming HTLCs even though their outgoing counterpart is not resolved yet; at least we do not check whether there is an Outgoing HTLC on the downstream channel.
This failing of an incoming HTLC while the outgoing is still stuck is pretty new (9 months) and really hard to test in regtest mode (filling the mempool with unconfirmed TXs). Could you look into whether my analysis is correct, @joostjager? :)
What I am saying is basically: when you have an Interceptor running it will fail back Incoming HTLCs even though their Outgoing counterpart is not resolved yet. I think it's good, because otherwise your Peer will Force-Close on you anyway and you will lose the second channel.
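For illustration, a minimal sketch of the kind of check described above; the names and the exact comparison are assumptions rather than lnd's actual code, with the 13-block default taken from the linked interceptable_switch.go:

```go
package main

import "fmt"

// interceptorRejectDelta mirrors the 13-block default referenced above; the
// identifier names here are illustrative, not lnd's.
const interceptorRejectDelta = 13

// expiresTooSoon reports whether fewer than interceptorRejectDelta blocks
// remain until the incoming HTLC's absolute expiry, in which case the
// switch fails it back instead of continuing to hold it.
func expiresTooSoon(currentHeight, incomingTimeout uint32) bool {
	return incomingTimeout < currentHeight+interceptorRejectDelta
}

func main() {
	fmt.Println(expiresTooSoon(821320, 821400)) // false: plenty of headroom
	fmt.Println(expiresTooSoon(821330, 821335)) // true: only 5 blocks left, fail back
}
```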
Given the setup, Peer Alice -> Peer Bob -> Offline Peer Charlie, if Charlie is offline during the whole time, then yeah it's safe to cancel the HTLC, but you can't be sure. If Charlie comes online after the FC, there are two scenarios.
This means Bob would not lose the HTLC if Charlie decides to come online and claim it for w/e reason. However, if Bob cancels the HTLC with Alice after the FC, he is at risk of losing it if Charlie decides to cheat.
So IMO canceling early is not a good choice. Instead, assuming this is an anchor channel, the most feasible way is to fee bump the force close tx.
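For reference, a rough sketch of what fee bumping the force close through its anchor output could look like via lnd's WalletKit BumpFee RPC; the request field names are written from memory and the gRPC connection setup is omitted, so treat this as an assumption-laden sketch rather than a verified call:

```go
package main

import (
	"context"
	"fmt"

	"github.com/lightningnetwork/lnd/lnrpc"
	"github.com/lightningnetwork/lnd/lnrpc/walletrpc"
)

// bumpAnchor asks lnd's WalletKit to CPFP the force-close transaction by
// sweeping its anchor output at the given fee rate. Setting up the
// authenticated gRPC connection (TLS cert + macaroon) is omitted here, and
// the request field names should be double-checked against the walletrpc
// proto of the lnd version in use.
func bumpAnchor(ctx context.Context, wallet walletrpc.WalletKitClient,
	closeTxid string, anchorIndex uint32, satPerVbyte uint64) error {

	_, err := wallet.BumpFee(ctx, &walletrpc.BumpFeeRequest{
		Outpoint: &lnrpc.OutPoint{
			TxidStr:     closeTxid,
			OutputIndex: anchorIndex,
		},
		SatPerVbyte: satPerVbyte,
		// Sweep even though the 330 sat anchor is not economical on its own.
		Force: true,
	})
	if err != nil {
		return fmt.Errorf("bump fee failed: %w", err)
	}
	return nil
}

func main() {
	// Wiring of the gRPC client is intentionally left out; bumpAnchor is
	// only meant to show the shape of the call.
	_ = bumpAnchor
}
```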
Think we can close this
Ok, I was having the same case with an incoming and outgoing HTLC being stuck because the outgoing HTLC was going onchain (and did not confirm before the incoming HTLC would run into the timeout). But luckily my incoming HTLC was failed back because of a positive side effect of the interceptor. Basically the interceptor will fail all expiring incoming HTLCs which are close to expiry [13 blocks away] (https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/interceptable_switch.go#L470). That's exactly what happened in my case: it canceled it exactly 13 blocks before timeout.
I think the important code part is here:
https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/interceptable_switch.go#L293
Here we cancel all incoming HTLCs even though their outgoing counterpart is not resolved yet; at least we do not check whether there is an Outgoing HTLC on the downstream channel.
This failing of an incoming HTLC while the outgoing is still stuck is pretty new (9 months) and really hard to test in regtest mode (filling the mempool with unconfirmed TXs). Could you look into whether my analysis is correct, @joostjager? :)
What I am saying is basically: when you have an Interceptor running it will fail back Incoming HTLCs even though their Outgoing counterpart is not resolved yet. I think it's good, because otherwise your Peer will Force-Close on you anyway and you will lose the second channel.
If this is possible in interceptor why not in standard lnd?
This means Bob would not lose the HTLC if Charlie decides to come online and claim it for w/e reason. However, if Bob cancels the HTLC with Alice after the FC, he is at risk of losing it if Charlie decides to cheat.
So IMO canceling early is not a good choice. Instead, assuming this is an anchor channel, the most feasible way is to fee bump the force close tx.
Not sure if you read my comment, but having an active Interceptor will cancel it back even though the downstream HTLC is not resolved. I think it's unintended behaviour (see my comment above). Should I investigate it further, @yyforyongyu?
I think there are some false assumptions going on here. LND will cancel back dust HTLCs (i.e. not on the commitment tx) here: https://github.com/lightningnetwork/lnd/blob/fd9adaf6ceb3649c07bbb4982bd60dd632e8cda0/contractcourt/channel_arbitrator.go#L1671-L1677 which then get failed back to the incoming channel here: https://github.com/lightningnetwork/lnd/blob/fd9adaf6ceb3649c07bbb4982bd60dd632e8cda0/contractcourt/channel_arbitrator.go#L2145-L2153
So either peer A force closed for another reason or there is a separate bug
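To restate the distinction in code, a small illustrative sketch (invented types, not lnd's): an HTLC that was trimmed as dust has no output on the broadcast commitment, so there is nothing on-chain to resolve and it can be failed back to the incoming channel immediately, while a non-dust HTLC must wait for its on-chain resolution.

```go
package main

import "fmt"

// htlc is an invented, illustrative type -- not lnd's actual struct. The
// linked channel_arbitrator.go logic makes the same distinction: an HTLC
// that was trimmed as dust has no output on the broadcast commitment, so
// there is nothing on-chain to wait for and it can be failed back to the
// incoming channel immediately.
type htlc struct {
	ID          uint64
	AmtSat      int64
	OutputIndex int32 // -1: trimmed as dust, no commitment output
}

// failImmediately returns the HTLCs that can be failed back right away
// after a force close, i.e. the dust ones.
func failImmediately(htlcs []htlc) []htlc {
	var dust []htlc
	for _, h := range htlcs {
		if h.OutputIndex < 0 {
			dust = append(dust, h)
		}
	}
	return dust
}

func main() {
	htlcs := []htlc{
		{ID: 1, AmtSat: 330, OutputIndex: -1},  // dust: fail back now
		{ID: 2, AmtSat: 20015, OutputIndex: 3}, // on commitment: must await on-chain resolution
	}
	fmt.Println(failImmediately(htlcs)) // only the dust HTLC
}
```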
Reopening for discussion
This FC is still not confirmed in mempool, which is why peer B has not removed/failed the HTLC with peer A.
This is not true, the log line "immediately failing..." means that the HTLC was dust and failed backwards.
The HTLC is missing in remote commitment because the peer is offline and therefore has not acked the HTLC.
There would be two commitments, the remote pending commitment and the remote commitment. It would be in the remote pending commitment.
@ziggie1984 yes please!
Accidentally edited instead of commenting, but here's my comment:
This FC is still not confirmed in mempool, which is why peer B has not removed/failed the HTLC with peer A.
This is not true, the log line "immediately failing..." means that the HTLC was dust and failed backwards.
The HTLC is missing in remote commitment because the peer is offline and therefore has not acked the HTLC.
There would be two commitments, the remote pending commitment and the remote commitment. It would be in the remote pending commitment.
Well in that edit you seem to have removed relevant information.
-> The HTLC in question was 20015. Is that dust?
-> The HTLC was in remote pending commitment. So upon expiry peer B force closed with offline peer. See txn 8f58a419830c62f9e708b6c47b5541c044a19a1cdc64c4eb0c903311d6282fdd. You can see the HTLC is not present in this force close (because it was not acked by offline peer).
-> This txn was not confirmed for 144 blocks due to congested mempool. So after 144 blocks (CLTV delta of peer B), peer A force closed on peer B. See txn 8dcdcb446b3cbfc38e6164e03592c4593654d29426e27c036d4948f7403d509a. The HTLC is present in this transaction indicating it was not failed back.
Relevant log line is here: https://github.com/lightningnetwork/lnd/blob/fd9adaf6ceb3649c07bbb4982bd60dd632e8cda0/contractcourt/channel_arbitrator.go#L1868 meaning that the HTLC is failed back, but there is perhaps a bug in the code somewhere which we can't diagnose without logs
I analysed this situation further and can conclude that LND will not cancel back the HTLC (if it's not dust) and will hold onto it until the peer FCs the outgoing HTLC (without a registered Interceptor).
With or without a registered interceptor LND will fail the incoming HTLC back without verifying that the outgoing HTLC is still active iff the incoming HTLC runs into the RejectDelta of 13 blocks AND the ChannelLink is reconnected.
Scenario: Alice => Bob => Carol
Bob has an increased RejectDelta: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/interceptable_switch.go#L562
Replaced it with (40+78)
Now Carol creates a hold-invoice and Bob registers an interceptor. I will now mine 3 blocks so I come into the RejectDelta of Bob, and then I only need to disconnect/reconnect and the incoming HTLC fails even though the outgoing is still not resolved.
Log of Bob (as expected):
2023-05-15 10:53:58.060 [DBG] HSWC: Interception rejected because htlc expires too soon: circuit=(Chan ID=204:1:0, HTLC ID=3), height=216, incoming_timeout=333
2023-05-15 10:53:58.060 [DBG] HSWC: ChannelLink(3bb535672973053a3184cf77ced48583204c4252521221518c05e619fbcccd19:0): queueing removal of FAIL closed circuit: (Chan ID=204:1:0, HTLC ID=3)->(Chan ID=199:2:0, HTLC ID=0)
Now I cancel back the holdinvoice on Carol's node:
Now the logs show as expected on Bob's node:
2023-05-15 11:03:09.644 [ERR] HSWC: unable to find target channel for HTLC fail: channel ID = 199:2:0, HTLC ID = 1
2023-05-15 11:03:09.644 [ERR] HSWC: Unhandled error while reforwarding htlc settle/fail over htlcswitch: unable to find target channel for HTLC fail: channel ID = 199:2:0, HTLC ID = 1
2023-05-15 11:03:10.114 [DBG] HSWC: Sent 0 satoshis and received 0 satoshis in the last 10 seconds (0.100000 tx/sec)
Before fixing this issue, I would like to propose a config setting where the node runner can decide whether he is willing to bear the risk and cancel back incoming HTLCs when the outgoing HTLC is still not resolved (maybe not worth sweeping because chain fees are too high). Otherwise I find this "bug" kind of handy for now, to cancel back if I want to in case my outgoing HTLCs are not resolved in time.
To fix this issue we definitely need to check if there is still an outgoing HTLC at play before canceling back.
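A minimal sketch of the shape of that check (the types and map are illustrative stand-ins for lnd's circuit map, not its actual API): before failing an incoming HTLC back, look up whether its outgoing counterpart on the downstream channel is still unresolved.

```go
package main

import "fmt"

// circuitKey and the map below are illustrative stand-ins for lnd's circuit
// map, not its actual API.
type circuitKey struct {
	ChanID uint64
	HTLCID uint64
}

// safeToFailBack reports whether the incoming HTLC has no outgoing
// counterpart that is still unresolved (e.g. pending on-chain). The fix
// proposed above amounts to requiring this to be true before failing back.
func safeToFailBack(incoming circuitKey,
	unresolvedOutgoing map[circuitKey]circuitKey) bool {

	_, stillOpen := unresolvedOutgoing[incoming]
	return !stillOpen
}

func main() {
	// One circuit whose outgoing leg is still stuck on-chain.
	unresolved := map[circuitKey]circuitKey{
		{ChanID: 204, HTLCID: 3}: {ChanID: 199, HTLCID: 0},
	}
	fmt.Println(safeToFailBack(circuitKey{ChanID: 204, HTLCID: 3}, unresolved)) // false: keep holding
	fmt.Println(safeToFailBack(circuitKey{ChanID: 205, HTLCID: 7}, unresolved)) // true: ok to fail back
}
```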
@BhaagBoseDK can you share logs so we can diagnose the bug when we get a chance
My log snippets are in the description, unless you want a specific HTLC or time window? Please note I do not have debug logs.
@BhaagBoseDK can you save your logs for the specific htlc and for this time so somebody can look at it when this gets prioritized?
I managed to catch a very good case.
Me: LND 0.17.3, bitcoind 25 (full indexed, mempool = 3Gb), Ubuntu, clearnet only
My peer: unknown, tor only
I have a HTLC in our channel:
{
"incoming": false,
"amount": "54572",
"hash_lock": "c99e...83f6",
"expiration_height": 821335,
"htlc_index": "87642",
"forwarding_channel": "896115171070509057",
"forwarding_htlc_index": "78605"
}
The known part of the route is :
(1) (someone) (2 - me) 03423790614f023e3c0cdaa654a3578e919947e4c3a14bf5044e7c787ebd11af1a (3 - my peer) 021720a04a2094ccff4c56bd6ab20f7e36e0af17cb0d3b90ea00ce0f07bd51cf8c (4) 0284e3ca3753632c51a7d9a156370161ce2a19af41dbf4966eecf74bf3f7ba0a79 (5) (someone)
The channel between (3) and (4) was FCed : tx ce960ba459e62fbbe9178130de89fb595afa8ffb390b954d3e3f3aaf4e0f3f56
The relevant HTLC - in the channel between (3) and (4) - went onchain:
In our channel - between (2) and (3) - this HTLC is still alive and its status does not change. There are no records in the log (HSWC is in DEBUG mode) with this HTLC. Other contracts appear and are getting resolved as usual in this channel, both nodes are ok and online, our channel is active and enabled on both sides, my HSWC works as usual.
Obviously the channel closing transaction will not be mined until the HTLC expires and our channel - between (2) and (3) - is doomed to be FCed.
Reconnecting or restarting the nodes (both mine and my peer's) doesn't help.
Question 0 is: do I understand the situation correctly?
...and if yes...
Question 1 is: how was this guy (4) able to FC the channel between (3) and (4) with a fee of 9 sats/vbyte while the normal fee at that moment was more than 100? Can I do the same with my channels? ;)
Question 2 is: I definitely don't want to pay a 100000+ sats fee for this ability of that guy, who is not even my peer. Can we somehow avoid such situations?
Question 2 is: I definitely don't want to pay a 100000+ sats fee for this ability of that guy, who is not even my peer. Can we somehow avoid such situations?
Make sure you reconnect to peer 3 when the HTLC approaches the block deadline + 13 blocks; only then will your peer fail the HTLC back and no FC will happen on your channel.
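As a rough worked example (the 13-block figure is the reject delta discussed earlier in this thread; the exact boundary behaviour is an assumption): the fail-back only triggers once the HTLC is inside that window, so the reconnect has to land between roughly expiry minus 13 blocks and your own go-to-chain deadline.

```go
package main

import "fmt"

// rejectDelta is the 13-block window discussed earlier in the thread; the
// exact boundary handling is an assumption made for illustration.
const rejectDelta = 13

// failbackWindowStart returns the first height at which a reconnect should
// cause the peer to fail the stuck HTLC back to you.
func failbackWindowStart(htlcExpiry uint32) uint32 {
	return htlcExpiry - rejectDelta
}

func main() {
	// Using the expiration_height of the stuck HTLC quoted above.
	fmt.Println("reconnect at or after height", failbackWindowStart(821335)) // 821322
}
```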
Question 1 is: how was this guy (4) able to FC the channel between (3) and (4) with a fee of 9 sats/vbyte while the normal fee at that moment was more than 100? Can I do the same with my channels? ;)
The max-anchor-commitfee defaults to 10 sats/vbyte, but I am wondering why the channel is not CPFPed; maybe it's already been purged out of the mempool by the respective nodes, in which case he will not be able to bump it.
The max-anchor-commitfee defaults to 10 sats/vbyte, but I am wondering why the channel is not CPFPed; maybe it's already been purged out of the mempool by the respective nodes, in which case he will not be able to bump it.
I have seen many of these cases where the commitment fee rate is at around 10sat/vbyte (see https://github.com/lightningnetwork/lnd/discussions/8271), although it should be higher (https://github.com/lightningnetwork/lnd/issues/8240#issuecomment-1854546090).
@ziggie1984, Thank you for your answer.
Make sure you reconnect to peer 3 when the HTLC approaches the block deadline + 13 blocks; only then will your peer fail the HTLC back and no FC will happen on your channel.
Practice shows that in such cases reconnection does not help, but restarting the node shortly before the expiration of the HTLC helps. Obviously there is some difference between a simple reconnect and what happens after restart. I'll try to collect some logs and come back when I find something interesting.
I have seen many of these cases where the commitment fee rate is at around 10sat/vbyte (see https://github.com/lightningnetwork/lnd/discussions/8271), although it should be higher (https://github.com/lightningnetwork/lnd/issues/8240#issuecomment-1854546090).
Good input, so I looked at it as well: #8271 is definitely not the right behavior during chan opening, but the fee negotiation for normal UpdateFee msgs should cap at the min_relay fee. Though there might always be the problem between the two peers that the initiator has an increased mempool while the node force-closing the channel does not, so we might end up in this situation where the non-initiator cannot bump the fee of the commitment. Not sure if there is really a fix for this for now, because not accepting fee updates might cause problems. 🤔
Practice shows that in such cases reconnection does not help, but restarting the node shortly before the expiration of the HTLC helps. Obviously there is some difference between a simple reconnect and what happens after restart. I'll try to collect some logs and come back when I find something interesting.
That would be great. Are you verifying that you disconnect the peer and then connect again? The link needs to be torn down for this to work.
That would be great. Are you verifying that you disconnect the peer and then connect again? The link needs to be torn down for this to work.
Of course. Disconnect and connect the peer again.
Background
Consider an HTLC chain
Peer A -> Peer B -> Offline Peer
And assume Peer B Force Closes on Offline Peer due to HTLC missing in remote commitment upon expiry of HTLC.
The Force Close transaction is stuck in mempool for 144 blocks (CLTV delta of Peer B)
Now after 144 blocks, peer A will also force close on peer B just because peer B has not failed the HTLC backward.
This causes a cascade of FCs in the current mempool (especially with peers with shorter CLTV deltas).
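To make the timeline concrete, a small worked example with illustrative heights (144 blocks being Peer B's CLTV delta; real nodes also go to chain slightly before the exact expiry, which is ignored here):

```go
package main

import "fmt"

func main() {
	// Illustrative heights only; real nodes go to chain a few blocks
	// before the actual expiry, which is ignored here.
	outgoingExpiry := uint32(821000) // expiry of the HTLC Peer B -> Offline Peer
	cltvDeltaB := uint32(144)        // Peer B's CLTV delta

	// The incoming HTLC (Peer A -> Peer B) expires one CLTV delta later.
	incomingExpiry := outgoingExpiry + cltvDeltaB

	fmt.Println("Peer B force closes at height:", outgoingExpiry)
	fmt.Println("Peer A force closes at height:", incomingExpiry,
		"if the HTLC was not failed back by then")
	fmt.Println("window for the first FC to confirm:",
		incomingExpiry-outgoingExpiry, "blocks")
}
```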
There is a similar case with LDK -> https://github.com/lightningdevkit/rust-lightning/issues/2275
Logs: Peer B force closes on an offline peer after HTLC expiry.
The force close transaction is still in mempool. 144 blocks later, peer A also force closed in a cascade.
The second force close would have been prevented if HTLC was failed backward by peer B after force close with Offline Peer.
Your environment
version of lnd: "version": "0.16.2-beta commit=v0.16.2-beta"
which operating system (uname -a on *Nix): Linux umbrel 5.10.17-v8+ #1421 SMP PREEMPT Thu May 27 14:01:37 BST 2021 aarch64 GNU/Linux
Steps to reproduce
See background.
Expected behaviour
When Peer B force closes on offline peer/forward peer, it should immediately fail the HTLC backward to prevent peer A force close.
Actual behaviour
Cascade of Force Close down the chain.