lightningnetwork / lnd

Lightning Network Daemon ⚡️

Node force-closed a channel with zapread.com, not sure why #4613

Closed manreo closed 3 years ago

manreo commented 3 years ago

Background

My node force-closed a channel with zapread.com. It was a good and active channel, and I am not sure why my node decided to force-close it. Some relevant things I found are below; the full log is attached.

2020-09-02 04:58:55.181 [ERR] HSWC: ChannelLink(563360:2528:1): failing link: ChannelPoint(25194f104e65a27cfd6867ac8e8708c9cb898e1212a51ff1862ffb03cc1e340b:1): received error from peer: chan_id=0b341ecc03fb2f86f11fa512128e89cbc908878eac6768fd7ca2654e104f1924, err=internal error with error: remote error
2020-09-02 04:58:55.185 [ERR] HSWC: ChannelLink(563360:2528:1): link failed, exiting htlcManager
2020-09-02 04:58:55.185 [INF] HSWC: ChannelLink(563360:2528:1): exited

. . .

2020-09-02 12:45:07.278 [INF] DISC: GossipSyncer(037659a0ac8eb3b8d0a720114efc861d3a940382dcfa1403746b4f8f6b2e8810ba): applying gossipFilter(start=0001-01-01 00:00:00 +0000 UTC, end=0001-01-01 00:00:00 +0000 UTC)
2020-09-02 12:45:07.279 [INF] DISC: GossipSyncer(0231eccc6510eb2e1c97c8a190d6ea096784aa7c358355442055aac8b20654f932): applying gossipFilter(start=2020-09-02 12:45:07.279317998 +0000 UTC m=+408745.563033846, end=2156-10-09 19:13:22.279317998 +0000 UTC m=+4295376040.563033846)
2020-09-02 12:45:37.279 [INF] DISC: Broadcasting 265 new announcements in 18 sub batches
2020-09-02 12:46:03.069 [INF] CRTR: Processed channels=0 updates=232 nodes=0 in last 59.998613183s
2020-09-02 12:47:03.069 [INF] CRTR: Processed channels=0 updates=272 nodes=0 in last 59.999710346s
2020-09-02 12:47:07.338 [INF] DISC: Broadcasting 388 new announcements in 18 sub batches
2020-09-02 12:48:03.069 [INF] CRTR: Processed channels=0 updates=170 nodes=3 in last 59.999843239s
2020-09-02 12:48:37.433 [INF] DISC: Broadcasting 219 new announcements in 17 sub batches
2020-09-02 12:48:54.682 [INF] CRTR: Pruning channel graph using block 0000000000000000000e65c5c2126e7c902fa09351b8087ab331182eeabebfab (height=646431)
2020-09-02 12:48:56.056 [INF] NTFN: New block: height=646431, sha=0000000000000000000e65c5c2126e7c902fa09351b8087ab331182eeabebfab
2020-09-02 12:48:56.059 [ERR] LTND: Unable to lookup witness: no witnesses
2020-09-02 12:48:56.060 [INF] UTXN: Attempting to graduate height=646431: num_kids=0, num_babies=0
2020-09-02 12:48:56.413 [INF] HSWC: Garbage collected 1 shared secret hashes at height=646431
2020-09-02 12:48:56.467 [INF] CNCT: ChannelArbitrator(25194f104e65a27cfd6867ac8e8708c9cb898e1212a51ff1862ffb03cc1e340b:1): force closing chan
2020-09-02 12:48:56.611 [INF] HSWC: Removing channel link with ChannelID(0b341ecc03fb2f86f11fa512128e89cbc908878eac6768fd7ca2654e104f1924)
2020-09-02 12:48:57.822 [INF] CRTR: Block 0000000000000000000e65c5c2126e7c902fa09351b8087ab331182eeabebfab (height=646431) closed 1 channels
2020-09-02 12:49:00.967 [INF] CNCT: Broadcasting force close transaction d9ad30a963638a6efe405b608e1746b55092cb877e620d7135fe083becadbc51, ChannelPoint(25194f104e65a27cfd6867ac8e8708c9cb898e1212a51ff1862ffb03cc1e340b:1): (*wire.MsgTx)(0x40016b6200)({
 Version: (int32) 2,
 TxIn: ([]*wire.TxIn) (len=1 cap=1) {
  (*wire.TxIn)(0x40072170e0)({
   PreviousOutPoint: (wire.OutPoint) 25194f104e65a27cfd6867ac8e8708c9cb898e1212a51ff1862ffb03cc1e340b:1,
   SignatureScript: ([]uint8) <nil>,
   Witness: (wire.TxWitness) (len=4 cap=4) {
    ([]uint8) <nil>,
    ([]uint8) (len=72 cap=144) {
     00000000  30 45 02 21 00 cb 77 1a  b5 0a 77 19 78 f8 6e d1  |0E.!..w...w.x.n.|
     00000010  5b d3 0d f2 c5 09 b8 0a  f1 cd 7b e4 d5 37 54 ca  |[.........{..7T.|
     00000020  03 8c fd 45 41 02 20 00  98 e9 33 4e 9b 54 72 1c  |...EA. ...3N.Tr.|
     00000030  03 46 f9 80 7b f3 3f 2c  ff 9c 9b bf 9d bf fb d6  |.F..{.?,........|
     00000040  57 0f d7 89 b8 b5 9f 01                           |W.......|
    },
    ([]uint8) (len=72 cap=144) {
     00000000  30 45 02 21 00 c7 1b 89  a9 bb 26 e3 70 62 da 92  |0E.!......&.pb..|
     00000010  18 79 e4 b3 c6 2d cd d2  23 55 33 82 6b 6d 69 e5  |.y...-..#U3.kmi.|
     00000020  4f e5 15 e8 ed 02 20 1a  0f c3 a1 e3 78 c9 60 02  |O..... .....x.`.|
     00000030  23 33 20 47 70 0b 4c a2  71 0a 75 5c 8c f0 ef 90  |#3 Gp.L.q.u\....|
     00000040  5a b2 d8 b8 21 92 e2 01                           |Z...!...|
    },
    ([]uint8) (len=71 cap=500) {
     00000000  52 21 02 10 09 28 9d 07  68 3c dc 41 e5 b6 c5 45  |R!...(..h<.A...E|
     00000010  2d e9 ae e4 2f 24 5a bd  7f c5 ca 2b 7f 81 87 f7  |-.../$Z....+....|
     00000020  1f 5f f9 21 03 fd 59 d9  e6 f3 cc 0b 54 bb 88 11  |._.!..Y.....T...|
     00000030  8b 8e b1 aa 2d 5e f7 f3  b2 35 7f 05 9d 84 d4 0e  |....-^...5......|
     00000040  ab 89 94 d2 d9 52 ae                              |.....R.|
    }
   },
   Sequence: (uint32) 2160573938
  })
 },
 TxOut: ([]*wire.TxOut) (len=2 cap=2) {
  (*wire.TxOut)(0x4004b83480)({
   Value: (int64) 18918,
   PkScript: ([]uint8) (len=22 cap=22) {
    00000000  00 14 34 61 63 e4 5b 89  ae dd 6b 9d 6f 3d bf 58  |..4ac.[...k.o=.X|
    00000010  17 fd c8 2e d4 61                                 |.....a|
   }
  }),
  (*wire.TxOut)(0x4004b834a0)({
   Value: (int64) 151877,
   PkScript: ([]uint8) (len=34 cap=34) {
    00000000  00 20 64 f9 21 c5 22 fc  e3 38 b4 97 59 36 eb a5  |. d.!."..8..Y6..|
    00000010  6d c4 1f 4a 2d 1d 25 d9  18 9b 7d 9b 3b d8 cc cf  |m..J-.%...}.;...|
    00000020  4a db                                             |J.|
   }
  })
 },
 LockTime: (uint32) 540252184
})

2020-09-02 12:49:01.181 [INF] LNWL: Inserting unconfirmed transaction d9ad30a963638a6efe405b608e1746b55092cb877e620d7135fe083becadbc51

Your environment

lnd version 0.11.0-beta commit=v0.11.0-beta
Bitcoin Core Daemon version v0.18.0
Linux rock64 4.4.132-1075-rockchip-ayufan-ga83beded8524 #1 SMP Thu Jul 26 08:22:22 UTC 2018 aarch64 GNU/Linux

Full log available here (password protected; send me a Slack message and I'll send you the password)

lnd.log.80.zip

enorrmann commented 3 years ago

The same thing happened to me: a couple of force closes for no apparent reason. lnd 0.11.0-beta, Neutrino, Linux x64.

Roasbeef commented 3 years ago

Unless we've identified a new spurious force-close vector, I'd check that you didn't have an HTLC which has expired. If not, then although we've fixed a number of spurious force-close vectors, both sides need to be updated to have them disappear completely.
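
One way to check for HTLCs close to or past expiry (a sketch using lncli and jq; the field names are taken from lnd's RPC output and the jq filter is illustrative):

lncli getinfo | jq '.block_height'
lncli listchannels | jq '.channels[] | {chan: .channel_point, htlc_expiry: [.pending_htlcs[]?.expiration_height]}'

If any expiration_height is at or below the current block height, lnd will go on-chain to claim that HTLC.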

manreo commented 3 years ago

Unless we've identified a new spurious force-close vector, I'd check that you didn't have an HTLC which has expired

I'm not sure how to check that. I've uploaded the full log from those days.

both sides need to be updated to have them disappear completely.

Luckily he is on GitHub. @Horndev, can you please verify what lnd version your ZapRead node was running around 2020-09-02? Thank you.

enorrmann commented 3 years ago

It happened again with another node; I have some logging information here: https://rayo.host/logs/force_close.log. I'm running 0.11-beta, and the other peer is @Blockdaemon; I'm not sure which version they are on.

enorrmann commented 3 years ago

And another one: https://rayo.host/logs/force_close_2.log

enorrmann commented 3 years ago

Another one: https://rayo.host/logs/force_close_3.log

Roasbeef commented 3 years ago

As mentioned above, if no HTLCs were present on the commitment transaction, then it may be due to the other node running an older version with bugs that could cause it to send an invalid commitment. In each of those instances, I'd look further back in the logs to see if the peer sent an invalid commitment earlier in the exchange.

Crypt-iQ commented 3 years ago

If you can get the version of the peer, this would help us. Also if there's a specific "invalid commitment" log message, that would aid us.

enorrmann commented 3 years ago

Two days ago I opened these channels:

lncli openchannel --node_key 03a5886df676f3b3216a4520156157b8d653e262b520281d8d325c24fd8b456b9c --sat_per_byte 3 --local_amt 3000000
lncli openchannel --node_key 02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c --sat_per_byte 3 --local_amt 6000000
lncli openchannel --node_key 0395033b252c6f40e3756984162d68174e2bd8060a129c0d3462a9370471c6d28f --sat_per_byte 3 --local_amt 8000000
lncli openchannel --node_key 03e50492eab4107a773141bb419e107bda3de3d55652e6e1a41225f06a0bbf2d56 --sat_per_byte 3 --local_amt 7000000

Today I found that all of them were force-closed from my side; no "invalid commitment" log message was found. Edit: node names are yalls-clearnet, WalletOfSatoshi.com, BitMEXResearch

Crypt-iQ commented 3 years ago

Can you try to pinpoint the error messages? There should be something like "link failed". The thing I'm after is what message was sent across. As you are a Neutrino node, force closes with older nodes are more likely to happen. If any of the other counterparties are not on the 0.11 branch, then force closes can happen, as laolu mentioned above.
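
A quick way to pull out those lines (a sketch; the default log location is assumed, adjust the path for your setup):

grep -nE "failing link|link failed" ~/.lnd/logs/bitcoin/mainnet/lnd.log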

enorrmann commented 3 years ago

Found it: https://rayo.host/logs/link_failed.log

Crypt-iQ commented 3 years ago

Thanks, is there a way to post it to a pastebin-like site if things aren't too sensitive?

enorrmann commented 3 years ago

https://pastebin.com/BLhbEC44

Do you need the full log?

Crypt-iQ commented 3 years ago

Two interesting messages:

First error should've been fixed in 0.11 if they upgraded too. Not sure why the other occurs

Ping @cfromknecht

manreo commented 3 years ago

Another "local force-close", with runam0k nodl:

Peer Public Key: 02f8f981a3d6cb6536fc12ea2abdcfd20f7490e28197514aafc05a7a1f08d3de09
Channel Point: c5e508504fda9c200d458bb68c8e814615fdf805a44a8bb3c5fa69ffb7f59ce5
Closing Transaction Hash: 22429d9eb955db66c638a4e8a23b0a41cfc4862dc4d713d7b0d6f478e3fe4083

Full log: https://privatebin.net/?6fa87ce4beca6a74#C8QqFN1YY3zP985v3HQLbe6rv79LQyGyHNsSigycerNz

Crypt-iQ commented 3 years ago

So you need debug-level logs on. Also, you keep disconnecting from a Tor node, so it could be that this is triggering those issues fixed in 0.11; it makes sense that a Tor node would cause reliability issues.

manreo commented 3 years ago

Hi @Crypt-iQ, how can I run in debug mode? About the disconnects, is it related to this: https://github.com/lightningnetwork/lnd/issues/4575

I'm still waiting for a response from @Roasbeef about #4575.

enorrmann commented 3 years ago

Hi @Crypt-iQ, how can I run in debug mode?

Just edit lnd.conf and set debuglevel=debug. For finer detail, see https://github.com/lightningnetwork/lnd/blob/master/sample-lnd.conf
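
For example (a sketch; the subsystem names follow the log tags shown in the excerpts above), either persistently in lnd.conf or at runtime via lncli:

# lnd.conf: global debug, or per-subsystem
debuglevel=debug
# debuglevel=CNCT=debug,HSWC=debug,PEER=info

# at runtime, without restarting lnd
lncli debuglevel --level=debug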

enorrmann commented 3 years ago

Today I got another one, force-closed from my side: https://pastebin.com/FtzewS93

Channel point: ea3e68dcba3367a4d5b59da9f6366c42a87d3023422ff79925d1659682587f43:0

Roasbeef commented 3 years ago

First error should've been fixed in 0.11 if they upgraded too. Not sure why the other occurs

The second one can happen due to slight differences between implementations w.r.t. the order in which they expect certain messages.

2020-09-25 04:51:17.065 [INF] HSWC: ChannelLink(646845:1072:0): received re-establishment message from remote side
2020-09-25 04:51:17.080 [ERR] HSWC: ChannelLink(646845:1072:0): failing link: unable to handle upstream add HTLC: ID 27 on HTLC add does not match expected next ID 26 with error: invalid update

Not sure which implementation this is, but we've seen in the past that certain implementations don't re-transmit HTLCs in order of increasing HTLC ID, which can at times trigger this issue.

@enorrmann on that latest one, there were no other logs before that force close attempt, and nothing on the p2p layer; maybe someone triggered a DLP scenario?

Roasbeef commented 3 years ago

Edit: node names are

This is really useful info, thanks! So from here, we should be able to check out the node announcements themselves, then check the feature bits to see if we can place them in an lnd version bucket from there.
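
For reference, a peer's advertised feature bits can be pulled from its node announcement with lncli (a sketch; the pubkey below is one of the peers mentioned earlier, and the exact output fields may vary by lnd version):

lncli getnodeinfo --pub_key 02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c | jq '.node.features'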

Crypt-iQ commented 3 years ago

Can you both resend logs with debug enabled, if you can? You can send them over Slack (name is "eugene") or publicly here.

manreo commented 3 years ago

Hi @Crypt-iQ, the node is now running in debug mode, but I need to wait for a new force close to get log files that include the debug output, correct?

Crypt-iQ commented 3 years ago

@MrManPew yes

manreo commented 3 years ago

I got two more force closes, but I don't have the log files, as only 3 logs are saved, and now that I am in debug mode the logs have already rolled past the time of the close 🤦. I changed the settings to save more than 3 log files... but now we need to wait again. Those force closes are expensive...

Roasbeef commented 3 years ago

@MrManPew after doing debug mode for all the logs, if you set lncli debuglevel --level=PEER=info.

manreo commented 3 years ago

@Roasbeef If I set lncli debuglevel --level=PEER=info then what? I think you did not finish the sentence. This is my lnd.conf:

# RaspiBolt LND Mainnet: lnd configuration
# /home/bitcoin/.lnd/lnd.conf
[Application Options]
debuglevel=debug
maxpendingchannels=5
color=#68F442
listen=localhost
nat=false
maxlogfiles=50
maxlogfilesize=20
#sync-freelist=1
rpclisten=0.0.0.0:10009
tlsextraip=192.168.1.0/24

[Bitcoin]
bitcoin.active=1
bitcoin.mainnet=1
bitcoin.node=bitcoind

[autopilot]
autopilot.active=0
autopilot.maxchannels=5
autopilot.allocation=0.6

[Tor]
tor.active=true
tor.v3=true

manreo commented 3 years ago

@Roasbeef @Crypt-iQ Another one, with LN.SIDESHIFT.AI

2020-10-08 20:01:51.993 [DBG] CNCT: ChannelArbitrator(3114234b7b95278049b6790037b3e7d74d56a9cd13956d9fce32b05adc7ea140:0): go to chain for outgoing htlc 4811bd19193e4a29f5fcd025ab5e4d95b11f5bce0d1be89942bede0e3bec74c8: timeout=651859, blocks_until_expiry=0, broadcast_delta=0
2020-10-08 20:01:52.423 [INF] CNCT: ChannelArbitrator(3114234b7b95278049b6790037b3e7d74d56a9cd13956d9fce32b05adc7ea140:0): force closing chan
2020-10-08 20:01:59.410 [DBG] LNWL: ChannelPoint(3114234b7b95278049b6790037b3e7d74d56a9cd13956d9fce32b05adc7ea140:0): starting local commitment: 

Full log attached: lnd.log.597_pass.zip

Password sent via Slack.

Roasbeef commented 3 years ago

@MrManPew that force close isn't abnormal, given that lnd went on-chain in order to claim an outgoing HTLC.

manreo commented 3 years ago

But why did my node decide to do it? I did not ask it to close the channel.

Roasbeef commented 3 years ago

If your node didn't go on-chain, then it may have had to forfeit the amount of that HTLC. This is how HTLCs work: if you extend one to a party and they never claim it off-chain, then if you want the funds back you need to go on-chain to claim them.

manreo commented 3 years ago

Not sure I get it, but is this LN.SIDESHIFT.AI's fault for not returning the HTLC when the payment failed? Is it a connectivity problem? Any suggestions?

manreo commented 3 years ago

@Roasbeef @Crypt-iQ Another one, now with fixedfloat.com. BTW:

admin@rock64:~$ bitcoind --version
Bitcoin Core Daemon version v0.18.0

Could be the same problem, as I see roughly the same log:

2020-10-09 13:47:47.633 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): checking commit chain actions at height=651962, in_htlc_count=0, out_htlc_count=1
 lnd.log.623
2020-10-09 13:58:32.867 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): new block (height=651963) examining active HTLC's
 lnd.log.623
2020-10-09 13:58:32.871 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): checking commit chain actions at height=651963, in_htlc_count=0, out_htlc_count=1
 lnd.log.623
2020-10-09 14:41:41.276 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): new block (height=651964) examining active HTLC's
 lnd.log.624
2020-10-09 14:41:41.285 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): checking commit chain actions at height=651964, in_htlc_count=0, out_htlc_count=1
 lnd.log.624
2020-10-09 15:13:45.445 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): new block (height=651965) examining active HTLC's
 lnd.log.624
2020-10-09 15:13:45.449 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): checking commit chain actions at height=651965, in_htlc_count=0, out_htlc_count=1
 lnd.log.624
2020-10-09 15:13:57.348 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): new block (height=651966) examining active HTLC's
 lnd.log.624
2020-10-09 15:13:57.355 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): checking commit chain actions at height=651966, in_htlc_count=0, out_htlc_count=1
 lnd.log.624
2020-10-09 15:13:57.358 [DBG] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): go to chain for outgoing htlc 0620f0086e877919b190ef87d6b67f38bdca87ca7043781f688364c070f86327: timeout=651966, blocks_until_expiry=0, broadcast_delta=0
 lnd.log.624
2020-10-09 15:13:57.986 [INF] CNCT: ChannelArbitrator(09164b0b2514b6f1fd3aa9e6b856c3281337ee3f12a5066cb3ef1254e51b51e2:1): force closing chan

Not sure it is related, but the situation escalated the moment I increased the logging to debug mode.

Zip password sent via Slack. logss1.zip

Thanks

manreo commented 3 years ago

@Roasbeef, @Crypt-iQ I got another force close. As far as I understand, this time out_htlc_count=0, so it's not clear from the logs why there is a force close.

Nodename: LNBIG.com [lnd-13]
Channel Point: cc73a92ce2ec37df18f8cb47333026e1d466e66d9ea9dee736d2b2ddbbb8e1dd:0

2020-10-11 04:03:10.206 [DBG] CNCT: ChannelArbitrator(cc73a92ce2ec37df18f8cb47333026e1d466e66d9ea9dee736d2b2ddbbb8e1dd:0): new block (height=652193) examining active HTLC's
2020-10-11 04:03:10.215 [DBG] CNCT: ChannelArbitrator(cc73a92ce2ec37df18f8cb47333026e1d466e66d9ea9dee736d2b2ddbbb8e1dd:0): checking commit chain actions at height=652193, in_htlc_count=0, out_htlc_count=0
2020-10-11 04:03:10.525 [INF] HSWC: Garbage collected 2 shared secret hashes at height=652193
2020-10-11 04:03:10.546 [DBG] PEER: Received ChannelUpdate(chain_hash=000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f, short_chan_id=605108527822340096, mflags=00000001, cflags=00000000, update_time=2020-10-11 02:20:29 +0000 UTC) from 037f990e61acee8a7697966afd29dd88f3b1f8a7b14d625c4f8742bd952003a590@185.5.53.91:9735
2020-10-11 04:03:10.547 [DBG] PEER: Received ChannelUpdate(chain_hash=000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f, short_chan_id=620141050870497280, mflags=00000001, cflags=00000001, update_time=2020-10-11 03:42:10 +0000 UTC) from 02b686ccf655ece9aec77d4d80f19bb9193f7ce224ab7c8bbe72feb3cdd7187e01@127.0.0.1:35544
2020-10-11 04:03:10.600 [INF] CNCT: ChannelArbitrator(cc73a92ce2ec37df18f8cb47333026e1d466e66d9ea9dee736d2b2ddbbb8e1dd:0): force closing chan

Password sent via Slack. logg2.zip

Thanks

Roasbeef commented 3 years ago

In both of those instances, you went on-chain to resolve HTLCs. Nothing out of the ordinary.

rockstardev commented 3 years ago

Just to chime in and say that we at Strike are also experiencing this issue. We'll try to come up with logs and help debug it. cc @mrfelton

Roasbeef commented 3 years ago

Thanks for the new reports, all! Just to note again: we're only interested in force closures that aren't HTLC-related go-to-chain events. When sending or scanning logs, the main sub-systems to look at are LNWL (can show invalid sigs, or w/e), HSWC (the switch), and CNCT (the component that decides to go on-chain).
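
A rough way to scan a log for just those sub-systems (a sketch; adjust the path and the levels to taste):

grep -E "\[(ERR|WRN)\] (LNWL|HSWC|CNCT)" ~/.lnd/logs/bitcoin/mainnet/lnd.log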

So far, none of the logs above (that we could decrypt) have been actionable on our part.

ghost commented 3 years ago

Not sure I get it, but is this LN.SIDESHIFT.AI's fault for not returning the HTLC when the payment failed? Is it a connectivity problem? Any suggestions?

Is there an answer to this question above? I didn't see one here. Should I stop connecting with nodes that do this? Is it me? Is running over Tor causing missed payments due to connectivity problems? Is this malice from the sender that is routing through me? It seems like the answer is "totally normal", with no answer as to how to minimize this from occurring.

carlaKC commented 3 years ago

Is there an answer to this question above?

Once an HTLC is locked into your commitment transaction, the nodes in the channel need to be online to settle the HTLC (if the preimage is revealed) or to time it out if the preimage is not revealed by the timeout block.

There are a few reasons why this could happen:

It's most likely that your node isn't reachable, or you have a peer with a bad uptime percentage. Generally, it's desirable to peer with nodes that have good uptime so that this doesn't happen. Unfortunately, it's difficult to know whether a peer is going to stay online before you actually open a channel with them. The protocol has a disabled flag which nodes send out when their channel peers go dark (which can act as a proxy for uptime if you monitor it over time). This is one of the things we track in the BOS scores, so that could be a good place to get some info before connecting (although it is not the only data point; more info in the "Lightning Metrics" section here).

More generally, if you have channels open with peers who have just gone dead (offline for a long time), we highly recommend that you close the channel out because those nodes will not be around to help out with static channel backups if you do need to recover from a lost channeldb.
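
Since a dead peer can't participate in a cooperative close, closing such a channel means force-closing it yourself; a sketch (funding txid and index are placeholders for the channel in question):

lncli closechannel --funding_txid <funding_txid> --output_index <output_index> --force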

carlaKC commented 3 years ago

Another actionable step would be to update the policy on your public channels. These force closes do cost on-chain fees, so it's worth adjusting your policies beyond the default values to reflect the chance that you may face these costs.

Of course, if you don't want to route you can always set --rejecthtlc for existing public channels, or set up private channels.
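
For reference, the policies mentioned above can be adjusted with lncli updatechanpolicy; a sketch with purely illustrative values (not recommendations):

lncli updatechanpolicy --base_fee_msat 1000 --fee_rate 0.000002 --time_lock_delta 40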

We're all still figuring it out, and force closes will happen, but at the end of the day it's really about choosing the tradeoffs that work for you. IMO, it can be quite fun trying to figure out the best way to set up a routing node, but if you don't want that exposure, you also have that option.

manreo commented 3 years ago

Hi @carlaKC, thank you for the detailed explanation. I'm not sure it fits my situation exactly, because:

  1. My node is reachable 24/7 on a dedicated Rock64 connected by cable (BOS score in the top 5% of the network, RCN ISP).
  2. The peers that I got the force close with are well known and stable nodes (fixedfloat, LNBIG.com [lnd-13], LN.SIDESHIFT.AI, zapread.com).

And most importantly, I changed the logging to debug, so you can see that I have ping/pong traffic with those nodes. So I don't understand exactly how your explanation fits the situation.

Thank you

carlaKC commented 3 years ago

@MrManPew, I was more trying to answer the question "how to minimize this from occurring" above, so my suggestions are more general advice (which I thought might be useful to other people who find this issue) than a specific investigation of your force close. Should have made that clearer!

mrfelton commented 3 years ago

We just had another channel with OpenNode force-closed. Here are some relevant logs:

2020-10-19 08:42:16.094 [INF] CNCT: ChannelArbitrator(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): trigger userTrigger moving from state StateCommitmentBroadcasted to StateCommitmentBroadcasted
2020-10-19 08:42:16.094 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): Restoring 0 local updates that the peer should sign
2020-10-19 08:42:16.093 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): Restoring 0 dangling remote updates
2020-10-19 08:42:16.091 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): starting remote commitment: (*lnwallet.commitment)(0xc006ee2300)({
2020-10-19 08:42:16.089 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): starting local commitment: (*lnwallet.commitment)(0xc006ee2180)({
2020-10-19 08:42:16.053 [INF] LNWL: Inserting unconfirmed transaction 92226e338c4caf7592497dc5144adff32133479c1a35be674a6e22fc2aac0d67
2020-10-19 08:42:16.052 [INF] CNCT: Broadcasting force close transaction 92226e338c4caf7592497dc5144adff32133479c1a35be674a6e22fc2aac0d67, ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): (*wire.MsgTx)(0xc0060d5600)({
2020-10-19 08:42:16.046 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): Restoring 0 local updates that the peer should sign
2020-10-19 08:42:16.045 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): Restoring 0 dangling remote updates
2020-10-19 08:42:16.043 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): starting remote commitment: (*lnwallet.commitment)(0xc004f68840)({
2020-10-19 08:42:16.040 [DBG] LNWL: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): starting local commitment: (*lnwallet.commitment)(0xc004f686c0)({
2020-10-19 08:42:16.038 [INF] HSWC: Removing channel link with ChannelID(1f86d2b734d90799b952efd8e7ccaf4f6a0294086f66a70d9e95303da914be8a)
2020-10-19 08:42:16.035 [INF] CNCT: ChannelArbitrator(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): force closing chan
2020-10-19 08:42:16.024 [DBG] CNCT: ChannelArbitrator(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): checking commit chain actions at height=653399, in_htlc_count=0, out_htlc_count=1
2020-10-19 08:42:16.023 [DBG] CNCT: ChannelArbitrator(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): new block (height=653399) examining active HTLC's
2020-10-19 08:42:16.019 [INF] CNCT: Attempting to force close ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0)
2020-10-19 08:42:16.019 [WRN] PEER: Force closing link(649686:2138:0)
2020-10-19 08:42:16.019 [INF] HSWC: ChannelLink(649686:2138:0): stopping
2020-10-19 08:42:16.019 [INF] HSWC: Removing channel link with ChannelID(1f86d2b734d90799b952efd8e7ccaf4f6a0294086f66a70d9e95303da914be8a)
2020-10-19 08:42:16.018 [INF] HSWC: ChannelLink(649686:2138:0): exited
2020-10-19 08:42:16.018 [ERR] HSWC: ChannelLink(649686:2138:0): link failed, exiting htlcManager
2020-10-19 08:42:16.018 [ERR] HSWC: ChannelLink(649686:2138:0): failing link: ChannelPoint(8abe14a93d30959e0da7666f0894026a4fafcce7d8ef52b99907d934b7d2861f:0): unable to accept new commitment: rejected commitment: commit_height=6668, invalid_htlc_sig=30440220528973a546e23b5a8804f8a2d56de89507d077d588b61676079e74d079be3f160220301ca20341679828462c289a1106badd6dcdac4e0892cd387501478c62524d77, commit_tx=02000000011f86d2b734d90799b952efd8e7ccaf4f6a0294086f66a70d9e95303da914be8a0000000000e5b888800435fc030000000000160014c451993f9df7450dcfa030c8d32e06759662d322239e2100000000002200203aaf3923de92118bc563d71f2026d59abd619837388f3c7083235e22199607faafdeab000000000022002055aa9714863a1bb2b135a7e8b1c5e4e1a36424ba3c3c7d2da944cf84a43271a8afdeab000000000022002055aa9714863a1bb2b135a7e8b1c5e4e1a36424ba3c3c7d2da944cf84a43271a861b1cc20, sig_hash=c52bf9f265e4a90b9564d4ec4ec04b06b46880dc8ebf691d785eed85ca82a677 with error: invalid commitment
2020-10-19 08:42:15.958 [DBG] HSWC: ChannelLink(649686:2138:0): removing Add packet (Chan ID=650048:2105:1, HTLC ID=1042) from mailbox
2020-10-19 08:42:15.898 [DBG] HSWC: ChannelLink(649686:2138:0): queueing keystone of ADD open circuit: (Chan ID=650048:2105:1, HTLC ID=1042)->(Chan ID=649686:2138:0, HTLC ID=1387)
2020-10-19 08:42:15.886 [DBG] HSWC: ChannelLink(650048:2105:1): forwarding 1 packets to switch
2020-10-19 08:42:15.122 [DBG] HSWC: Sent 0 satoshis and received 0 satoshis in the last 10 seconds (2.600000 tx/sec)
2020-10-19 08:42:14.477 [DBG] HSWC: ChannelLink(650048:2105:1): removing Fail/Settle packet (Chan ID=650048:2105:1, HTLC ID=1041) from mailbox
2020-10-19 08:42:14.460 [DBG] HSWC: ChannelLink(650048:2105:1): queueing removal of FAIL closed circuit: (Chan ID=650048:2105:1, HTLC ID=1041)->(Chan ID=652102:572:0, HTLC ID=2482)

Any insight into the specific reason this was force-closed? Anything else I can provide to help?

Crypt-iQ commented 3 years ago

Any insight into the specific reason this was force-closed? Anything else I can provide to help?

Have you been noticing any connection issues? Your logs, unlike some of the other ones, actually have an invalid commitment log line. There is one remaining class of force close that we don't yet handle; it's tracked at the spec level (https://github.com/lightningnetwork/lightning-rfc/issues/794) and that could be what you're experiencing.

mrfelton commented 3 years ago

What kind of connection issues, @Crypt-iQ? Our nodes are connected 24/7. I have noticed that some of the nodes we are connected to randomly show as offline when I know they are not actually offline (I can confirm that our other nodes are still connected to the peer that shows as offline, and if I issue an lncli disconnect/connect it re-establishes the connection and the peer shows as online again).

Crypt-iQ commented 3 years ago

Did you notice this after issuing a disconnect/connect command?

mrfelton commented 3 years ago

No, and the node that I had to disconnect/reconnect was a different one (Bitrefill). We have several channels with OpenNode and have not noticed any specific issues with them prior to this force closure.

Crypt-iQ commented 3 years ago

No, and the node that I had to disconnect/reconnect was a different one (Bitrefill). We have several channels with OpenNode and have not noticed any specific issues with them prior to this force closure.

Do you know what version OpenNode is running?

mrfelton commented 3 years ago

Apparently they just updated from 0.7.x to v0.11.x yesterday @Crypt-iQ

Crypt-iQ commented 3 years ago

Apparently they just updated from 0.7.x to v0.11.x yesterday @Crypt-iQ

Do you know (from your logs maybe?) whether the force close occurred before or after the upgrade?