lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.73k stars 2.09k forks source link

repeated inbound connections #1630

Closed robtex closed 6 years ago

robtex commented 6 years ago

Got a bunch of these from various nodes. It seems the connection is silently terminated, since it is reestablished. How to best find out why it was closed? Hard to tell even if it was my node or the other side that hung up.

2018-07-26 06:11:32.183 [INF] SRVR: finalizing connection to 024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1, inbound=true
2018-07-26 06:11:32.296 [INF] PEER: NodeKey(024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1) loading ChannelPoint(ff709eb6aaab6e8e45224f34638b91c6c53e5f4ccf24532118111ef0d2c2ea01:0)
2018-07-26 06:11:33.678 [INF] SRVR: New inbound connection from 85.248.227.164:55299
2018-07-26 06:11:33.680 [INF] SRVR: finalizing connection to 024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1, inbound=true
2018-07-26 06:11:33.686 [INF] PEER: NodeKey(024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1) loading ChannelPoint(ff709eb6aaab6e8e45224f34638b91c6c53e5f4ccf24532118111ef0d2c2ea01:0)
2018-07-26 06:11:34.903 [INF] SRVR: New inbound connection from 85.248.227.164:64817
2018-07-26 06:11:34.903 [INF] SRVR: finalizing connection to 024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1, inbound=true
2018-07-26 06:11:34.911 [INF] PEER: NodeKey(024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1) loading ChannelPoint(ff709eb6aaab6e8e45224f34638b91c6c53e5f4ccf24532118111ef0d2c2ea01:0)
Roasbeef commented 6 years ago

You can look to see what the error is on disconnect. If the peer is behaving poorly, then you should ban the ip for a bit. If it the remote side repeatedly connecting inbound, and not us connecting outbound, then we don't have a channel with them, so we don't really care about their connection in the first place.

cfromknecht commented 6 years ago

Does the error show up if SRVR and PEER are set to debug log level?

robtex commented 6 years ago

the peer mentioned in the log is 024509d6321a579b3f72117509e243d6cdd86874cb0731ff39be6c92f3e64b53a1 the channelpoint ff709eb6aaab6e8e45224f34638b91c6c53e5f4ccf24532118111ef0d2c2ea01:0 is a channel that exists between my node and the peer if it didn't exist, wouldn't my node write an error?

i don't know which of those lines you consider being "the error".

the log doesn't show me an error, or any reason for disconnect. it doesn't even say if it was connected by my node or the peer. or that it was disconnected at all. i just assume it is connected since the remote peer obviously connects again.

i currently have debuglevel=CNCT=trace

robtex commented 6 years ago

i managed to reproduce similar but with outgoing connections.

lncli connect 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053@38.87.54.163:9745

2018-07-27 23:20:38.014 [INF] SRVR: Established connection to: 38.87.54.163:9745
2018-07-27 23:20:38.014 [INF] SRVR: finalizing connection to 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053, inbound=false

lncli connect 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053@38.87.54.163:9745

2018-07-27 23:20:40.894 [INF] SRVR: Established connection to: 38.87.54.163:9745
2018-07-27 23:20:40.894 [INF] SRVR: finalizing connection to 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053, inbound=false

and nothing more.

cfromknecht commented 6 years ago

is a channel that exists between my node and the peer

yeah logs indicate that this is a proper peer

can you try that experminent after lncli debuglevel --level=PEER=debug,SRVR=debug?

The peer level disconnect is printed to trace: https://github.com/lightningnetwork/lnd/blob/master/peer.go#L580 so idt you'd see that error if the peer fails to start.

robtex commented 6 years ago

A bit wrong level to log disconnects imho. here is the above connect with requested loglevels http://paste.ubuntu.com/p/Zs6PSTDyKW/

robtex commented 6 years ago

this is a similar peer but with more info: lncli connect 02ad7e6c93e4e96d718e90be2274b4f493bcb446ceb2c2aad99a0dfc652e548db8@71.121.246.174:9735


2018-07-27 23:48:28.857 [WRN] SRVR: Already have 1 persistent connection requests for 02ad7e6c93e4e96d718e90be2274b4f493bcb446ceb2c2aad99a0dfc652e548db8@71.121.246.174:9735, connecting anyway.
2018-07-27 23:48:28.880 [INF] SRVR: Established connection to: 71.121.246.174:9735
2018-07-27 23:48:28.880 [INF] SRVR: finalizing connection to 02ad7e6c93e4e96d718e90be2274b4f493bcb446ceb2c2aad99a0dfc652e548db8, inbound=false
2018-07-27 23:48:28.899 [INF] PEER: NodeKey(02ad7e6c93e4e96d718e90be2274b4f493bcb446ceb2c2aad99a0dfc652e548db8) loading ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.920 [DBG] CNCT: New ChainEventSubscription(id=17) for ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.920 [INF] HSWC: ChannelLink(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1) is starting
2018-07-27 23:48:28.920 [INF] HSWC: Trimming open circuits for chan_id=525644:1078:1, start_htlc_id=0
2018-07-27 23:48:28.920 [INF] HSWC: Adding live link chan_id=a8209216f64a4b81721521f0a819ecf023853ab9112ab75658bb60be6f6ec394, short_chan_id=525644:1078:1
2018-07-27 23:48:28.920 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.920 [TRC] CNCT: ChannelArbitrator(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1) got new signal update!
2018-07-27 23:48:28.920 [INF] HSWC: HTLC manager for ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1) started, bandwidth=0 mSAT
2018-07-27 23:48:28.920 [INF] HSWC: Attempting to re-resynchronize ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.921 [INF] HSWC: Received re-establishment message from remote side for channel(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.932 [INF] HSWC: Sending 1 updates to synchronize the state for ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.980 [INF] PEER: unable to read message from 71.121.246.174:9735: EOF
2018-07-27 23:48:28.980 [INF] DISC: Removing gossipSyncer for peer=02ad7e6c93e4e96d718e90be2274b4f493bcb446ceb2c2aad99a0dfc652e548db8
2018-07-27 23:48:28.980 [INF] HSWC: Removing channel link with ChannelID(a8209216f64a4b81721521f0a819ecf023853ab9112ab75658bb60be6f6ec394)
2018-07-27 23:48:28.980 [INF] HSWC: ChannelLink(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1) is stopping
2018-07-27 23:48:29.080 [INF] HSWC: ChannelLink(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1) has exited```
cfromknecht commented 6 years ago

Yeah can probably bump that up that if it's useful, we only logs disconnects atm if the peer successfully starts

based on the first logs, it looks like the remote party isn't sending their InitMsg.

2018-07-27 23:36:03.491 [DBG] SRVR: Connecting to 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053@38.87.54.163:9745
2018-07-27 23:36:03.536 [DBG] PEER: Received ChannelUpdate(chain_hash=000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f, short_chan_id=578699358051565568, flag=1, update_time=2018-07-27 23:35:50 +0000 UTC) from 62.45.114.225:9735
2018-07-27 23:36:03.549 [INF] SRVR: Established connection to: 38.87.54.163:9745
2018-07-27 23:36:03.549 [INF] SRVR: finalizing connection to 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053, inbound=false
2018-07-27 23:36:03.549 [DBG] PEER: Sending Init to 38.87.54.163:9745
... nothing

not much we can do about that, do you know what impl is on the remote end?

if they don't send, we'll time out after 15s, and then log to the Tracef in peer.Disconnect https://github.com/lightningnetwork/lnd/blob/master/peer.go#L267

In the second log, it looks like it successfully starts. The first log only shows entries over 4-5 seconds, maybe the init was sent later?

cfromknecht commented 6 years ago
2018-07-27 23:48:28.932 [INF] HSWC: Sending 1 updates to synchronize the state for ChannelPoint(95c36e6fbe60bb5856b72a11b93a8523f0ec19a8f0211572814b4af6169220a8:1)
2018-07-27 23:48:28.980 [INF] PEER: unable to read message from 71.121.246.174:9735: EOF

Here it looks they may have not liked the updates you sent

robtex commented 6 years ago

regarding the 5-second log, it feels like enough time, i could instantly reconnect. but i'll make a new one just in case.

regarding we sent something the other node didn't like. yeah. it is highly unfortunate that there is no disconnect-message with reason in bolt on a higher level than just brutally tearing down the tcp. that definitely should have helped a lot of debugging.

this is from yet another unstable peer, where it seems my node hangs up because it doesn't like the update. at least i guess in this case it is we that hang up right after 2018-07-27 23:59:30.733 [ERR] PEER: recv'd update for unknown channel 06c9219d1040e59375c9ec8a7822f3f0152f01898466ae617fd6b4413b5431b6 from 212.51.132.226:9735

2018-07-27 23:59:00.182 [WRN] SRVR: Already have 1 persistent connection requests for 021779d56f6992431e959991a664d678a2f41054073adb48c5a351d791c9cb0781@212.51.132.226:9735, connecting anyway.
2018-07-27 23:59:00.375 [INF] SRVR: Established connection to: 212.51.132.226:9735                                                                                                                                           2018-07-27 23:59:00.375 [INF] SRVR: finalizing connection to 021779d56f6992431e959991a664d678a2f41054073adb48c5a351d791c9cb0781, inbound=false
2018-07-27 23:59:00.478 [INF] PEER: NodeKey(021779d56f6992431e959991a664d678a2f41054073adb48c5a351d791c9cb0781) loading ChannelPoint(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1)                     2018-07-27 23:59:00.487 [DBG] CNCT: New ChainEventSubscription(id=21) for ChannelPoint(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1)
2018-07-27 23:59:00.487 [INF] HSWC: ChannelLink(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1) is starting                                                                                              
2018-07-27 23:59:00.487 [INF] HSWC: Trimming open circuits for chan_id=523833:1125:1, start_htlc_id=0
2018-07-27 23:59:00.487 [INF] HSWC: Adding live link chan_id=0e610c61bf79ef8b8198a0810f17183328a14a5dd071c73e7c3017abeb579812, short_chan_id=523833:1125:1
2018-07-27 23:59:00.487 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1)
2018-07-27 23:59:00.487 [TRC] CNCT: ChannelArbitrator(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1) got new signal update!
2018-07-27 23:59:00.487 [INF] HSWC: HTLC manager for ChannelPoint(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1) started, bandwidth=0 mSAT
2018-07-27 23:59:00.487 [INF] HSWC: Attempting to re-resynchronize ChannelPoint(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1)
2018-07-27 23:59:00.727 [ERR] PEER: recv'd update for unknown channel 06c9219d1040e59375c9ec8a7822f3f0152f01898466ae617fd6b4413b5431b6 from 212.51.132.226:9735
2018-07-27 23:59:00.727 [INF] HSWC: ChannelPoint(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1): resending FundingLocked message to peer
2018-07-27 23:59:00.727 [INF] HSWC: Received re-establishment message from remote side for channel(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1)
2018-07-27 23:59:00.731 [ERR] PEER: recv'd update for unknown channel dc6743f7c994ceec2a97c314b97d1ce2e14de1f8a656d89d836ba2f8639597ec from 212.51.132.226:9735
2018-07-27 23:59:01.065 [ERR] CHDB: unable to read channel for chain_hash=6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000, node_key=035c0fe76eea9faf84d630cf749f3187e5b54b22c0e5a50079d4ccdd4a2b00c11b: unable to read channel data for chan_point=d0ce726c9c344ec61f34c6c51794cbd7c0e6cbfd82725fd3720f746ef28ddeeb:0: unable to fetch chan info: no chan info found
2018-07-27 23:59:01.528 [ERR] CHDB: unable to read channel for chain_hash=6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000, node_key=035c0fe76eea9faf84d630cf749f3187e5b54b22c0e5a50079d4ccdd4a2b00c11b: unable to read channel data for chan_point=d0ce726c9c344ec61f34c6c51794cbd7c0e6cbfd82725fd3720f746ef28ddeeb:0: unable to fetch chan info: no chan info found
2018-07-27 23:59:02.120 [ERR] CHDB: unable to read channel for chain_hash=6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000, node_key=035c0fe76eea9faf84d630cf749f3187e5b54b22c0e5a50079d4ccdd4a2b00c11b: unable to read channel data for chan_point=d0ce726c9c344ec61f34c6c51794cbd7c0e6cbfd82725fd3720f746ef28ddeeb:0: unable to fetch chan info: no chan info found
2018-07-27 23:59:02.265 [INF] FNDG: Received duplicate fundingLocked for ChannelID(0e610c61bf79ef8b8198a0810f17183328a14a5dd071c73e7c3017abeb579812), ignoring.
2018-07-27 23:59:10.093 [INF] DISC: Broadcasting batch of 6 new announcements
2018-07-27 23:59:30.733 [ERR] PEER: recv'd update for unknown channel 06c9219d1040e59375c9ec8a7822f3f0152f01898466ae617fd6b4413b5431b6 from 212.51.132.226:9735
2018-07-27 23:59:30.735 [INF] PEER: unable to read message from 212.51.132.226:9735: EOF
2018-07-27 23:59:30.735 [INF] DISC: Removing gossipSyncer for peer=021779d56f6992431e959991a664d678a2f41054073adb48c5a351d791c9cb0781
2018-07-27 23:59:30.735 [INF] HSWC: Removing channel link with ChannelID(0e610c61bf79ef8b8198a0810f17183328a14a5dd071c73e7c3017abeb579812)
2018-07-27 23:59:30.735 [INF] HSWC: ChannelLink(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1) is stopping
2018-07-27 23:59:30.835 [INF] HSWC: ChannelLink(139857ebab17307c3ec771d05d4aa1283318170f81a098818bef79bf610c610e:1) has exited
robtex commented 6 years ago

http://paste.ubuntu.com/p/fm9VjYnbDG/ two consecutive lncli connect 0354d8382003663d8d825954224f80dcbd740b5031c5a588277a20c75586d41053@38.87.54.163:9745 with lncli debuglevel --level=PEER=debug,SRVR=debug

Roasbeef commented 6 years ago

The issue looks to be the remote side disconnecting. For some reason, they think you still have a channel open with them. As a result, they disconnect as fail to send one of the channel handshakes. Closing this for now as I don't see an issue on the lnd side, more that the remote peer is being faulty.