lightningnetwork / lnd

Lightning Network Daemon ⚡️

[bug]: pong response failure #9043

Open AndySchroder opened 3 weeks ago

AndySchroder commented 3 weeks ago

Background

peers don't stay connected

Your environment

lnd-v0.18.2

Expected behaviour

peers should stay connected and channels remain active.

Actual behaviour

I have two nodes on a local network, Node A and Node B. Node A has port 9735 open on the firewall. Node B has no open firewall ports. After restarting Node B, it connects to Node A, but a few minutes later I get the following errors and the channels go inactive. I just upgraded from v0.16.4-beta.rc1 to lnd-v0.18.2. I believe it worked fine on v0.16.4-beta.rc1.

Node A

2024-08-28 13:42:07.420 [WRN] PEER: Peer(B): pong response failure for B@192.168.2.B:57402: timeout while waiting for pong response -- disconnecting
2024-08-28 13:42:07.420 [INF] PEER: Peer(B): disconnecting B@192.168.2.B:57402, reason: pong response failure for B@192.168.2.B:57402: timeout while waiting for pong response -- disconnecting
2024-08-28 13:42:07.420 [INF] PEER: Peer(B): unable to read message from peer: read next header: read tcp 192.168.2.A:9735->192.168.2.B:57402: use of closed network connection
2024-08-28 13:42:07.521 [INF] DISC: Removing GossipSyncer for peer=B
2024-08-28 13:42:07.521 [INF] HSWC: ChannelLink(thechannel:1): stopping
2024-08-28 13:42:07.522 [INF] HSWC: ChannelLink(thechannel:1): exited
2024-08-28 13:42:07.522 [INF] HSWC: Removing channel link with ChannelID(thechannelid)

Node B

2024-08-28 13:42:07.435 [WRN] PEER: Peer(A): pong response failure for A@192.168.2.A:9735: timeout while waiting for pong response -- disconnecting
2024-08-28 13:42:07.436 [INF] PEER: Peer(A): disconnecting A@192.168.2.A:9735, reason: pong response failure for A@192.168.2.A:9735: timeout while waiting for pong response -- disconnecting
2024-08-28 13:42:07.538 [INF] DISC: Removing GossipSyncer for peer=A
2024-08-28 13:42:07.539 [INF] HSWC: ChannelLink(thechannel:1): stopping
2024-08-28 13:42:07.540 [INF] HSWC: ChannelLink(thechannel:1): exited
2024-08-28 13:42:07.541 [INF] HSWC: Removing channel link with ChannelID(thechannelid)

AndySchroder commented 3 weeks ago

Also, I have another node, Node C. It runs on the same physical machine as Node B. Nodes A and C can communicate with each other. I'm wondering whether Node A is getting confused between Nodes B and C, since they share the same IP address. Node C has port 9735 open on the firewall.

Also, I have another node D. This is on the same physical machine as A. Node D can stay connected to node A.

Both node B and D have nolisten=true set in lnd.conf.

ViktorTigerstrom commented 2 weeks ago

Hi @AndySchroder,

Could you please check the following just for some initial clarifications:

  1. Does the connection remain up if you remove nolisten=true on Node B?

  2. Alternatively, does the connection remain up if you keep nolisten=true on Node B but never start Node C?

ziggie1984 commented 2 weeks ago

> I just upgraded from v0.16.4-beta.rc1 to lnd-v0.18.2. I believe that it worked fine on v0.16.4-beta.rc1

Since lnd v0.18 we enforce pong messages and will disconnect the peer if we don't get a reply within 30 seconds. Something seems wrong with the connection.

Can you set the PEER subsystem to trace and provide the logs?