lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.65k stars 2.07k forks source link

gossip: node announcement ignored because node not in graph #8870

Open guggero opened 3 months ago

guggero commented 3 months ago

Background

While running integration tests in the 0-19-staging branch of litd, I noticed that sometimes we don't have a node announcement for a direct channel peer.

I tracked it down to these log lines:

2024-06-26 14:22:35.525 [DBG] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): Received NodeAnnouncement(node=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a, update_time=2024-06-26 14:22:35 +0200 CEST) from 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702
2024-06-26 14:22:35.525 [TRC] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): readMessage from peer: (*lnwire.NodeAnnouncement)(0xc0036a0000)({
 Signature: (lnwire.Sig) {
  bytes: ([64]uint8) (len=64 cap=64) {
   00000000  1c c7 06 12 24 a9 aa 3e  9b 0e e7 7b 52 16 6e 51  |....$..>...{R.nQ|
   00000010  51 ad 22 51 f3 32 6c 0b  fd 52 c0 6a e4 e5 02 d3  |Q."Q.2l..R.j....|
   00000020  18 30 68 3b a0 ce b6 f8  d9 90 1c 92 92 d7 b5 27  |.0h;...........'|
   00000030  d1 f5 98 61 03 32 91 90  97 73 8c 06 e4 df d0 da  |...a.2...s......|
  },
...
<snip>
...

 }
})
...
2024-06-26 14:22:35.525 [TRC] DISC: Received network message: peer=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702, msg=NodeAnnouncement, is_remote=true
2024-06-26 14:22:35.525 [DBG] DISC: Processing NodeAnnouncement: peer=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702, timestamp=2024-06-26 14:22:35 +0200 CEST, node=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a
2024-06-26 14:22:35.525 [DBG] CRTR: Checking stale node 303233333365316439313662376433386231613932306565613136363263386530373233656235663236303462643531643331666162396139376464613237373561 got Ignoring node announcement for node not found in channel graph (02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a)
2024-06-26 14:22:35.525 [DBG] DISC: Skipped processing stale node: 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a
2024-06-26 14:22:35.525 [TRC] DISC: Processed network message NodeAnnouncement, returned len(announcements)=0, allowDependents=true

This is after we've opened an unannounced channel to that node. And it doesn't happen all the time, so it must be a timing issue.

It's pretty weird that we apparently haven't seen this issue in vanilla lnd yet, or at least it doesn't seem like this happens in the itests there. Or maybe this is an existing flake that we just haven't noticed yet?

Here's the full log (litd in integrated mode with custom channel logic): 3-test_custom_channels_force_close-Charlie-03df097b.log

guggero commented 3 months ago

I've created this naive fix which solves the issue in the integration test: https://github.com/lightningnetwork/lnd/pull/8871 But it sounds like it's mostly fighting a symptom instead of fixing the root cause. But I'm not super familiar with the timings and dependency management of gossip stuff, so looking for other opinions.

cc @ellemouton, @Crypt-iQ

yyforyongyu commented 3 months ago

what are these services?

2024-06-26 14:22:35.524 [TRC] TCHN: Request to handle: *lnwire.ChannelUpdate
2024-06-26 14:22:35.524 [TRC] TCHN: Request to handle: 258
2024-06-26 14:22:35.524 [DBG] TCHN: Failed to handle: *lnwire.ChannelUpdate
2024-06-26 14:22:35.524 [DBG] PFSM: MsgRouter: unable to route msg protofsm.PeerMsg
yyforyongyu commented 3 months ago

Did some analysis on the logs, here's what happened.

Suppose Alice opens a channel with Bob, the normal flow, from Alice's side,

  1. received channel_ready, handle it in handleChannelReadyReceived, that calls addToRouterGraph.
  2. send channel_announcement to herself -> creates the node and channel data in db.
  3. send channel_update to herself -> creates the edge in router and,
  4. sends channel_update to Bob.
  5. sends node_announcement to Bob.

For Bob he would do the same. However, there's no guarantee the following won't happen,

  1. Bob receives channel_ready, finishes the sending of channel_announcement and channel_update to himself, and sends channel_update and node_announcement to Alice.
  2. Alice receives Bob's channel_update while still in the process of handling channel_ready, in other words, there's no node info in her db.
  3. Alice would cache this channel_update to be processed later, as fixed in #6054.

which is the case shown in the logs,

===> Alice received Bob's ChannelUpdate
2024-06-26 14:22:35.524 [DBG] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=509073883725824, mflags=00000001, cflags=00000000, update_time=2024-06-26 14:22:35 +0200 CEST) from 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702

===> Alice received Bob's NodeAnnouncement
2024-06-26 14:22:35.525 [DBG] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): Received NodeAnnouncement(node=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a, update_time=2024-06-26 14:22:35 +0200 CEST) from 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702

===> Alice received her own ChannelAnnouncement
2024-06-26 14:22:35.529 [DBG] DISC: Processing ChannelAnnouncement: peer=<nil>, short_chan_id=509073883725824

===> Alice received her own ChannelUpdate
2024-06-26 14:22:35.531 [DBG] DISC: Processing ChannelUpdate: peer=<nil>, short_chan_id=509073883725824, 

Alice should have done the same for node_announcement, which I though it was fixed in #7264 but apparently not.

So to properly fix it, we can,