lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.63k stars 2.07k forks source link

[bug]: "edge not found" for local channel #7261

Open C-Otto opened 1 year ago

C-Otto commented 1 year ago

Background

A peer opened a channel to my node. This channel is listed in lncli listchannels. However, lncli getchaninfo X shows an error (with X being the channel ID):

[lncli] rpc error: code = Unknown desc = edge not found

According to lncli listchannels, this appears to be a regular (inactive) channel. It's not private, it has "remote_constraints", there's an ID in "alias_scids". It's not a zero-conf channel. Strangely, there are two incoming HTLCs already, where "forwarding_channel" is set to 0. Currently, the channel is inactive, but it has been active before.

When reconnecting to the peer, I get:

[INF] PEER: Peer(xxx): loading ChannelPoint(xxx:1)
[WRN] PEER: Peer(xxx): Unable to find our forwarding policy for channel xxx:1, using default values

In my logs I also see:

[INF] RPCS: channel: xxx:1 not found by channel event store

I'd be happy to provide further information.

Your environment

C-Otto commented 1 year ago

Something weird happened during channel open. It looks like the second open attempt worked?

[INF] FNDG: Recv'd fundingRequest(amt=0.xxx BTC, push=0 mSAT, delay=144, pendingId=xxx) from peer(xxx)
[INF] FNDG: Requiring 4 confirmations for pendingChan(xxx): amt=0.xxx BTC, push_amt=0 mSAT, committype=tweakless, upfrontShutdown=
[INF] FNDG: Cancelling funding reservation for node_key=xxx, chan_id=xxx
[INF] PEER: Peer(xxx): unable to read message from peer: EOF
[ERR] FNDG: received funding error from xxx: chan_id=xxx, err=Channel open canceled by us

a few seconds later:
[INF] FNDG: Recv'd fundingRequest(amt=0.xxx BTC, push=0 mSAT, delay=144, pendingId=xxx) from peer(xxx)
[INF] FNDG: Requiring 4 confirmations for pendingChan(xxx): amt=0.xxx BTC, push_amt=0 mSAT, committype=tweakless, upfrontShutdown=
[INF] FNDG: Sending fundingResp for pending_id(xxx)
[INF] FNDG: completing pending_id(xxx) with ChannelPoint(xxx:1)
[INF] FNDG: sending FundingSigned for pending_id(xxx) over ChannelPoint(xxx:1)

The open transaction is confirmed and, according to mempool.space, opened two other channels to other peers.

C-Otto commented 1 year ago

I also get this:

[WRN] HSWC: ChannelLink(xxx:1): received warning message from peer: chan_id=0000000000000000000000000000000000000000000000000000000000000000, err=Malformed node_announcement [...]

The node_announcement is extremely long, filling roughly one screen.

yyforyongyu commented 1 year ago

Discussed offline, will post more details here after proper investigation.

Crypt-iQ commented 1 year ago

I think this is distinct from https://github.com/lightningnetwork/lnd/issues/7229 since this was not a zero-conf channel

dannydeezy commented 1 year ago

i get this issue too for a few channels

guggero commented 1 year ago

@dannydeezy with what version of lnd? @C-Otto does this still happen with lnd 0.16.1?

C-Otto commented 1 year ago

I'm running lnd 0.16.0 with some WT related commits, not 0.16.1. Currently, I don't see this issue. The last time the error message appeared in my logs was 2023-04-09. I updated to lnd 0.16.0 a few days before, so I think this is not resolved, yet.

litch commented 1 year ago

We're running into this also, on nodes running 0.16.0

ellemouton commented 1 year ago

For those who recently experienced this bug, pls can you try updated to 0.16.1-beta.rc1 to see if that fixes your issue?

It might be that your issue is the same as this user's issue which is distinct from this issue even though it throws the same error message. That user's issue was fixed in 0.16.1-beta.rc1

theborakompanioni commented 1 year ago

Same here after upgrading to 0.16.1-beta.rc3:

[INF] DISC: Creating new GossipSyncer for peer=$peer_id
[ERR] NANN: Unable to retrieve chan status for Channel($tx_id:1): edge not found
[INF] DISC: GossipSyncer($peer_id): applying new update horizon: start=2106-02-07 03:11:07 +1100 LHDT, end=2242-03-16 13:56:30 +1100 LHDT, backlog_size=0
[INF] PEER: Peer($peer_id): unable to read message from peer: EOF
[INF] PEER: Peer($peer_id): disconnecting $peer_id@$address, reason: read handler closed
[INF] NTFN: Cancelling epoch notification, epoch_id=39
[INF] DISC: Removing GossipSyncer for peer=$peer_id
[INF] HSWC: ChannelLink($tx_id:1): stopping
[WRN] HSWC: ChannelLink($tx_id:1): error when syncing channel states: link shutting down
[INF] HSWC: ChannelLink($tx_id:1): exited
[INF] HSWC: Removing channel link with ChannelID(x)

Edit: After restarting only LND node the problem did not go away. However, after restarting the other node (CLN) everything started to work as expected.

ellemouton commented 1 year ago

Thanks for the info @theborakompanioni! Question to everyone else experiencing this issue - is your channel peer for the problem channel a CLN node?

litch commented 1 year ago

No, both sides of the channel were LND in our case (we saw this a few times)

SunnySarahNode commented 1 year ago

I have same case with one of my peers.

My node: LND 0.16.0-beta commit=v0.16.0-beta Peer node: LND 0.16.0

We both had "[lncli] rpc error: code = Unknown desc = edge not found"

(!) The issue was resolved on my peer's side by upgrading LND 0.16.0 -> 0.16.1-beta. Immediately after that the problem disappeared and the channel became active.

On my side the issue is currently active, obviously I need to upgrade my LND to 0.16.1 too, but I haven't done it yet.

Hope this information is helpful.

Addition logs from my side:


2023-04-22 12:42:57.884 [WRN] NANN: Unable to find channel policies for xxx:1, skipping. This is typical if the channel is in the process of closing.
...
2023-04-22 12:42:58.114 [INF] PEER: Peer(xxx): loading ChannelPoint(xxx:1)
2023-04-22 12:42:58.114 [WRN] PEER: Peer(xxx): Unable to find our forwarding policy for channel xxx:1, using default values
...
2023-04-22 12:42:58.344 [INF] HSWC: ChannelLink(1863a7ca2bb10229e44b158223aa04e0e3a185b056f55263e3e5b0d4291a5235:1): starting
...
2023-04-22 12:42:58.358 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(xxx:1)
2023-04-22 12:42:58.358 [INF] HSWC: ChannelLink(xxx:1): HTLC manager started, bandwidth=0 mSAT
2023-04-22 12:42:58.358 [INF] HSWC: ChannelLink(xxx:1): attempting to re-synchronize
2023-04-22 12:42:58.358 [WRN] HSWC: ChannelLink(xxx:1): error when syncing channel states: unable to send chan sync message for ChannelPoint(xxx:1): peer exiting
2023-04-22 12:42:58.359 [ERR] HSWC: ChannelLink(xxx:1): failing link: unable to synchronize channel states: unable to send chan sync message for ChannelPoint(xxx:1): peer exiting with error: unable to resume channel, recovery required
2023-04-22 12:42:58.359 [ERR] FNDG: Unable to advance state(xxx:1): failed adding to router graph: error generating channel announcement: unable to generate channel update announcement: channel not found
2023-04-22 12:42:58.359 [INF] HSWC: ChannelLink(xxx:1): exited
ellemouton commented 1 year ago

hi @SunnySarahNode - i think your specific case was solved in 0.16.1-beta (in https://github.com/lightningnetwork/lnd/pull/7613) . Pls can you update & see if that works? your peer managing to solve the problem on their side by upgrading makes me think it is the same as this user: https://github.com/lightningnetwork/lnd/issues/6987#issuecomment-1510377248

saubyk commented 1 year ago

Closing this issue, as the fix has been provided with 0.16.1. Feel free to reopen if the problem persists.

JssDWt commented 1 year ago

@saubyk The fix in 0.16.1 fixes this for new channels, but does not retroactively fix the old ones. This is still an issue if you ever ended up in the situation where you ended up in this situation. Note that it affects forwarding of htlcs as well, because the channel policy cannot be found for the edge, which makes it use the default channel policy.

miephos commented 6 months ago

Hello, I have the same issue. I recently opened a channel to a peer, but I get the same error mentioned. I restarted LND, disconnected and connected again to the peer. Didn't help.

If I run, I get the following error

lncli getchaninfo 9133577121506xxxxx [lncli] rpc error: code = Unknown desc = edge not found

Version: 0.17.4-beta

I have attached the log trace of the opening transaction till I received the error. boltz-open-channel.log

Maybe you can make some sense out of it. Would be nice to be able to use the channel.

Big thanks!!

yyforyongyu commented 3 months ago

From the logs, the policy was not found even in lnd v0.17.4,

2024-02-16 08:51:13.826 [WRN] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): Unable to find our forwarding policy for channel c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1, using default values

which is originated here when the edge cannot be found: https://github.com/lightningnetwork/lnd/blob/98c52df4d2378512b721f4e1140efc5ac0ef5904/peer/brontide.go#L1024-L1026

These four lines are also weird,

2024-02-16 08:51:37.856 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:52:37.843 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:53:37.842 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:54:37.841 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found

As we'd expect markPendingInactiveChannels to be called every 30s (default) here, but it's called 4 times instead, https://github.com/lightningnetwork/lnd/blob/98c52df4d2378512b721f4e1140efc5ac0ef5904/netann/chan_status_manager.go#L353-L359

There's also a disconnect/connect attempt here, I assume it's triggered manually?

2024-02-16 08:51:12.148 [INF] SRVR: Disconnecting from 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.148 [INF] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): disconnecting 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736, reason: server: DisconnectPeer called
2024-02-16 08:51:12.148 [INF] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): unable to read message from peer: read next header: read tcp 10.10.21.21:55308->45.86.229.190:9736: use of closed network connection
2024-02-16 08:51:12.157 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.158 [ERR] RPCS: [/lnrpc.Lightning/ConnectPeer]: already connected to peer: 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.248 [INF] DISC: Removing GossipSyncer for peer=02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018

@miephos I've noticed the chan was forced closed - do you by chance still have the logs? What about the channel type, like is it a public or private chan?

miephos commented 2 months ago

@miephos I've noticed the chan was forced closed - do you by chance still have the logs? What about the channel type, like is it a public or private chan?

The channel was a public channel. I did a manual disconnect and connect to peer. It still seemed stuck. So I force closed the the channel.

I still have the logs. Let me know how to provide them, if needed.

yyforyongyu commented 2 months ago

@miephos it'd be great to provide some logs! You can find me via the handle yyforyongyu on slack/twitter/keybase, thanks!

ziggie1984 commented 2 months ago

Yeah really strange behaviour would it make sense to bring this log to the Error level, so that we at least know when the edge failed to be added to the graph, otherwise there are some situations where we basically shadow the err which makes it really hard to tell what happened if node runners don't run the gossiper in the debug level ?

https://github.com/lightningnetwork/lnd/blob/71ba355d903f01990718866c358c2ccac9402891/discovery/gossiper.go#L2532-L2533