lightningnetwork / lnd

Lightning Network Daemon ⚡️

Channels stuck waiting closing/after automated recovery (also on Umbrel) #7974

jimbrend closed this issue 1 year ago

jimbrend commented 1 year ago

Background

I am helping an Umbrel user who ran the automated recovery with just their 24 seed words; most channels close and the funds return on-chain. I would like to reduce friction here, since often a couple of channels end up stuck in waiting close.

I would like to confirm the best steps to resolve this; these cases seem to stem from zombie channels or other issues...

Your environment

Steps to reproduce

One channel is stuck, as shown by lncli pendingchannels:

{
    "total_limbo_balance": "0",
    "pending_open_channels": [
    ],
    "pending_closing_channels": [
    ],
    "pending_force_closing_channels": [
    ],
    "waiting_close_channels": [
        {
            "channel": {
                "remote_node_pub": "02612b51b61a90e19ec4c6b0f3409e58175efe66cbf665df7805cafc84aee3a551",
                "channel_point": "ebcb2ea6f69e82a207f55c87dc3869cdd4d2ea5b81f41e4c0d332ef4a741343c:0",
                "capacity": "2263674",
                "local_balance": "0",
                "remote_balance": "0",
                "local_chan_reserve_sat": "0",
                "remote_chan_reserve_sat": "0",
                "initiator": "INITIATOR_REMOTE",
                "commitment_type": "STATIC_REMOTE_KEY",
                "num_forwarding_packages": "0",
                "chan_status_flags": "ChanStatusRestored",
                "private": true
            },
            "limbo_balance": "0",
            "commitments": {
                "local_txid": "",
                "remote_txid": "",
                "remote_pending_txid": "",
                "local_commit_fee_sat": "0",
                "remote_commit_fee_sat": "0",
                "remote_pending_commit_fee_sat": "0"
            },
            "closing_txid": ""
        }
    ]
}

On Umbrel we prefix lncli commands like this (I have a short guide here; any feedback is appreciated).

The force close command attempted:

~/umbrel/scripts/app compose lightning exec lnd lncli closechannel --force ebcb2ea6f69e82a207f55c87dc3869cdd4d2ea5b81f41e4c0d332ef4a741343c 0

This is the error: [lncli] rpc error: code = Unknown desc = cannot close channel with state: ChanStatusRestored

The Lightning node app is no longer running on another instance, but I guess it was at some point, and that is what causes this?

The user can't seem to locate the other node runner (https://1ml.com/node/02612b51b61a90e19ec4c6b0f3409e58175efe66cbf665df7805cafc84aee3a551) to contact them and request a cooperative close.

How to resolve?

Must we wait ~2 weeks or is there another recommendation on these?

This often happens for the last couple of channels. The user usually has only one storage drive on a Pi 4, the channel backup is abstracted away, and the automated recovery now closes channels with just the 24 words... The latest channel backup is downloaded, so can the user then just wait, or use some other method to recover?

I'm aware of chantools (many users have difficulty using it successfully; I believe chantools must be installed natively, and I'm trying to figure this out on Umbrel, but we hope to avoid asking a user to go through a chantools installation if we can abstract that away too), though I believe it is only necessary for zombie channels (or is there another best practice here?).

The current peer looks to be online, but I'm not sure what else to do.

I'd like to ping @guggero for visibility. I'd really like to confirm the best recommendation in this situation so we can make this easy for Umbrel users too; any input would be appreciated. Thank you!

ViktorTigerstrom commented 1 year ago

Hi @usernameisJim!

I'm providing a preliminary answer until guggero has had the opportunity to respond; I'll ping him so he can follow up when he has the chance.

If the user has the SCB file for their node, they could attempt to recover these channels through SCB recovery for the individual channel(s) with the single_backup option: https://docs.lightning.engineering/lightning-network-tools/lnd/disaster-recovery#b-static-channel-backup. Since it is the other peer of the channel that force closes it when the SCB recovery is successful, the funds of the user who initiated the SCB recovery will be spendable immediately, provided no funds are in flight.

Otherwise, I do believe the user would need to use chantools for this case if they can't contact the other peer. Be aware, though, that the user has to be sure they are using the latest state of the channel if they force close with chantools, to not risk losing funds. Force closing through chantools also means the user has to wait to access their funds, as is normal for the peer that initiated the force close. The zombie recovery process with chantools requires the user to cooperate with the other peer and should only be done if neither peer has an uncorrupted copy of the channel state.
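
For reference, here is a minimal sketch of what that restore could look like, using the Umbrel lncli prefix from the first post. The hex string and backup path are placeholders, and the exact flags may vary by lnd version (see lncli restorechanbackup --help):

# Assumed example: restore a single channel from a hex-encoded single-channel backup
# (such a backup could have been exported earlier with: lncli exportchanbackup --chan_point <funding_txid>:<index>)
~/umbrel/scripts/app compose lightning exec lnd lncli restorechanbackup --single_backup <hex_encoded_single_channel_backup>

# Or restore all channels from the multi-channel channel.backup file:
~/umbrel/scripts/app compose lightning exec lnd lncli restorechanbackup --multi_file <path/to/channel.backup>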

guggero commented 1 year ago

Please read this section of our operational safety doc: https://github.com/lightningnetwork/lnd/blob/master/docs/safety.md#zombie-channels

Using the SCB file (channel.backup) to recover channel funds always relies on the cooperation of the remote peer. If that peer is no longer around, both nodes have a problem. You cannot force close a channel that was restored from an SCB file, as at that moment your node doesn't have the necessary information anymore to initiate a force close.

Using the SCB file should therefore be an absolute last resort for an emergency. Calling it "automated recovery" and giving the user the impression they can just "uninstall and re-install" their Lightning node upon the smallest issue is extremely irresponsible in my opinion (I'm not sure what the user interface and info/warning text currently look like on Umbrel; I just want to re-iterate how clearly it should be communicated that this must be an absolute emergency option only).

That being said, it's weird that the specific node won't respond to the recovery request, as it seems to be online. Perhaps it just has a new IP address, and attempting to connect to it manually (with an updated address from 1ml.com or amboss.space) might help. If the peer is a CLN node of a specific version, the user could also look into chantools triggerforceclose.
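
As a rough sketch of both of those steps (the host and port are placeholders to be looked up on 1ml.com or amboss.space, and the chantools flags shown are assumptions that may differ between versions, so check chantools triggerforceclose --help):

# Reconnect to the peer manually using an up-to-date address (placeholder host/port):
~/umbrel/scripts/app compose lightning exec lnd lncli connect 02612b51b61a90e19ec4c6b0f3409e58175efe66cbf665df7805cafc84aee3a551@<host>:<port>

# If the peer is a compatible CLN node, chantools can ask it to publish its commitment (force close).
# Flag names are from memory and may differ by chantools version:
chantools triggerforceclose --peer <peer_pubkey>@<host>:<port> --channel_point ebcb2ea6f69e82a207f55c87dc3869cdd4d2ea5b81f41e4c0d332ef4a741343c:0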

saubyk commented 1 year ago

Hi @usernameisJim, assuming you have all the information you need from our end here, I'm closing this issue now. Feel free to reopen if any more input is required.

ranathan14 commented 1 year ago

I have the same issue. I had to perform a recovery using the seed and channel.backup (SCB) because my home server unexpectedly shut down due to a blackout. These things can happen when you want to host a node in a decentralized context. The on-chain part was recovered without any issues. I then used the SCB to close the only open channel I had. It appears in the waiting_close_channels list. The counterparty node is currently active and has been added to my peers. However, the channel has been in this state for several days now. Considering that the procedure on my side is complete and that my counterpart (the other node with which I opened the channel) is currently active, what could be causing this delay?

lnshell@fd354ac543dc:~$ lncli pendingchannels

{
    "total_limbo_balance": "0",
    "pending_open_channels": [
    ],
    "pending_closing_channels": [
    ],
    "pending_force_closing_channels": [
    ],
    "waiting_close_channels": [
        {
            "channel": {
                "remote_node_pub": "02a7792c657562b66bc87e5a2b98d32e8cd9fc16fab40cb67124472bf6b470beb8",
                "channel_point": "bef951a4a001770eb837f19474b1993e8c5fc881320ae7265cacf63cbde612cd:1",
                "capacity": "100000",
                "local_balance": "0",
                "remote_balance": "0",
                "local_chan_reserve_sat": "0",
                "remote_chan_reserve_sat": "0",
                "initiator": "INITIATOR_LOCAL",
                "commitment_type": "ANCHORS",
                "num_forwarding_packages": "0",
                "chan_status_flags": "ChanStatusLocalDataLoss|ChanStatusRestored"
            },
            "limbo_balance": "0",
            "commitments": {
                "local_txid": "",
                "remote_txid": "",
                "remote_pending_txid": "",
                "local_commit_fee_sat": "0",
                "remote_commit_fee_sat": "0",
                "remote_pending_commit_fee_sat": "0"
            },
            "closing_txid": ""
        }
    ]
}

lnshell@fd354ac543dc:~$ lncli listpeers { "peers": [ { "pub_key": "02efe91c6fdd98f3da3dab5db81427eeb1076aba1216a7bc0cfcf5fef93f3133df", "address": "142.44.213.78:9735", "bytes_sent": "15058500", "bytes_recv": "8697515", "sat_sent": "0", "sat_recv": "0", "inbound": false, "ping_time": "300992", "sync_type": "ACTIVE_SYNC", "features": { "1": { "name": "data-loss-protect", "is_required": false, "is_known": true }, "5": { "name": "upfront-shutdown-script", "is_required": false, "is_known": true }, "7": { "name": "gossip-queries", "is_required": false, "is_known": true }, "8": { "name": "tlv-onion", "is_required": true, "is_known": true }, "11": { "name": "unknown", "is_required": false, "is_known": false }, "13": { "name": "static-remote-key", "is_required": false, "is_known": true }, "14": { "name": "payment-addr", "is_required": true, "is_known": true }, "17": { "name": "multi-path-payments", "is_required": false, "is_known": true }, "25": { "name": "unknown", "is_required": false, "is_known": false }, "27": { "name": "shutdown-any-segwit", "is_required": false, "is_known": true }, "29": { "name": "unknown", "is_required": false, "is_known": false }, "45": { "name": "explicit-commitment-type", "is_required": false, "is_known": true }, "47": { "name": "scid-alias", "is_required": false, "is_known": true }, "51": { "name": "zero-conf", "is_required": false, "is_known": true } }, "errors": [ ], "flap_count": 1, "last_flap_ns": "1695132229840382138", "last_ping_payload": null }, { "pub_key": "02281fa8f2e3547bd0cd039d803c80f2471d1a932ca73fb79661173a7c544b26f3", "address": "cvhhqkzs24h747io3a3ymtzzawljtbaferluvwiftsildrggohspylad.onion:9735", "bytes_sent": "1830519", "bytes_recv": "5950021", "sat_sent": "0", "sat_recv": "0", "inbound": false, "ping_time": "1111426", "sync_type": "PASSIVE_SYNC", "features": { "0": { "name": "data-loss-protect", "is_required": true, "is_known": true }, "5": { "name": "upfront-shutdown-script", "is_required": false, "is_known": true }, "7": { "name": "gossip-queries", "is_required": false, "is_known": true }, "9": { "name": "tlv-onion", "is_required": false, "is_known": true }, "12": { "name": "static-remote-key", "is_required": true, "is_known": true }, "14": { "name": "payment-addr", "is_required": true, "is_known": true }, "17": { "name": "multi-path-payments", "is_required": false, "is_known": true }, "19": { "name": "wumbo-channels", "is_required": false, "is_known": true }, "23": { "name": "anchors-zero-fee-htlc-tx", "is_required": false, "is_known": true }, "27": { "name": "shutdown-any-segwit", "is_required": false, "is_known": true }, "31": { "name": "amp", "is_required": false, "is_known": true }, "45": { "name": "explicit-commitment-type", "is_required": false, "is_known": true }, "2023": { "name": "script-enforced-lease", "is_required": false, "is_known": true } }, "errors": [ ], "flap_count": 1, "last_flap_ns": "1695200443267280750", "last_ping_payload": "00c06223accda4dde8ffa99bf3a05188cd9faa3a6b750fac956800000000000000000000df8f2c280870a0f2b73b4fb1c88afa13b1bc56d3a4499982ebc2d7b22f99152181c60a657fed04173f631f7b" }, { "pub_key": "025447ad3c6ce85a968aee4d7841286c018a49e190a1534d2e5a17aa38b9a531d7", "address": "e7zsybdyd67lbfajxn3xmyee6q32yzoyyirpy5k4hrupl2q4jux5xaqd.onion:9735", "bytes_sent": "54021", "bytes_recv": "3514962", "sat_sent": "0", "sat_recv": "0", "inbound": false, "ping_time": "1035297", "sync_type": "PASSIVE_SYNC", "features": { "0": { "name": "data-loss-protect", "is_required": true, "is_known": true }, "5": { "name": 
"upfront-shutdown-script", "is_required": false, "is_known": true }, "7": { "name": "gossip-queries", "is_required": false, "is_known": true }, "9": { "name": "tlv-onion", "is_required": false, "is_known": true }, "12": { "name": "static-remote-key", "is_required": true, "is_known": true }, "14": { "name": "payment-addr", "is_required": true, "is_known": true }, "17": { "name": "multi-path-payments", "is_required": false, "is_known": true }, "23": { "name": "anchors-zero-fee-htlc-tx", "is_required": false, "is_known": true }, "27": { "name": "shutdown-any-segwit", "is_required": false, "is_known": true }, "31": { "name": "amp", "is_required": false, "is_known": true }, "45": { "name": "explicit-commitment-type", "is_required": false, "is_known": true }, "2023": { "name": "script-enforced-lease", "is_required": false, "is_known": true } }, "errors": [ ], "flap_count": 1, "last_flap_ns": "1695200444719581004", "last_ping_payload": "00c06223accda4dde8ffa99bf3a05188cd9faa3a6b750fac956800000000000000000000df8f2c280870a0f2b73b4fb1c88afa13b1bc56d3a4499982ebc2d7b22f99152181c60a657fed04173f631f7b" }, { "pub_key": "02a7792c657562b66bc87e5a2b98d32e8cd9fc16fab40cb67124472bf6b470beb8", "address": "dsgftsrroqhxl5oqvcvxpi7c2suuoainyiuv7mc6nvtblueqoe4zjtyd.onion:9735", "bytes_sent": "10015", "bytes_recv": "292011", "sat_sent": "0", "sat_recv": "0", "inbound": false, "ping_time": "1109314", "sync_type": "ACTIVE_SYNC", "features": { "0": { "name": "data-loss-protect", "is_required": true, "is_known": true }, "5": { "name": "upfront-shutdown-script", "is_required": false, "is_known": true }, "7": { "name": "gossip-queries", "is_required": false, "is_known": true }, "9": { "name": "tlv-onion", "is_required": false, "is_known": true }, "12": { "name": "static-remote-key", "is_required": true, "is_known": true }, "14": { "name": "payment-addr", "is_required": true, "is_known": true }, "17": { "name": "multi-path-payments", "is_required": false, "is_known": true }, "19": { "name": "wumbo-channels", "is_required": false, "is_known": true }, "23": { "name": "anchors-zero-fee-htlc-tx", "is_required": false, "is_known": true }, "27": { "name": "shutdown-any-segwit", "is_required": false, "is_known": true }, "31": { "name": "amp", "is_required": false, "is_known": true }, "45": { "name": "explicit-commitment-type", "is_required": false, "is_known": true }, "2023": { "name": "script-enforced-lease", "is_required": false, "is_known": true } }, "errors": [ ], "flap_count": 10, "last_flap_ns": "1695200438494349557", "last_ping_payload": "00c06223accda4dde8ffa99bf3a05188cd9faa3a6b750fac956800000000000000000000df8f2c280870a0f2b73b4fb1c88afa13b1bc56d3a4499982ebc2d7b22f99152181c60a657fed04173f631f7b" }, { "pub_key": "02f6b4fe1c39ca488c6e227a709bd50e4c771b2be8c6e9f968fc0a66b712d7c720", "address": "qdhbbi7a4hoyjkppoanpm7z3dzz5waa24doyd5km4r7aancb4uqni4qd.onion:9735", "bytes_sent": "9934581", "bytes_recv": "13884004", "sat_sent": "0", "sat_recv": "0", "inbound": false, "ping_time": "515060", "sync_type": "ACTIVE_SYNC", "features": { "0": { "name": "data-loss-protect", "is_required": true, "is_known": true }, "5": { "name": "upfront-shutdown-script", "is_required": false, "is_known": true }, "7": { "name": "gossip-queries", "is_required": false, "is_known": true }, "9": { "name": "tlv-onion", "is_required": false, "is_known": true }, "12": { "name": "static-remote-key", "is_required": true, "is_known": true }, "14": { "name": "payment-addr", "is_required": true, "is_known": true }, "17": { "name": "multi-path-payments", 
"is_required": false, "is_known": true }, "23": { "name": "anchors-zero-fee-htlc-tx", "is_required": false, "is_known": true }, "27": { "name": "shutdown-any-segwit", "is_required": false, "is_known": true }, "31": { "name": "amp", "is_required": false, "is_known": true }, "45": { "name": "explicit-commitment-type", "is_required": false, "is_known": true }, "2023": { "name": "script-enforced-lease", "is_required": false, "is_known": true } }, "errors": [ ], "flap_count": 1, "last_flap_ns": "1695132193577381584", "last_ping_payload": "00c06223accda4dde8ffa99bf3a05188cd9faa3a6b750fac956800000000000000000000df8f2c280870a0f2b73b4fb1c88afa13b1bc56d3a4499982ebc2d7b22f99152181c60a657fed04173f631f7b" } ] }

ranathan14 commented 1 year ago

Hi @usernameisJim, assuming you have all the information you need from our end here, I'm closing this issue now. Feel free to reopen if any more input is required.

I don't believe that this issue should be closed.

guggero commented 1 year ago

@ranathan14 your channel is currently closing; it is just waiting for the close transaction to confirm (https://mempool.space/tx/cd6f8eebdf910badcdc68e9a9239d19b5892936c5ffc5e9d2e4751f092c97da9). With the current mempool congestion, things going on chain can just take a while.

ranathan14 commented 1 year ago

@ranathan14 your channel is currently closing; it is just waiting for the close transaction to confirm (https://mempool.space/tx/cd6f8eebdf910badcdc68e9a9239d19b5892936c5ffc5e9d2e4751f092c97da9). With the current mempool congestion, things going on chain can just take a while.

Thank you very much, now it's clearer to me. One question: how did you obtain the transaction ID of the close transaction in the mempool? Very kind of you! P.S. This recovery goes way back: the original backup file also contained a backup of another channel that was never opened due to very low fees, which was causing the recovery process to be interrupted. I had to filter it out using chantools with the filter function. Real hands-on experience :)

guggero commented 1 year ago

The first part of the "channel point" (the string before the : character) is the channel funding transaction. If you enter that into a block explorer, you can then see if it's being spent and by what TX.
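
For example, here is a minimal sketch using mempool.space's Esplora-style REST API (an assumption; any block explorer UI works just as well), with the channel point and close transaction discussed in this thread:

# The channel point is <funding_txid>:<output_index>; ask the explorer which transaction spends that output.
curl -s https://mempool.space/api/tx/bef951a4a001770eb837f19474b1993e8c5fc881320ae7265cacf63cbde612cd/outspend/1
# Expected to return something like:
# {"spent":true,"txid":"cd6f8eebdf910badcdc68e9a9239d19b5892936c5ffc5e9d2e4751f092c97da9", ...}
# where "txid" is the close transaction.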

These things can happen when you want to host a node in a decentralized context.

Yes, I agree. But in a decentralized context your node is the only node that has all the data needed to safely recover the funds in a channel. When you run the "recover" process on Umbrel, you actually delete that crucial data and then make yourself dependent on your peers (which is what the SCB file does). I assume you did not look into how to fix any errors that resulted from the outage? Because that should've been the highest priority (which is what my comment above is all about).

ranathan14 commented 1 year ago

The first part of the "channel point" (the string before the : character) is the channel funding transaction. If you enter that into a block explorer, you can then see if it's being spent and by what TX.

These things can happen when you want to host a node in a decentralized context.

Yes, I agree. But in a decentralized context your node is the only node that has all the data needed to safely recover the funds in a channel. When you run the "recover" process on Umbrel, you actually delete that crucial data and then make yourself dependent on your peers (which is what the SCB file does). I assume you did not look into how to fix any errors that resulted from the outage? Because that should've been the highest priority (which is what my comment above is all about).

I'll be honest, I haven't done much analysis of the issue that occurred with the blackout. However, I also wanted to better understand how restoration from a previous backup works. In fact, it was all done with small amounts that don't significantly impact one's life.

Nevertheless, the fact that we are not entirely independent in recovering balances (the other node must be operational and not turned off) makes me think this second layer may not be suitable for large amounts. I wouldn't invest large sums in something I'm not certain I can recover, unless the other node belongs to someone I know. I also don't think this aspect is widely known. That's my thought, and I'm not a detractor of Bitcoin, on the contrary ;) Many thanks for the support and for explaining everything to me better.