lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.56k stars 2.06k forks source link

unconfirmed channels are not included in channel backup when using gRPC #3816

Closed rbrooklyn closed 4 years ago

rbrooklyn commented 4 years ago

Background

Unable to recover after a hardware failure

Your environment

Steps to reproduce and actual behaviour

Channel opened last night. channel.backup file taken and saved away from computer hosting LND. 24 word seed kept safe. The channel.backup file was taken before the channel was confirmed on-chain, but after I initiated the channel opening process.

At some point overnight, the hardware failed.

Set up LND on my desktop computer. Ran command to restore wallet: ./lncli create --multi_file=channel.backup. Enter in 24 word recovery seed. Allow LND to do its thing.

See the following in the log:

2019-12-10 20:17:35.689 [INF] LNWL: RECOVERY MODE ENABLED -- rescanning for used addresses with recovery_window=2500 2019-12-10 20:17:36.340 [INF] LNWL: Seed birthday surpassed, starting recovery of wallet from height=607115 hash=00000000000000000006de43c6f3ef4761652f2ab4b72e99cacf3958dfd82f95 with recovery-window=2500 2019-12-10 20:18:21.234 [INF] LNWL: Scanning 446 blocks for recoverable addresses 2019-12-10 20:19:55.940 [INF] LNWL: Recovered 1 external addrs at height=607437 hash=00000000000000000006fc1ff25bdefa30bf14f06f768d4f88f3323152f69de6 2019-12-10 20:19:55.940 [INF] LNWL: Found 1 spends from watched outpoints at height=607437 hash=00000000000000000006fc1ff25bdefa30bf14f06f768d4f88f3323152f69de6 2019-12-10 20:20:22.169 [INF] LNWL: Recovered addresses from blocks 607115-607560 2019-12-10 20:20:22.481 [INF] LNWL: Started rescan from block 0000000000000000000fa8e058b44e7b61b613eb961bb5a9a0fbd07df5d4a567 (height 607560) for 1 address 2019-12-10 20:20:23.450 [INF] LNWL: Catching up block hashes to height 607561, this might take a while 2019-12-10 20:20:23.453 [INF] LNWL: Done catching up block hashes 2019-12-10 20:20:23.453 [INF] LNWL: Finished rescan for 1 address (synced to block 00000000000000000008fd994b5f8310177b40c2e8cd3b997ad81e35dab0dd4b, height 607561) 2019-12-10 20:20:24.017 [INF] LTND: Chain backend is fully synced (end_height=607561)!

Which I assume is on-chain scanning. However, it doesn't appear that LND is making any attempt to recover channel balance.

Verifying channel backup gives blank response:

./lncli verifychanbackup --multi_file="channel.backup"

{

}

Asking to manually run the restore again produces no output at all and no response from LND. ./lncli restorechanbackup --multi_file=channel.backup

Are funds safu?

wpaulino commented 4 years ago

I recommend trying out v0.8.2-beta-rc2 since there are some SCB fixes that might be helpful.

Do you have any other logs after the ones you pasted above? Channel recovery doesn't happen until after this point.

rbrooklyn commented 4 years ago

Thanks for the advice.

I removed ~/.lnd and started the process again with 0.8.2-beta-rc2.

Everything beyond what I pasted in the original report seems to just be normal LND activity from what I can see.

I have attached the full log here.

lnd.log

Roasbeef commented 4 years ago

There's no indication that any recovery was initiated in those logs. I'd try using the regular lncli restorechanbackup commands.

Roasbeef commented 4 years ago

I'd also check the listchannels and pendingchannels commands, if the channels were re-inserted, then they'll be present there. Also ultimately, recovering the channels actually entails closing them out on chain and sweeping those funds on chain into your wallet.

rbrooklyn commented 4 years ago

Hi Roasbeef,

Unfortunately, restorechanbackup doesn't do anything. I run it, I get no response at all from lncli and lnd itself doesn't respond in the logs.

Both listchannels and pendingchannels show nothing of interest:

./lncli listchannels { "channels": [ ] }

./lncli pendingchannels { "total_limbo_balance": "0", "pending_open_channels": [ ], "pending_closing_channels": [ ], "pending_force_closing_channels": [ ], "waiting_close_channels": [ ] }

I'm not sure if I can increase debug logging somewhere to help with this? Currently LND simply doesn't react at all when I run the restorechanbackups command.

Luckily I am able to email the node owner. If I am able to establish a dialogue with him, how can I make it as easy as possible to say "force close X channel please" as I can't identify the channel at my end. I've already emailed and given the txid, but I think that would involve them trawling through logs, and they might just ignore it.

guggero commented 4 years ago

How many channels should be in the channel.backup? If there are more than a few dozen, the lncli restorechanbackup process can take a while (multiple minutes). I recommend setting the --debuglevel=debug then you should see some progress in the logs. The channels to restore should end up in waiting_close_channels of pendingchannels. Otherwise the restorechanbackup command didn't work properly.

Finally, the node will only start to negotiate channel closes with the peers after the graph is synced. That can take a few hours, look out for the synced_to_graph flag of lncli getinfo.

rbrooklyn commented 4 years ago

Just 1 channel. It was the first and only channel this node had ever opened.

I left lnd running overnight. No movement on my coins, sadly. Even when running the recovery again this morning there is still no reaction from lncli or LND.

A replacement part for the failed node is coming tomorrow. If I use the same software and environment as before I guess I stand a better chance of things working, so I'll try that and restore from my seed words. That way if the channel is forced close from the other side I'll get my funds back.

But I really want to know what is wrong. I have the backup, yet LND seems content on ignoring it.

Roasbeef commented 4 years ago

Try to re-issue the restorechanbackup command. Watch the logs when you do so, ideally on trace, and starting with --noboostrap.

rbrooklyn commented 4 years ago

Hi Roasbeef,

Nothing happens. Literally nothing. I run the restorechanbackup command, lncli returns nothing, lnd does nothing, even with the debug level set to trace and --nobootstrap set on startup.

However, the peer has got back in contact with me. He found the channel and force closed it and it's been confirmed for several hours now. So presumably that's progress. However, when deleting ~/.lnd and creating a new wallet with my seed words, LND still doesn't pick up the funds, despite the transaction being very deeply confirmed now.

Looking at the logs, it identifies me moving the money into the wallet with the right block height, it identifies me moving the money out of the wallet to fund the channel with the right block height, but then it proudly declares that it's fully caught up and no funds are seen. This is with and without giving the channel backup file; same result both times. Very confusing.

Surely if the peer has forced closed I should be able to recover funds now? I'm making the assumption that the peer closed the channel fairly and didn't transmit a punishment transaction (as presumably that would involve a timelock of some description, which doesn't seem to be the case here). Seems odd that LND isn't picking up the funds being returned, unless I'm misunderstanding something about this whole process.

wpaulino commented 4 years ago

However, when deleting ~/.lnd and creating a new wallet with my seed words, LND still doesn't pick up the funds, despite the transaction being very deeply confirmed now.

If the channel was opened prior to v0.8.0-beta, then the funds don't go to a wallet address and they need to be swept.

lncli returns nothing

Returns nothing as if it hangs or it returns you to the CLI prompt?

If you increase the RPC log level on your bitcoind node, do you see any activity?

rbrooklyn commented 4 years ago

If the channel was opened prior to v0.8.0-beta, then the funds don't go to a wallet address and they need to be swept.

This node was running 0.8.1 at the time the channel was created. It's only ever run 0.8.0 or above and I think I didn't even create the wallet until 0.8.1.

Returns nothing as if it hangs or it returns you to the CLI prompt?

CLI prompt. Nothing on LND, nothing on bitcoind. Literally as if I did nothing at all.

guggero commented 4 years ago

How big (in bytes) is your channel.backup file? I suspect it doesn't contain any channels for some reason. That would explain the behavior. Did you make nightly backups of that file? Maybe you find one that has a bigger file size?

rbrooklyn commented 4 years ago

That would make sense. You might be on to something there. File size is 45 bytes. So not empty but very small. I don't know what the normal size of these files should be.

I made the channel using the RTL client. It advised I made a backup, which is the file it gave me. I did make the backup before the channel's funding transaction was confirmed, which is maybe something to do with it?

In any case, now the channel has been force closed by the other party, the SCB file shouldn't factor in (assuming I'm understanding this correctly), so how can I access the funds? I still have my 24 words, which I know is correct as LND perfectly described the transaction into the wallet and out of the wallet to fund the channel. So it's there somewhere I'm assuming.

guggero commented 4 years ago

Ok, we have found the problem then. 45 bytes is only the binary wrapper, that's not enough for a channel. The RPC call that gives you the backups (which RTL uses) doesn't include unconfirmed channels. I'm going to change the title of this issue so we know what the problem is. I'm going to create a fix as soon as I can.

In the mean time, you can follow the instructions of my blog post: https://medium.com/@guggero/did-you-lose-funds-on-the-lightning-network-because-of-a-disk-crash-8971b6a92494

The problem is (as @wpaulino explained) that you need an information (the commit_point) from the other node to be able to create the tweaked key that the funds now reside on after the remote peer force-closed. The seed is unfortunately not enough for this (until we have the "tweakless" channels which are only available if both nodes are on 0.8.0 or later).

You should be able to only do the first part of the described steps that creates the fake backup file and then use that file with your current node that is already synced. Feel free to contact me on Slack if you need help with it.

rbrooklyn commented 4 years ago

Thanks for that link. My google-fu was too weak to find it myself. Having a very quick skim read of the article it looks like I have a chance of recovering my funds, which is good to hear.

I'll have a go at this after work and reach out to you if I get stuck, that's very kind. Thanks.

I'm glad this escapade has highlighted a bug. It's always nice to contribute even if I lack the technical skills to contribute directly with code.

bitbug42 commented 4 years ago

Hi, I am the peer that node was connected to. Glad to see the root problem has been identified. I'm running c-lightning v0.7.3, by any chance do you know how I could extract the commit_point from it to help recovery? I'll try to see if I can find some info from the c-lightning folks if any.

guggero commented 4 years ago

@bitbug42 thank you for the information, that is helpful to know. Because AFAIK c-lightning deletes the channel related information (like the commit_point) after a certain time (I think one month). I'm not sure how you can extract it, but maybe Sjors can help, I'm pretty sure he's done it before. Or maybe it's even in the c-lightning log from when your node force-closed the channel. Might be worth a look. It's a normal public key (33 byte, hex encoded starts with 02 or 03) called per_commit_point or just Commit Point. It's not really sensitive as you still need an additional private key to sweep the funds, so chances are it's logged.

bitbug42 commented 4 years ago

Got it, thanks for the info. I searched through my logs but for some reason couldn't find it. However, I've been able to locate the channel row in the SQLite db, and found the column named "per_commit_remote" which seems to fit the description:

02768a4b76e51ac7ee7a0107ff05463a3f811c372dce33223cf962a96ccb171427

guggero commented 4 years ago

Thanks, but that's probably not it. Based on the naming, that's more likely the per_commit_point of @rbrooklyn's node that it sent when this last state was negotiated (before the crash). There's this symmetry in the commitment transactions that makes everything a bit harder to understand :sweat_smile: If it's not logged then in the database it's more likely to be called revocation secret or revocation base that can be derived into the specific commit point (see https://github.com/lightningnetwork/lightning-rfc/blob/master/03-transactions.md#per-commitment-secret-requirements). So it's probably easier for @rbrooklyn's node to just query your node.

bitbug42 commented 4 years ago

All right, seems like every day I'm learning a little bit more how Lightning works :)

I do have a revocation_basepoint_remote, but given the name that might not be for the correct side again: 026ea709de98a83c6de16e7856635b0bae3e0a5dbc490e7969800b40a0358491fd

rbrooklyn commented 4 years ago

Hi everyone.

You've all been very kind. I thank you for this. And thank you bitbug42 for finding this raised issue (if you're reading this issue as an outsider, he found it by himself, I didn't invite him to help, which is super nice of him) and trying to assist me.

I'm annoyed at myself that I'm distracting away from development time on other things, even if it was nice to find a thing that needs fixing in the software.

Anyway, some progress on this...

The instructions were very clear on the medium post. Thanks. I have the modified LND working no problems, and thankfully as the wallet and channel are both so new, the rescan is actually very fast, so I can keep trying and trying on this without long delays.

However, lucky as I am with that, I wasn't as lucky with the timing of the hardware failure, the channel never appeared on 1ml.com as my node died so quickly after initiating the channel creation, so I can't look up any information (I assume that's the cause, or I opened a private channel, but I don't think I did that).

@bitbug42 can I pester you yet again for more information on the channel please?

I need the "short channel id", which will be in this format: blockheight:transactionindex:outputindex (or it might have a x instead of a : in place) So for the channel I opened with you, it would be something like this: 607492:2000:0 The bits in bold are actually the correct numbers (I think), so it's just the bit in the middle (the 2000 I made up) that I'm missing.

I think once I have that I think we're in business.

I admit this is actually kinda fun. Shame I've got some of my actual money on the line here! #reckless

bitbug42 commented 4 years ago

@rbrooklyn Sure, no problem. Your short channel ID is 607492x2504x0

I'll be on standby until I get confirmation you've got your funds back ;)

rbrooklyn commented 4 years ago

Not much progress unfortunately.

It looks like it tries to do something, but doesn't get anywhere.

I think these are the relevant parts of the log:

2019-12-13 20:10:54.590 [INF] LTND: Inserting 1 SCB channel shells into DB 2019-12-13 20:10:54.594 [INF] LTND: Informing chain watchers of new restored channels 2019-12-13 20:10:54.732 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) 2019-12-13 20:11:13.243 [WRN] CNCT: Unprompted commitment broadcast for ChannelPoint(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) 2019-12-13 20:11:13.243 [WRN] CNCT: Remote node broadcast state #126240759707278, which is more than 1 beyond best known state #0!!! Attempting recovery... 2019-12-13 20:11:13.243 [ERR] CNCT: Unable to retrieve commitment point for channel(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) with lost state: no commit point found. Retrying in 1s. 2019-12-13 20:11:14.243 [ERR] CNCT: Unable to retrieve commitment point for channel(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) with lost state: no commit point found. Retrying in 2s. 2019-12-13 20:11:16.244 [ERR] CNCT: Unable to retrieve commitment point for channel(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) with lost state: no commit point found. Retrying in 4s. 2019-12-13 20:11:20.244 [ERR] CNCT: Unable to retrieve commitment point for channel(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) with lost state: no commit point found. Retrying in 8s. 2019-12-13 20:11:28.244 [ERR] CNCT: Unable to retrieve commitment point for channel(cb3d0bbf7e3b1ababe7194cc9408e2f647a496fc8e6267ad10d9dba633e4a03c:0) with lost state: no commit point found. Retrying in 16s.

Here is the full log: lnd_recovery.log

rbrooklyn commented 4 years ago

@bitbug42 I might have found the commit_point. It's 33 bytes, it's hex coded. I might just be on to something!

I'm using sqllitebrowser (that link goes to their site, but I just grabbed it from my distro's package manager) to look at a .sqlite file.

Then it's fairly easy:

Open sqllitebrowser Open a copy of your clightning sqllite file Click "Execute SQL" tab. Copy and paste this in:

SELECT short_channel_id,commitment_point FROM outputs LEFT JOIN channels ON channels.id = outputs.channel_id WHERE channels.short_channel_id = '607492x2504x0'

If that gives no results, try this instead and be sure to find the correct row for our channel:

SELECT short_channel_id,commitment_point FROM outputs LEFT JOIN channels ON channels.id = outputs.channel_id

You should hopefully have a result where the "commitment_point" shows as "BLOB" and our short_channel_id is the expected number. Click the BLOB and you should see "Edit database cell" on the right hand side have this information in. Select all the text, right click, "Copy HEX and ASCII", paste into here.

You should get something like this in your clipboard to paste: 0000 02 ea 99 18 c6 66 18 bd 11 20 de f8 a1 64 e0 5d .....f... ...d.] 0010 c1 96 fa 3a a3 9a 9e 7d 01 3c 54 d5 39 a8 c0 47 ...:...}.<T.9..G 0020 af

So we strip out the 0000, 0010, 0020 and the crap on the right and I think we have our 33 bytes hex coded commit_point!

If that doesn't work, could you make a backup of your sqllite file? Just so that we don't risk clightning deleting the information we need.

bitbug42 commented 4 years ago

I have a backup of my c-ln database file from the other day somewhere safe, don't worry.

I ran the queries but unfortunately I couldn't find any corresponding entry. If I remove the WHERE clause, the last non-NULL entry goes back to block 606439, so not our channel. So perhaps that's why the modified LND fails to find any info too.

I'm starting to think perhaps it's due to the fact that since the channel was never in an "operational" state my node never even went to the stage of saving a commitment point?

If that's the case, we might need to find a way to recompute that point if it is deterministic.

rbrooklyn commented 4 years ago

I have a feeling it's game over. I think I'm going to automate my backups of the SCB file this afternoon :)

If it was deterministic then querying your node would never be part of the process. We'd just work it out at my end.

What puzzles me is why this process is needed at all and why force closed channels don't just push the money back into an address derived from the wallet seed, or at least a derivation of the seed. Feels to me like it just introduces more risk of things going wrong. Maybe it's a privacy thing or some other really good reason I don't understand.

We might still be looking in the wrong place but that location feels very right, looking around the sqlite file.

guggero commented 4 years ago

@rbrooklyn don't give up, we'll get this!

Unfortunately c-lightning doesn't keep the required data around very long if it thinks its job is done (since the channel is force-closed and swept on c-lightning side, it does think so).

The commit point can be recomputed deterministic, but only on the node that did the force-close. I'm going to investigate how this can be done with c-lightning and then ask @bitbug42 to perform the necessary steps.

What puzzles me is why this process is needed at all and why force closed channels don't just push the money back into an address derived from the wallet seed, or at least a derivation of the seed.

The commitment transactions were initially designed that way to achieve maximum privacy. This has partly been fixed by "safu commitments" (static to_remote commitment key) which does not tweak the key of the remote node output any more. As you were running 0.8.1 this option should have been activated if c-lightning also supports this in the version @bitbug42 is running. We could give this a try.

guggero commented 4 years ago

It looks like c-lightning added a tool to do exactly what we need: https://github.com/ElementsProject/lightning/pull/3186. Also see https://bitcoin.stackexchange.com/questions/90196/how-to-recover-funds-from-a-force-closed-lnd-lightning-channel/90719

@bitbug42 could you please try to run the following command on your node:

hsmtool dumpcommitments <node id> <channel dbid> <depth> <path/to/hsm_secret>

Where:

Thanks!

bitbug42 commented 4 years ago

Nice, looks like it's exactly what we need.

Here's the actions I did:

bitbug42 commented 4 years ago

For reference, the complete command:

hsmtool dumpcommitments 03674650a12fee1530bbb97455dcd36813685af5b078f57c76b9347816843d68e7 54490 1 ~/.lightning/hsm_secret

guggero commented 4 years ago

Awesome, thank you very much! I hope that's it :crossed_fingers:

rbrooklyn commented 4 years ago

Sadly, no joy on this one:

Testing key 1 of 5000 Testing key 1001 of 5000 Testing key 2001 of 5000 Testing key 3001 of 5000 Testing key 4001 of 5000 [lnwallet] did not find matching private key after 5000 tries

To confirm, the command I'm using is:

./lnwallet-debug bruteforce --root_key "HD node root key base58 taken from the seed words using the cryptography toolkit" --per_commit_point 02ddf74f8b45c66b7f702daa9b79e84a0d47e8ed9913650f4adfe604a952859dc4 --payout_address "address the funds went to when bitbug's node force closed the transaction"

bitbug42 commented 4 years ago

Hmm... I think we're close.

I copied all the row from my db concerning this channel, so that you can have all the info. It has all the integer columns and lacking the BLOBs, so I don't think there is any sensitive data: cln_channel_row.txt (Github refused the CSV extension for some reason, just open that file as a CSV file)

I also found something interesting minimum_depth = 3, perhaps the first commit derivation really uses that as the first index?

So here are the commit points for all the first few indices. I hope we have the winner among those:

commit point #0: 02ddf74f8b45c66b7f702daa9b79e84a0d47e8ed9913650f4adfe604a952859dc4
commit point #1: 02e9c719c9f60226863135861aaf894650f525e39541e845d75eccef071ca2701d
commit point #2: 02986114d80bb3a155a071cd8ce1586510d223ae39b36aec16d99d9ac74d0ea506
commit point #3: 03f7182873ff4b213580e3c773d9927d2378fad4d9cb8bc9b569d81ce617e5f8d5
commit point #4: 02d18ef5320a47d6dbf729956d193a5354e5b13a2a926330522790af6636c543ff
commit point #5: 02a7ca255042aca55f2772660ac38ac71e82bd190029d3c2db65c98472bcddc122
rbrooklyn commented 4 years ago

Same result unfortunately. Tried all five.

I can't see anything in that csv that would help us. But this is all a bit beyond me anyway now :)

guggero commented 4 years ago

I think this only leaves one explanation. Because you are running 0.8.1 it's possible that your node opened the channel in the new "tweakless" format where the key is not manipulated in any matter and should be easily derivable from the seed (I sent instructions on how to do so in Slack). Both peers need to support this and I wasn't aware that c-lightning already implemented this. That's why I didn't consider this in the beginning. But I guess it's worth a try.

rbrooklyn commented 4 years ago

At the risk of repeating myself... No joy with the instructions in the private message on slack.

Thing is, there are so many steps using tools I'm not used to using, that maybe I'm doing something daft and I wouldn't know it. In fact, it's fairly probable I've screwed up the process somewhere along the line and the funds are completely recoverable.

I still don't really "get" how this odd method of using return addresses outside the seed helps. If we had used addresses generated by the seed only, then force closing the channel would've fixed everything without too much effort. I understand it's supposed to be a privacy thing but given I never share my public/private master key with the peer, it seems a little overkill and leads to situations like this. I guess I have to defer to the experts on this one.

Whilst it would be nice to get the funds back, I have accepted they are gone and it was the cost of finding this issue. As bug bounties go, this was pretty cheap :)

rbrooklyn commented 4 years ago

Hello!

Looks like this is probably getting fixed in 0.10, which is good to see.

Do you think I'm out of luck with the fund recovery?

guggero commented 4 years ago

Yes, it looks like the PR will make it into 0.10. I'm pretty sure you have everything you need to recover the funds. Maybe you did make a mistake with one of the commands. Feel free to contact me on Slack again if you want to give it another try and we'll debug the individual steps.