lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.63k stars 2.08k forks source link

Force closed channel stuck in pending channels forever #1571

Closed frico21 closed 6 years ago

frico21 commented 6 years ago

Background

I have a force closed channel on pending channels for more than 1 month by now. This is what it looks like when calling for pendingchannels, take a look at blocks_til_maturity: -1175.

The channel was with another node I own and that node pendingchannels doesn't show anything, also the closing tx was confirmed normally....

Can't say if the funding channel amount was returned since I was not aware of this issue after a long time, but listchaintxns shows no tx with that amount nor block height.

Other force closed channels actually don't get stuck on pendingchannels after reaching the maturity block but not this one.

"pending_force_closing_channels": [
        {
            "channel": {
                "remote_node_pub": "0292e8060da527f77be1db34a2a9f0b0baf158094e7424970806d5c9f90f4db312",
                "channel_point": "253e3d087561b14fb36faba68e674a71de2d68114f9dfe463a71e0b7b0c7f555:0",
                "capacity": "70143",
                "local_balance": "68876",
                "remote_balance": "0"
            },
            "closing_txid": "89efef5cc0e2b2e7d9f430ca970bef66ea031b359f986e7e14d71c0d2b1ebb75",
            "limbo_balance": "68876",
            "maturity_height": 530840,
            "blocks_til_maturity": -1175,
            "recovered_balance": "0",
            "pending_htlcs": [
            ]
        }
    ],

Your environment

lnd_log-2.txt

frico21 commented 6 years ago

any progress on this? :S

slush0 commented 6 years ago

I'm affected by this, too. Have one pending channel with "-1371" blocks_til_maturity.

However, I remember this happen in times of misconfiguration of my node, when lnd probably got skipped some block updates. So the cause is most likely fixed already, but the channel is still pending and funds locked in limbo. Closing tx has already over 1000 confirmations, but my lncli listchaintxns doesn't see such tx.

Roasbeef commented 6 years ago
 4 2018-07-16 19:16:41.744 [INF] NTFN: Dispatching 1 conf notification for 89efef5cc0e2b2e7d9f430ca970bef66ea031b359f986e7e14d71c0d2b1ebb75                                                                                       3 2018-07-16 19:16:41.744 [INF] NTFN: New block epoch subscription
    2 2018-07-16 19:16:41.744 [INF] NTFN: New spend subscription: utxo=89efef5cc0e2b2e7d9f430ca970bef66ea031b359f986e7e14d71c0d2b1ebb75:0                                                                                            1 2018-07-16 19:16:41.745 [WRN] DISC: unable to fetch channel info for chanID=757ea3b75eb22f7d3a5a1a7a3594289ee4e81d19c879c9a844c4ace1378d269c from graph: edge not found. Will delete localproof from database
150   2018-07-16 19:16:41.745 [INF] CNCT: *contractcourt.commitSweepResolver(253e3d087561b14fb36faba68e674a71de2d68114f9dfe463a71e0b7b0c7f555:0): waiting for commit output to be swept

@frico21 in your case it appears that the contract resolver thinks that this is actually a CSV output, when it's due to the remote party closing out on chain, meaning that we don't actually have to wait for a CSV delay before we can sweep the output. Are you using the bitcoind backend?

Roasbeef commented 6 years ago

@slush0 in your case, is the size of the limbo output very small? Like some low multiple of dust?

slush0 commented 6 years ago

The limbo balance is 0.1677 BTC:

        {
            "channel": {
                "remote_node_pub": "02198630153bc64ca60847abefac0fb08cbe2d4a59a00c8a5e62460db039437e08",
                "channel_point": "09cc4c277d935812669d94599d9c0f9bfc1d223f63edd8c3b09ff06d6e8deeb1:0",
                "capacity": "16777215",
                "local_balance": "16776310",
                "remote_balance": "0"
            },
            "closing_txid": "c68a08a9980bdbdcfa23283c3a8eadc2bba45e023d7a0f03c4efc9cf01fc7183",
            "limbo_balance": "16776310",
            "maturity_height": 531400,
            "blocks_til_maturity": -1765,
            "recovered_balance": "0",
            "pending_htlcs": [
            ]
        }
frico21 commented 6 years ago

@Roasbeef I am using btcd (version of btcd: version 0.12.0-beta)

slush0 commented 6 years ago

This may be relevant, however none of these numbers seems familiar to me. I just found out in logs:

2018-07-23 13:42:53.470 [WRN] FNDG: reservation timed out waiting for peer (peerID:&{0x180e3e0 83822701963169525159652230423423660633082254034234880534627458694523026834337 60876187357381778608527969098090226613195981471411584135924709582384954915630}, chanID:5eb333e83ee4c42b9564b27862b17f201ebda7ce7c211f3907caa89788ac507b)
2018-07-23 13:42:53.471 [ERR] FNDG: unable to cancel reservation: unable to cancel reservation: attempted to cancel non-existent funding state
Roasbeef commented 6 years ago

That last log is that a peer didn't finish the funding workflow in time, so we remove the funding reservation in order to free up those inputs. We'll only do this before the funding transaction itself has actually been broadcast/signed.

Roasbeef commented 6 years ago

Related to #1671. @frico21 can you try to lower the min relay fee (and min fee) setting on your node? Then restart lnd after bitcoind parses the new parameters.

slush0 commented 6 years ago

@Roasbeef Any workaround how to unstuck my 0.16 BTC? Closing tx has already thousands of confirmation, but I don't know how fix it.

Roasbeef commented 6 years ago

@slush0 so from what we've gathered, there's an issue related to the usage of vsize and transactions that use 1 sat/byte sometimes being under the min relay fee for bitcoind. It's related to some rounding artifacts when converting to/from vsize within bitcoind. We're moving to sat/kw everywhere internally in order to resolve this via #1644.

Roasbeef commented 6 years ago

If you can get the transaction into your mempool (by lowering the minfee and minrelayfee settings for your node), then once it has enough priority (or even before then), you should be able to propagate it a bit more via various push tx API's.

frico21 commented 6 years ago

@Roasbeef I use btcd. I have set its min fees to 0, and restarted everything... nothing changed.

slush0 commented 6 years ago

@Roasbeef I believe my issue is different, because closing tx of such closing channel is already mined in the blockchain: https://btc-explorer.trezor.io/tx/c68a08a9980bdbdcfa23283c3a8eadc2bba45e023d7a0f03c4efc9cf01fc7183

halseth commented 6 years ago

Doesn't look like this is a fee problem, as the closing transactions are already confirmed. Probably the UTXO Nursery not sweeping the timelocked output?

halseth commented 6 years ago

@cfromknecht We could create a routine to resend pending channels to the nursery at startup?

Not sure why they weren't swept originally though, but maybe that could be related to incorrect fee estimation? How do we handle the case where we try to publish a sweep, but fails?

vegardengen commented 6 years ago

I think I have a similar issue, but have been patiently waiting for these issues to be resolved (pure laziness).

I can see the force-close TXes has been mined long ago, but I never saw it seeing the confirmation.

Note: This is the same channels that I had to wait for chain watcher startup in #1616 - that might help with figuring out what happened in my case.

Will be able to provide logs from the whole process later.

halseth commented 6 years ago

@vegardengen Awesome, logs would be really helpful for tracking this down! 😀

vegardengen commented 6 years ago

So, here is what happens in my case (all times UTC+2).

018-07-24 22:11:02.754 [INF] CNCT: Broadcasting force close transaction, ChannelPoint(afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0): (wire.MsgTx)(0x13be2980)({ Version: (int32) 2, TxIn: ([]wire.TxIn) (len=1 cap=1) { (wire.TxIn)(0x12e47c00)({ PreviousOutPoint: (wire.OutPoint) afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0, SignatureScript: ([]uint8) { }, Witness: (wire.TxWitness) (len=4 cap=4) { ([]uint8) , ([]uint8) (len=72 cap=144) { 00000000 30 45 02 21 00 d0 3f 15 ff 81 80 af 75 be 3f e3 |0E.!..?.....u.?.| 00000010 8e 45 fb fd 9f 1e f8 3a 10 45 68 5a 62 58 13 fb |.E.....:.EhZbX..| 00000020 0d b3 61 31 c3 02 20 5d 0b a0 a9 fc 1b 14 99 21 |..a1.. ].......!| 00000030 18 d5 21 79 74 59 ef b1 fa d9 6f 0d 22 e3 1e df |..!ytY....o."...| 00000040 00 60 f6 a7 82 34 b8 01 |.`...4..| }, ([]uint8) (len=71 cap=144) { 00000000 30 44 02 20 67 bd 3c b8 62 1c 23 f1 de 4e 34 b0 |0D. g.<.b.#..N4.| 00000010 b5 f6 9c c2 1e db 7f 04 50 a9 19 5d b4 7a fa 5a |........P..].z.Z| 00000020 51 2a 3b 2e 02 20 25 4e 21 47 a4 87 c3 b7 b8 1b |Q;.. %N!G......| 00000030 60 76 ff 2b 73 ea 4b 71 8d 8f 54 79 08 99 71 da |`v.+s.Kq..Ty..q.| 00000040 10 8e c3 7a 83 31 01 |...z.1.| }, ([]uint8) (len=71 cap=500) { 00000000 52 21 02 87 a1 07 e7 66 f3 5f 7f 2a 56 2e 3c 85 |R!.....f._.V.<.| 00000010 5b 85 24 2c 7b 15 23 e0 df e5 7e e8 94 35 61 d4 |[.$,{.#...~..5a.| 00000020 c8 47 e3 21 02 d7 a5 3e 6e e0 1a 6c f3 27 7b 58 |.G.!...>n..l.'{X| 00000030 95 a7 d5 7d dd dd 7f f5 89 8b 08 95 87 2a e4 a2 |...}...........| 00000040 55 6a 81 ff f3 52 ae |Uj...R.| } }, Sequence: (uint32) 2161320420 }) }, TxOut: ([]wire.TxOut) (len=2 cap=2) { (wire.TxOut)(0x153b8990)({ Value: (int64) 1453, PkScript: ([]uint8) (len=34 cap=34) { 00000000 00 20 d1 dc 91 04 72 3a da cc 99 06 c3 17 70 01 |. ....r:......p.| 00000010 b1 ce 82 fb 99 d5 02 e2 45 44 f6 eb 05 ec 65 90 |........ED....e.| 00000020 67 81 |g.| } }), (*wire.TxOut)(0x153b89a4)({ Value: (int64) 97823, PkScript: ([]uint8) (len=22 cap=22) { 00000000 00 14 da 80 df 85 57 c8 bc ce ac 81 2e 99 86 4e |......W........N| 00000010 fc 69 b5 72 ae 99 |.i.r..| } }) }, LockTime: (uint32) 548398034 })

2018-07-24 22:11:02.754 [INF] LNWL: Inserting unconfirmed transaction b5364daccec787316223698c10f5af2d1ea9f0b778e51f862ab8247d31bf4b7f 2018-07-24 22:11:02.790 [INF] CNCT: ChannelArbitrator(afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0): noop state userTrigger 2018-07-24 22:11:02.803 [INF] PEER: Waiting for confirmation of cooperative close of ChannelPoint(afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0) with txid: b5364daccec787316223698c10f5af2d1ea9f0b778e51f862ab8247d31bf4b7f

(this transaction is confirmed later, in block 533490)

Now, I also force-close a few more. Simplifying logs because they are pretty normal, but txids and channel points is interesting to keep in mind.

2018-07-24 22:13:37.583 [INF] PEER: Waiting for confirmation of cooperative close of ChannelPoint(00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0) with txid: c5f7b2af76bba15c74f27354b800e28c8657f9a22078caa1f6400bcf7368fcc1

2018-07-24 22:15:17.665 [INF] PEER: Waiting for confirmation of cooperative close of ChannelPoint(002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1) with txid: d27fcca48b2c8916e5f388bb43d5cf315678f4d877ebf9c91258d48066e038b9

2018-07-24 22:17:56.881 [INF] PEER: Waiting for confirmation of cooperative close of ChannelPoint(ad01a921838dafbbf674efa6952a2c6db6ce128d31d36b5734a7097a5f372c28:1) with txid: 68dbcca59b9bac4f90fba916184f723beaadfae89c47a115ada490e25fd27e19

At 22:34 i restart, for some reason.

It gets stuck here during startup: 2018-07-24 22:35:26.866 [INF] NTFN: New spend subscription: utxo=afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0 2018-07-24 22:35:26.866 [INF] CNCT: Close observer for ChannelPoint(4ac387ba45939a4cb87110f54c1ed20ad99af0076908d9e06e9d6518b817c6ff:1) active 2018-07-24 22:35:41.499 [INF] RPCS: [listchannels] fetched 69 channels from DB 2018-07-24 22:37:20.071 [INF] RPCS: [listchannels] fetched 69 channels from DB 2018-07-24 22:40:02.627 [INF] RPCS: [listchannels] fetched 69 channels from DB

Here it has been stuck for 5 minutes, so I get impatient and tries another restart.... 2018-07-24 22:44:31.186 [INF] LTND: Received shutdown request. 2018-07-24 22:44:31.186 [INF] LTND: Shutting down... 2018-07-24 22:44:31.186 [INF] LTND: Gracefully shutting down.

I think I had to kill it, and I restart:

2018-07-24 22:44:55.825 [INF] LTND: Version 0.4.2-beta commit=76da55f500b06a2ecd9c151ce9fdc031870afc99

And.... 2018-07-24 22:45:26.893 [INF] NTFN: New spend subscription: utxo=002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1 2018-07-24 22:45:26.893 [INF] CNCT: Close observer for ChannelPoint(2c8ef85ca9f2676cda5745faddbfb1d6c968ec0a05d0b4c1038f224ee813d6ec:1) active 2018-07-24 22:48:50.307 [INF] NTFN: Dispatching confirmed spend notification for outpoint=002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1 at height 533488 018-07-24 22:48:51.139 [INF] NTFN: New block: height=533488, sha=0000000000000000000e2939698e4e8d1bbf550a3f4e262fc59f53fb93a8cfe2 2018-07-24 22:48:51.140 [INF] UTXN: Attempting to graduate height=533488: num_kids=0, num_babies=0 2018-07-24 22:50:02.831 [INF] RPCS: [listchannels] fetched 69 channels from DB 2018-07-24 22:50:41.045 [INF] NTFN: New block: height=533489, sha=0000000000000000001d7ef77d5b349c21bc15aa05fa1c2ac489828febd35222 2018-07-24 22:50:41.046 [INF] UTXN: Attempting to graduate height=533489: num_kids=0, num_babies=0 2018-07-24 22:50:46.634 [INF] NTFN: New block: height=533490, sha=000000000000000000003d257474577c0f8ce5f6a1a6ead847cf7c6a98ef0cea 2018-07-24 22:50:46.635 [INF] UTXN: Attempting to graduate height=533490: num_kids=0, num_babies=0

Here it hangs, so another attempt at shutdown, probably kill, and restart - getting a bit impatient, just want to get it to start and go to bed....

2018-07-24 22:53:58.662 [INF] LTND: Received shutdown request. 2018-07-24 22:53:58.662 [INF] LTND: Shutting down... 2018-07-24 22:53:58.662 [INF] LTND: Gracefully shutting down. 2018-07-24 22:54:11.125 [INF] LTND: Version 0.4.2-beta commit=76da55f500b06a2ecd9c151ce9fdc031870afc99 2018-07-24 22:54:11.125 [INF] LTND: Active chain: Bitcoin (network=mainnet) 2018-07-24 22:54:11.126 [INF] CHDB: Checking for schema update: latest_version=3, db_version=3 2018-07-24 22:54:11.129 [INF] RPCS: password RPC server listening on 127.0.0.1:10009 2018-07-24 22:54:11.130 [INF] RPCS: password RPC server listening on [2001:4661:4fe5:0:1148:5e63:5693:cfd]:10009 2018-07-24 22:54:11.130 [INF] RPCS: password gRPC proxy started at [2001:4661:4fe5:0:1148:5e63:5693:cfd]:8080 2018-07-24 22:54:11.130 [INF] RPCS: password RPC server listening on [::1]:10009 2018-07-24 22:54:11.130 [INF] LTND: Waiting for wallet encryption password. Use lncli create to create a wallet, lncli unlock to unlock an existing wallet, or lncli changepassword to change the password of an existing wallet and unlock it. 2018-07-24 22:54:21.874 [INF] LNWL: Opened wallet 2018-07-24 22:54:22.099 [INF] LTND: Primary chain is set to: bitcoin 2018-07-24 22:54:22.099 [INF] LTND: Initializing bitcoind backed fee estimator 2018-07-24 22:54:22.106 [INF] LNWL: Started listening for blocks via ZMQ on tcp://127.0.0.1:29000 2018-07-24 22:54:25.188 [INF] LNWL: The wallet has been unlocked without a time limit 2018-07-24 22:54:25.189 [INF] LTND: LightningWallet opened 2018-07-24 22:54:25.204 [INF] HSWC: Restoring in-memory circuit state from disk 2018-07-24 22:54:26.503 [INF] LNWL: Started rescan from block 000000000000000000003d257474577c0f8ce5f6a1a6ead847cf7c6a98ef0cea (height 533490) for 1472 addresses 2018-07-24 22:54:26.680 [INF] LNWL: Starting rescan from block 000000000000000000003d257474577c0f8ce5f6a1a6ead847cf7c6a98ef0cea 2018-07-24 22:54:26.689 [INF] LNWL: Rescan finished at 533490 (000000000000000000003d257474577c0f8ce5f6a1a6ead847cf7c6a98ef0cea) 2018-07-24 22:54:26.689 [INF] LNWL: Catching up block hashes to height 533490, this might take a while 2018-07-24 22:54:26.694 [INF] LNWL: Done catching up block hashes 2018-07-24 22:54:26.694 [INF] LNWL: Finished rescan for 1472 addresses (synced to block 000000000000000000003d257474577c0f8ce5f6a1a6ead847cf7c6a98ef0cea, height 533490) 2018-07-24 22:54:26.704 [INF] LNWL: Removed conflicting tx: (wire.MsgTx)(0x11a167e0)({ Version: (int32) 2, TxIn: ([]wire.TxIn) (len=1 cap=1) { (wire.TxIn)(0x11558ac0)({ PreviousOutPoint: (wire.OutPoint) 00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0, SignatureScript: ([]uint8) { }, Witness: (wire.TxWitness) (len=4 cap=4) { ([]uint8) { }, ([]uint8) (len=72 cap=72) { 00000000 30 45 02 21 00 8a 92 da a3 b3 bf 34 3c f0 3e 84 |0E.!.......4<.>.| 00000010 4d 64 5f 56 58 a8 e5 24 2c 2d 27 ad 48 60 f7 d5 |MdVX..$,-'.H`..| 00000020 f7 d7 dd aa a9 02 20 77 1c f9 0f e5 0c d3 d0 11 |...... w........| 00000030 50 fc 50 45 e3 13 81 c5 44 3e c8 74 29 32 e6 69 |P.PE....D>.t)2.i| 00000040 ce ef 7b bc f8 9c 64 01 |..{...d.| }, ([]uint8) (len=71 cap=71) { 00000000 30 44 02 20 45 9d 8f f4 6f 0a 5c e3 f5 bb 35 a7 |0D. E...o....5.| 00000010 5e f9 58 52 ed 81 d0 66 b8 bc 63 4b 5e e6 58 a8 |^.XR...f..cK^.X.| 00000020 4d 1e d8 b4 02 20 08 e3 66 df ca 8b 78 56 2b ac |M.... ..f...xV+.| 00000030 8a 25 ca 3e e5 6f 3d c4 b2 d6 24 90 de 09 df 6b |.%.>.o=...$....k| 00000040 e3 6e 2c 5c cd fd 01 |.n,...| }, ([]uint8) (len=71 cap=71) { 00000000 52 21 02 79 67 20 e2 51 34 4d 5a 54 d0 21 a7 b8 |R!.yg .Q4MZT.!..| 00000010 e0 52 8d 93 7c d2 79 c1 fd b2 c4 03 9f a9 fc aa |.R..|.y.........| 00000020 7a 5f e3 21 02 c6 d9 fa 78 bf db c5 3d 87 6c 45 |z.!....x...=.lE| 00000030 ef cc f9 dd 52 65 eb 14 3e 56 62 1f 96 70 69 6a |....Re..>Vb..pij| 00000040 9e 74 b9 c3 31 52 ae |.t..1R.| } }, Sequence: (uint32) 2151583259 }) }, TxOut: ([]wire.TxOut) (len=2 cap=2) { (wire.TxOut)(0x113b44b0)({ Value: (int64) 78204, PkScript: ([]uint8) (len=22 cap=22) { 00000000 00 14 d5 39 b7 27 58 08 9d d7 24 a1 ea 27 dd 8a |...9.'X...$..'..| 00000010 ed 03 52 62 aa 83 |..Rb..| } }), (wire.TxOut)(0x113b44c4)({ Value: (int64) 919985, PkScript: ([]uint8) (len=34 cap=34) { 00000000 00 20 fb 6b 1d 51 27 e1 89 2d 05 13 5e f6 71 ab |. .k.Q'..-..^.q.| 00000010 d1 93 f2 d9 1b 64 f2 8a 02 f8 99 7b e7 22 46 0e |.....d.....{."F.| 00000020 62 90 |b.| } }) }, LockTime: (uint32) 540296574 })

2018-07-24 22:54:26.723 [INF] LNWL: Removed conflicting tx: (wire.MsgTx)(0x11a166c0)({ Version: (int32) 2, TxIn: ([]wire.TxIn) (len=1 cap=1) { (wire.TxIn)(0x11558a00)({ PreviousOutPoint: (wire.OutPoint) f8786cfba4eebb2e1f7e041e6e5260e15b8c1b1a921b45651e9163bc14fecbc8:0, SignatureScript: ([]uint8) { }, Witness: (wire.TxWitness) (len=4 cap=4) { ([]uint8) { }, ([]uint8) (len=72 cap=72) { 00000000 30 45 02 21 00 f4 7b 2f f8 03 4f 3c 91 c2 fb ce |0E.!..{/..O<....| 00000010 e1 e9 4c 6c 23 92 44 62 2a d6 f2 68 6c aa 80 9a |..Ll#.Db..hl...| 00000020 6b f1 39 13 f3 02 20 21 81 ae 46 db 6d 48 e3 86 |k.9... !..F.mH..| 00000030 5d f2 dc 66 34 a8 84 cc 3f 48 75 c3 b8 ac 7b 7a |]..f4...?Hu...{z| 00000040 06 2f 8e 6e 3e 8a c5 01 |./.n>...| }, ([]uint8) (len=71 cap=71) { 00000000 30 44 02 20 78 54 b2 b0 f7 ac ef 2f 65 a3 e2 e7 |0D. xT...../e...| 00000010 cb 5c d8 d7 f1 c9 05 7e 50 3c 69 12 82 6d 0a 11 |......~P<i..m..| 00000020 ff de 39 15 02 20 26 be 86 0a 65 c0 57 24 92 41 |..9.. &...e.W$.A| 00000030 25 1b 8c 94 f8 84 9d 54 f2 6a dd 81 fb 2b 25 d7 |%......T.j...+%.| 00000040 eb 23 4b a9 8d c2 01 |.#K....| }, ([]uint8) (len=71 cap=71) { 00000000 52 21 03 11 ff c6 ce 91 a7 77 d6 fc 6e d6 e7 d4 |R!.......w..n...| 00000010 61 d5 a6 8e 88 d6 14 41 16 21 34 9e 01 05 b4 37 |a......A.!4....7| 00000020 f0 62 d2 21 03 93 dc 55 e8 77 6b 20 32 99 d7 84 |.b.!...U.wk 2...| 00000030 7f 68 c8 ab 42 fb 2c 3d 4f 67 de 31 70 39 50 e8 |.h..B.,=Og.1p9P.| 00000040 41 67 c2 13 2b 52 ae |Ag..+R.| } }, Sequence: (uint32) 2153945206 }) }, TxOut: ([]wire.TxOut) (len=1 cap=1) { (wire.TxOut)(0x118bc320)({ Value: (int64) 9099, PkScript: ([]uint8) (len=22 cap=22) { 00000000 00 14 bb 8b f6 ee cc 6e e4 1a af 7e 3a eb cd ed |.......n...~:...| 00000010 27 d2 de 00 0d 94 |'.....| } }) }, LockTime: (uint32) 541906212 })

2018-07-24 22:54:26.732 [INF] LNWL: Removed conflicting tx: (wire.MsgTx)(0x11a16720)({ Version: (int32) 2, TxIn: ([]wire.TxIn) (len=1 cap=1) { (wire.TxIn)(0x11558a40)({ PreviousOutPoint: (wire.OutPoint) afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0, SignatureScript: ([]uint8) { }, Witness: (wire.TxWitness) (len=4 cap=4) { ([]uint8) { }, ([]uint8) (len=72 cap=72) { 00000000 30 45 02 21 00 d0 3f 15 ff 81 80 af 75 be 3f e3 |0E.!..?.....u.?.| 00000010 8e 45 fb fd 9f 1e f8 3a 10 45 68 5a 62 58 13 fb |.E.....:.EhZbX..| 00000020 0d b3 61 31 c3 02 20 5d 0b a0 a9 fc 1b 14 99 21 |..a1.. ].......!| 00000030 18 d5 21 79 74 59 ef b1 fa d9 6f 0d 22 e3 1e df |..!ytY....o."...| 00000040 00 60 f6 a7 82 34 b8 01 |.`...4..| }, ([]uint8) (len=71 cap=71) { 00000000 30 44 02 20 67 bd 3c b8 62 1c 23 f1 de 4e 34 b0 |0D. g.<.b.#..N4.| 00000010 b5 f6 9c c2 1e db 7f 04 50 a9 19 5d b4 7a fa 5a |........P..].z.Z| 00000020 51 2a 3b 2e 02 20 25 4e 21 47 a4 87 c3 b7 b8 1b |Q;.. %N!G......| 00000030 60 76 ff 2b 73 ea 4b 71 8d 8f 54 79 08 99 71 da |`v.+s.Kq..Ty..q.| 00000040 10 8e c3 7a 83 31 01 |...z.1.| }, ([]uint8) (len=71 cap=71) { 00000000 52 21 02 87 a1 07 e7 66 f3 5f 7f 2a 56 2e 3c 85 |R!.....f._.V.<.| 00000010 5b 85 24 2c 7b 15 23 e0 df e5 7e e8 94 35 61 d4 |[.$,{.#...~..5a.| 00000020 c8 47 e3 21 02 d7 a5 3e 6e e0 1a 6c f3 27 7b 58 |.G.!...>n..l.'{X| 00000030 95 a7 d5 7d dd dd 7f f5 89 8b 08 95 87 2a e4 a2 |...}...........| 00000040 55 6a 81 ff f3 52 ae |Uj...R.| } }, Sequence: (uint32) 2161320420 }) }, TxOut: ([]wire.TxOut) (len=2 cap=2) { (wire.TxOut)(0x113b4420)({ Value: (int64) 1453, PkScript: ([]uint8) (len=34 cap=34) { 00000000 00 20 d1 dc 91 04 72 3a da cc 99 06 c3 17 70 01 |. ....r:......p.| 00000010 b1 ce 82 fb 99 d5 02 e2 45 44 f6 eb 05 ec 65 90 |........ED....e.| 00000020 67 81 |g.| } }), (*wire.TxOut)(0x113b4434)({ Value: (int64) 97823, PkScript: ([]uint8) (len=22 cap=22) { 00000000 00 14 da 80 df 85 57 c8 bc ce ac 81 2e 99 86 4e |......W........N| 00000010 fc 69 b5 72 ae 99 |.i.r..| } }) }, LockTime: (uint32) 548398034 })

2018-07-24 22:54:26.738 [INF] LNWL: Removed conflicting tx: (wire.MsgTx)(0x11a16780)({ Version: (int32) 2, TxIn: ([]wire.TxIn) (len=1 cap=1) { (wire.TxIn)(0x11558a80)({ PreviousOutPoint: (wire.OutPoint) 002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1, SignatureScript: ([]uint8) { }, Witness: (wire.TxWitness) (len=4 cap=4) { ([]uint8) { }, ([]uint8) (len=71 cap=71) { 00000000 30 44 02 20 1b 2a 44 be 4b 18 93 57 38 13 2a ee |0D. .D.K..W8..| 00000010 f2 9d 12 a1 cb e1 70 dc 04 02 f0 2d 23 a5 9a 2d |......p....-#..-| 00000020 e3 c5 35 8b 02 20 3d 23 da 56 72 18 f8 35 a1 b0 |..5.. =#.Vr..5..| 00000030 03 14 b0 60 72 f6 7f 56 1c 04 52 9a 37 1d e7 a7 |...r..V..R.7...| 00000040 11 49 e1 64 bf 44 01 |.I.d.D.| }, ([]uint8) (len=71 cap=71) { 00000000 30 44 02 20 49 c2 92 c2 f6 08 bb ee 81 53 da 50 |0D. I........S.P| 00000010 48 d9 1c 6b 11 b9 8f 67 33 60 dc cb 7e 51 42 98 |H..k...g3..~QB.| 00000020 a3 d5 19 3a 02 20 2c e9 93 b0 6f 66 6a 2f 47 11 |...:. ,...ofj/G.| 00000030 b2 0f 46 bf ac cb 8b c7 20 1b cb fb 66 7d 59 9c |..F..... ...f}Y.| 00000040 47 1c 3f bf 3f 22 01 |G.?.?".| }, ([]uint8) (len=71 cap=71) { 00000000 52 21 02 41 cf 33 78 34 d8 f2 39 cb 26 31 b3 bf |R!.A.3x4..9.&1..| 00000010 e9 1a e3 42 5c 16 1f f5 9e 46 a6 04 11 fd 69 6c |...B....F....il| 00000020 d9 ef 41 21 03 9d 5b 0e fd 89 dd bc 73 c0 b5 da |..A!..[.....s...| 00000030 85 b5 8d 44 6c 0b 1e 68 b1 07 f6 78 e6 02 09 e9 |...Dl..h...x....| 00000040 d3 9a b7 05 73 52 ae |....sR.| } }, Sequence: (uint32) 2152356151 }) }, TxOut: ([]wire.TxOut) (len=1 cap=1) { (*wire.TxOut)(0x118bc3a0)({ Value: (int64) 198552, PkScript: ([]uint8) (len=22 cap=22) { 00000000 00 14 b3 c3 4e c9 13 35 fe 37 1d 3c b9 70 a6 6e |....N..5.7.<.p.n| 00000010 44 80 70 c0 d6 35 |D.p..5| } }) }, LockTime: (uint32) 549507824 })

It attempts to continue, does its things, until: 2018-07-24 22:54:40.610 [INF] NTFN: New spend subscription: utxo=ad01a921838dafbbf674efa6952a2c6db6ce128d31d36b5734a7097a5f372c28:1 2018-07-24 22:54:40.610 [INF] CNCT: Close observer for ChannelPoint(fe3ae2f5746c220bfb61d205150a12a6da1c16b8deda1d1b17fb701a08d506b8:0) active 2018-07-24 22:55:02.357 [INF] RPCS: [listchannels] fetched 69 channels from DB 2018-07-24 23:00:02.685 [INF] RPCS: [listchannels] fetched 69 channels from DB 2018-07-24 23:03:58.243 [INF] LTND: Received shutdown request. 2018-07-24 23:03:58.243 [INF] LTND: Shutting down... 2018-07-24 23:03:58.243 [INF] LTND: Gracefully shutting down.

(more restart, I still suspect mutexes or such).

So, it gets stuck here, searching for ad01a921838dafbbf674efa6952a2c6db6ce128d31d36b5734a7097a5f372c28:

2018-07-25 00:05:56.952 [INF] NTFN: New spend subscription: utxo=ad01a921838dafbbf674efa6952a2c6db6ce128d31d36b5734a7097a5f372c28:1 2018-07-25 00:08:05.000 [INF] NTFN: New block: height=533496, sha=000000000000000000203bab50bde478d9751b265e08939e1dd4b1c3ed6c7b0e 2018-07-25 00:08:05.001 [INF] UTXN: Attempting to graduate height=533496: num_kids=0, num_babies=0

I let it run, it does process blocks, listchannels, etc, and eventually:

018-07-25 01:29:46.367 [INF] NTFN: Dispatching confirmed spend notification for outpoint=ad01a921838dafbbf674efa6952a2c6db6ce128d31d36b5734a7097a5f372c28:1 at height 533491

Continues, until: 2018-07-25 01:29:46.892 [INF] NTFN: New spend subscription: utxo=00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0

Now I do a bit digging, finds out that one TX has already been confirmed at 01:29, and decides that "oh! when it hangs, it is actually scanning old blocks!", so: 2018-07-25 01:40:50.546 [INF] NTFN: New spend subscription: utxo=002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1

On this one, I let it continue, and....: 2018-07-25 02:50:42.769 [INF] NTFN: Dispatching confirmed spend notification for outpoint=002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1 at height 533488 2018-07-25 02:50:42.776 [INF] NTFN: New spend subscription: utxo=655c8bbffc7a7ff49f8b35652d3aceb643cc8dea444f43e5e5a1a31dbabf7121:0 2018-07-25 02:50:42.781 [INF] CNCT: Local unilateral close of ChannelPoint(002288803fe566290aa5551ee8bd2d8c0e737b5e3aeb6d476f5930c36567c0ca:1) detected

It starts up more chainwatchers, until: 2018-07-25 02:50:45.689 [INF] NTFN: New spend subscription: utxo=00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0 2018-07-25 02:50:45.689 [INF] CNCT: Close observer for ChannelPoint(e77750c9c6a05f85e01c7394be43370e514fd57330432a5b91451eaa220c844c:0) active

.... 2018-07-25 04:23:23.327 [INF] UTXN: Attempting to graduate height=533525: num_kids=0, num_babies=0 2018-07-25 04:23:23.344 [INF] HSWC: Garbage collected 20 shared secret hashes at height=533525 2018-07-25 04:24:07.145 [INF] NTFN: Dispatching confirmed spend notification for outpoint=00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0 at height 533489 2018-07-25 04:24:07.152 [INF] NTFN: New spend subscription: utxo=d0c3d58f671544401babd9fb223801e40a3f6105589871f6c490a2d614ba173d:1 2018-07-25 04:24:07.150 [INF] CNCT: Close observer for ChannelPoint(00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0) active 2018-07-25 04:24:07.160 [INF] CNCT: Local unilateral close of ChannelPoint(00ed181d67cacc3ebc8a9e0f3775ded4aa21c60727528d5d12a420b31dfab4d8:0) detected 2018-07-25 04:24:07.256 [INF] NTFN: New spend subscription: utxo=4763584bd171ba12662be02c8262f0bd200436423fc0c29e2551a145e7fcd5bd:0 2018-07-25 04:24:07.256 [INF] CNCT: Close observer for ChannelPoint(d0c3d58f671544401babd9fb223801e40a3f6105589871f6c490a2d614ba173d:1) active 2018-07-25 04:24:07.329 [INF] NTFN: New spend subscription: utxo=6235e14b754cab674dd1aa3608778bc23ab33b5e0e23ea32a905bae31cdbfe27:1 2018-07-25 04:24:07.329 [INF] CNCT: Close observer for ChannelPoint(4763584bd171ba12662be02c8262f0bd200436423fc0c29e2551a145e7fcd5bd:0) active 2018-07-25 04:24:07.430 [INF] CNCT: Close observer for ChannelPoint(6235e14b754cab674dd1aa3608778bc23ab33b5e0e23ea32a905bae31cdbfe27:1) active 2018-07-25 04:24:07.430 [INF] NTFN: New spend subscription: utxo=afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0

.....

018-07-25 05:48:19.452 [INF] CNCT: Close observer for ChannelPoint(afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0) active 2018-07-25 05:48:19.446 [INF] NTFN: Dispatching confirmed spend notification for outpoint=afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0 at height 533490 2018-07-25 05:48:19.453 [INF] NTFN: New spend subscription: utxo=277314df2d97f6288483181d6a4130b2e484f914d6a581acc8cc014402029dc4:0 2018-07-25 05:48:19.459 [INF] CNCT: Local unilateral close of ChannelPoint(afceff0e9a1a9875460e0cb819155e97bcd0f37337b539b35983b6a04774bf75:0) detected 2

In the morning, I try to restart to get more debug info, but it's not very talkative. Maybe I was to tired. Not more than info in my logs.

It hangs at: 2018-07-25 06:50:53.094 [INF] NTFN: New spend subscription: utxo=f8786cfba4eebb2e1f7e041e6e5260e15b8c1b1a921b45651e9163bc14fecbc8:0

until:

2018-07-25 08:19:17.604 [INF] NTFN: Dispatching confirmed spend notification for outpoint=f8786cfba4eebb2e1f7e041e6e5260e15b8c1b1a921b45651e9163bc14fecbc8:0 at height 533489 2018-07-25 08:19:17.609 [INF] NTFN: New spend subscription: utxo=d2b8b062499288e3f3dba9a842cfacf670faba05f818aa3085b2f8f07279e69a:0 2018-07-25 08:19:17.609 [INF] CNCT: Close observer for ChannelPoint(f8786cfba4eebb2e1f7e041e6e5260e15b8c1b1a921b45651e9163bc14fecbc8:0) active 2018-07-25 08:19:17.613 [INF] CNCT: Local unilateral close of ChannelPoint(f8786cfba4eebb2e1f7e041e6e5260e15b8c1b1a921b45651e9163bc14fecbc8:0) detected

And it finally continues a normal startup.

But now, all these 4 force-closes is in an eternal pending force-close. Doing a bit of thinking...can it be that it doesn't actually get the N confirmations signal because it's way past that, by now?

Attaching the log from my adventure.

lnd.log.359.gz

wpaulino commented 6 years ago

@vegardengen the reason for the hang at startup has now been resolved on master. Could you provide the relevant logs for the pending force close channels on a new start?

cfromknecht commented 6 years ago

@vegardengen thanks for the additional info! I've looked through the logs and determined the source of the error. It looks like we missed a handoff internally in the CNCT, which after a restart doesn't transfer the contract into the UTXN. It may be difficult to resume sweeping of the channel, but I'll see what I can do.

vegardengen commented 6 years ago

Do you still need logs for a new startup?

halseth commented 6 years ago

@vegardengen Think we got what we need for now, working on a fix 👍 Thanks!

vegardengen commented 6 years ago

Not sure I had the same issue as the reporter. Should verify this before closing?

halseth commented 6 years ago

Will need #1696 to go in before close.

slush0 commented 6 years ago

@Roasbeef Unfortunately, I'm running recent git master and I still see this transaction hanging. Closing transaction has over 12000 confirmations. Maybe it is different problem, should I open separate issue?

        {
            "channel": {
                "remote_node_pub": "02198630153bc64ca60847abefac0fb08cbe2d4a59a00c8a5e62460db039437e08",
                "channel_point": "09cc4c277d935812669d94599d9c0f9bfc1d223f63edd8c3b09ff06d6e8deeb1:0",
                "capacity": "16777215",
                "local_balance": "16776310",
                "remote_balance": "0"
            },
            "closing_txid": "c68a08a9980bdbdcfa23283c3a8eadc2bba45e023d7a0f03c4efc9cf01fc7183",
            "limbo_balance": "16776310",
            "maturity_height": 531400,
            "blocks_til_maturity": -10295,
            "recovered_balance": "0",
            "pending_htlcs": [
            ]
        }
Roasbeef commented 6 years ago

@slush0 is that an existing channel? If it's an older one and it had the fee insufficient issue, then a way to get it mined would be to lower the min fee settings on your node, then attempt to use like an API to get it broadcast, or possibly wait for priority to kick in (if that's still a thing?).

We also have this PR that didn't quite make 0.5: https://github.com/lightningnetwork/lnd/pull/1704. It'll detect this case, and re-sign until accepted by the p2p network or full node, depending on which backend is being used.

halseth commented 5 years ago

Looks like the closing transaction is confirmed, and is just waiting to be swept. Do you have any logs at startup indicating whether the utxonursery is aware of the channel point?