Closed sangaman closed 6 years ago
If you look at the logs for Mission Control
, do you see any lingering activity? Atm, we don't have an upper bound on our number of attempts, which itself is a bug. If your machine is slow, or is doing a ton of other activities, then with the way our timeouts are considered, it may prune an edge, attempt another payment, then by the time the payment fails, that edge is unpruned again.
I don't see Mission Control
in the logs but I do see multiple lines of Attempt to send payment
over ~2 minutes or so, I think that's about how long I waited before killing the command . One machine I observed this on was slow, the other was using modern hardware. If I see this again I'll let it run for a while and keep a close eye on the logs.
I have been getting sending/routing hangs as well. I hadn't correlated it to a funding issue, but for example i sent small payments to htlc.me which worked then shortly after a larger payment which hangs. Subsequent attempts with lower amounts are also hanging until a service restart. queryroutes
often hangs as well even with small amounts. I'm running this on a relatively slow machine also.
Update: queryroutes
did work with small amounts but took half an hour and returned 100s of results!
queryroutes
and sendpayment
are doing path finding a bit differently, #379 should fix a lot of the slowness seen in queryroutes
.
sendpayment
is using MissionControl
, which should show up in the logs with prefix CRTR:
if you are using debug logs :)
@halseth good to know. I do see all the Attempt to send payment
messages are prefaced with CRTR:
. I haven't encountered this issue again yet but I'll update if I do.
Is this the same as this one? Totally reproducible here right now, will try not to change anything.
2018-01-07 12:43:01.441 [DBG] PEER: Sending UpdateAddHTLC(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, id=1274, amt=1910190 mSAT, expiry=1257881, hash=9e9405921e783ce87bf20f48d368d2bde55d61bf9e36248e5a87a72d171a3594) to 34.250.234.192:9735 2018-01-07 12:43:01.496 [DBG] PEER: Sending CommitSig(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, num_htlcs=0) to 34.250.234.192:9735 2018-01-07 12:43:01.517 [DBG] PEER: Received RevokeAndAck(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, rev=86e4e4f8b4375f8ca98079d1ad0837a2415c8b33dd247abc76c41d38de12861a, next_point=0373849f08f1aadf301eb419bd8fe624a8e1b1abedacead748154ea1f828f19242) from 34.250.234.192:9735 2018-01-07 12:43:01.532 [DBG] PEER: Received CommitSig(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, num_htlcs=0) from 34.250.234.192:9735 2018-01-07 12:43:01.542 [DBG] HSWC: ChannelPoint(30a1cb92dd366b90b78e32e6f9ac5ea5ac4134e3a74e613c9db3a8bc406a088c:0) forwarding 0 HTLC's 2018-01-07 12:43:01.551 [DBG] PEER: Sending RevokeAndAck(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, rev=b531d2b4195df9166a57afac02589ec0de7ac6f41f2cb0a8bb2477b2329a77ef, next_point=030ef265375207ee0869603c361607148d1e56d5a758632ff1af951326d478b7c6) to 34.250.234.192:9735 2018-01-07 12:43:01.571 [DBG] PEER: Received UpdateFailHTLC(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, id=1274, reason=20fc8dc09e23eff44ae3db4c74c02faca99c6bb8bb7321c5c476adecfae94c7e35cb60f132ecfb80c689f92845825c8192688b1c2bbf4e8ea5b4a617e27f871ea332a2a3e865ac11624a5ff04f6f479e7b65464ca6ab06100943e07ef0240dd49ab47a838f15aa1dc892c24f1e8134c39434a8ef039c2148670dabd1fcbde233fa7720cf4b850b81c2e27825511bac75c3531870733c31f837c82df4d352a8c2412b27fcdaaf328ed3a3d9a704c27b56ac7bb873aa6d579b6a0cdf8d5013f3737bb35ce3c53f53f19e3ca3b72ec11aa5ed9d9f02c113a23db91c4f456109c6967a363915a8b58eeacc790645fd8ed97c39ecbc50b33fd9cfa07b666d0ac64d92f5c25c819ba8aeff63ddee7ea1d9209c4a79aa64eef9f92efbf42da591404e91827fb26c) from 34.250.234.192:9735 2018-01-07 12:43:01.584 [DBG] PEER: Received CommitSig(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, num_htlcs=0) from 34.250.234.192:9735 2018-01-07 12:43:01.608 [DBG] PEER: Sending RevokeAndAck(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, rev=244f42d56d0436348840ad47931e6b0d4d2742694c4c22a68807db7c27ad6dc8, next_point=039699c265f40e3f32e6e503b87dd475dafc3a5a49097c05a9f28d915246b979ec) to 34.250.234.192:9735 2018-01-07 12:43:01.616 [DBG] PEER: Sending CommitSig(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, num_htlcs=0) to 34.250.234.192:9735 2018-01-07 12:43:01.640 [DBG] PEER: Received RevokeAndAck(chan_id=8c086a40bca8b39d3c614ea7e33441aca55eacf9e6328eb7906b36dd92cba130, rev=90c5e281bda7036d10e1dc6aaced454eac2632b098f5bfa0898d23e508bfd1e1, next_point=02eda842819df81919ca6651ad9c6397051790d5c589d6c6009bd6c623de9d1f72) from 34.250.234.192:9735 2018-01-07 12:43:01.650 [DBG] HSWC: ChannelPoint(30a1cb92dd366b90b78e32e6f9ac5ea5ac4134e3a74e613c9db3a8bc406a088c:0) forwarding 1 HTLC's 2018-01-07 12:43:01.650 [DBG] HSWC: Closed completed onion circuit for 9e9405921e783ce87bf20f48d368d2bde55d61bf9e36248e5a87a72d171a3594: (0:0:0, 3294) <-> (1256433:7:0, 1274) 2018-01-07 12:43:01.652 [ERR] HSWC: unable to de-obfuscate onion failure, htlc with hash(9e9405921e783ce87bf20f48d368d2bde55d61bf9e36248e5a87a72d171a3594): unable to decode error update (errors.errorString): TemporaryChannelFailure(update=(lnwire.ChannelUpdate)(0xc42079d3b0)({ Signature: (btcec.Signature)(0xc420bbf6e0)({ R: (big.Int)(0xc420b33320)(48832811375496015207959661304449105467071572793175724307506166905292457253441), S: (big.Int)(0xc420b33340)(109351511127532896365573795571033297586969600504322976730782758616365818856265) }), ChainHash: (chainhash.Hash) (len=32 cap=32) 2913000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f, ShortChannelID: (lnwire.ShortChannelID) 13238272:6422528:23121, Timestamp: (uint32) 467140609, Flags: (lnwire.ChanUpdateFlag) 144, TimeLockDelta: (uint16) 0, HtlcMinimumMsat: (lnwire.MilliSatoshi) 655360000 mSAT, BaseFee: (uint32) 655360000, FeeRate: (uint32) 0 }) ) 2018-01-07 12:43:01.652 [ERR] CRTR: Attempt to send payment 9e9405921e783ce87bf20f48d368d2bde55d61bf9e36248e5a87a72d171a3594 failed: TemporaryChannelFailure: unable to de-obfuscate onion failure, htlc with hash(9e9405921e783ce87bf20f48d368d2bde55d61bf9e36248e5a87a72d171a3594): unable to decode error update (errors.errorString): TemporaryChannelFailure(update=(lnwire.ChannelUpdate)(0xc42079d3b0)({ Signature: (btcec.Signature)(0xc420bbf6e0)({ R: (big.Int)(0xc420b33320)(48832811375496015207959661304449105467071572793175724307506166905292457253441), S: (big.Int)(0xc420b33340)(109351511127532896365573795571033297586969600504322976730782758616365818856265) }), ChainHash: (chainhash.Hash) (len=32 cap=32) 2913000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f, ShortChannelID: (lnwire.ShortChannelID) 13238272:6422528:23121, Timestamp: (uint32) 467140609, Flags: (lnwire.ChanUpdateFlag) 144, TimeLockDelta: (uint16) 0, HtlcMinimumMsat: (lnwire.MilliSatoshi) 655360000 mSAT, BaseFee: (uint32) 655360000, FeeRate: (uint32) 0 }) ) 2018-01-07 12:43:01.652 [DBG] CRTR: Reporting edge 1381462693021941760 failure to Mission Control 2018-01-07 12:43:01.652 [DBG] CRTR: Mission Control returning prune view of 3 edges, 7 vertexes
@vegardengen Could you paste more of the logs following this? (you can use pastebin or something) Thanks!
I think this is a complete cycle. But ok, I'll put more. From start of payment until now. lnd.log
btw, this one I catched only because I have a cron script rsyncing the log directory...3 old log files is really not enough. Should be possible to tune or turn off old log pruning! When testing, you really like to be able to go back in time - days at least!
Was this log the result of trying to send the payment only once, or did you try several times?
From the logs it looks like it starts looking for a path for the payment, then keeps on doing this for several minutes. Would be very useful if you could try paying to a new invoice, and only trying it once while capturing the logs! Let it try for a while to see if it finishes :)
I tried doing it twice after a while, but only after minutes.
Can I easily cancel this loop? Will it disappear if I restart?
It is still looping from that same payment?
Yeah, try restarting, then see if you can reproduce with a new invoice.
Restarted, new invoice in loop...how long would you like me to wait?
give it ~5 minutes, then paste the logs 👍
What hardware are you running this on?
Running it on a digitalocean VPS. 3 GB memory, no CPU shortage. Even with this loop, there is still some idle CPU left....(even if lnd eats most of CPU here)
The VPS is dedicated to the lightning node.
It looks like route finding take so much time that pruned edges (that we try send a payment through, but fails) gets "unpruned" between attempts. Looks like we insert a pruned edge back into the graph after 5 seconds. Hence it will never converge on a graph where no routes are left to try. (roasbeef mentioned this scenario earlier)
If disk access is slow, I think that would be the main bottleneck.
@Roasbeef would know this better than I do.
Maybe have a counter for the edges, that says second time an edge is pruned, it'll have to wait 10 seconds?
...and so on?
and then decrease that again over time.
I've observed this behavior on multiple nodes so it isn't isolated. Calling
lncli payinvoice
orlncli sendpayment --pay_req
when channel capacity is insufficient causes the commands to hang.From the receiving node:
Then trying to pay the invoice on the sending node using one of the commands above then causes the console to hang. I grabbed the following from the
lnd
output: