lightningnetwork / lnd

Lightning Network Daemon ⚡️

[bug]: high cpu usage #7675

Closed C-Otto closed 1 year ago

C-Otto commented 1 year ago

Background

My 0.16.2 node started using lots of CPU (100% of one core consistently), which is unusual behavior. Profile available at https://c-otto.de/lnd/profile

Please let me know if you need any configuration details or anything from the logs.

[Image: cpu_1d] (~80% baseline is normal; the recent increase to ~180% is due to this bug)

C-Otto commented 1 year ago

It stopped at the same time as a constant flow of forwarded transactions stopped. I'm using https://github.com/lightningequipment/circuitbreaker, which rate-limited one of my peers and put pending requests into a queue. This didn't cause any CPU-related issues right away, though.

C-Otto commented 1 year ago

It happened again (more than once). Another profile: https://c-otto.de/lnd/profile2

bitromortac commented 1 year ago

Thank you for the profiles. In both of them, the main culprit seems to be related to these calls:

$ go tool pprof profile2
(pprof) focus=mempoolPoller
(pprof) top10
Active filters:
   focus=mempoolPoller
Showing nodes accounting for 29.66s, 81.66% of 36.32s total
Dropped 5 nodes (cum <= 0.18s)
Showing top 10 nodes out of 12
      flat  flat%   sum%        cum   cum%
    14.90s 41.02% 41.02%     18.19s 50.08%  runtime.mapiternext
     9.93s 27.34% 68.36%     29.83s 82.13%  github.com/btcsuite/btcwallet/chain.(*mempool).removeInputs (inline)
     1.66s  4.57% 72.94%      1.72s  4.74%  runtime.(*bmap).overflow (inline)
     1.06s  2.92% 75.85%      1.06s  2.92%  memeqbody
     0.89s  2.45% 78.30%      0.89s  2.45%  runtime.add (inline)
     0.58s  1.60% 79.90%      0.58s  1.60%  runtime.isEmpty (inline)
     0.33s  0.91% 80.81%      1.70s  4.68%  github.com/btcsuite/btcd/chaincfg/chainhash.(*Hash).IsEqual (inline)
     0.31s  0.85% 81.66%      0.31s  0.85%  runtime.memequal
         0     0% 81.66%     29.83s 82.13%  github.com/btcsuite/btcwallet/chain.(*bitcoindZMQEvents).mempoolPoller
         0     0% 81.66%     29.83s 82.13%  github.com/btcsuite/btcwallet/chain.(*mempool).DeleteUnmarked

So it's related to some mempool polling inefficiencies. What is the size of your mempool? I also saw some CPU consumption related to force closes. Do you have any active force closes?
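To illustrate the kind of inefficiency the profile points at (time dominated by runtime.mapiternext underneath removeInputs), here is a minimal sketch. It is not the btcwallet code: the types scanMempool, indexedMempool and outPoint are hypothetical, and the reverse index is only one possible way to avoid a full map scan, assuming the input map is keyed by outpoint and stores the spending txid.

```go
package main

import "fmt"

// outPoint identifies a spent output. Hypothetical type for illustration only.
type outPoint struct {
	txid  string
	index uint32
}

// scanMempool maps each spent outpoint to the txid that spends it.
// Removing one transaction walks the whole map.
type scanMempool struct {
	inputs map[outPoint]string
}

// removeInputs deletes every input spent by spender and returns how many
// entries were removed. Cost is proportional to the total number of inputs.
func (m *scanMempool) removeInputs(spender string) int {
	removed := 0
	for op, txid := range m.inputs {
		if txid == spender {
			delete(m.inputs, op) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

// indexedMempool additionally remembers which outpoints each txid spends,
// so removal only touches that transaction's own inputs.
type indexedMempool struct {
	inputs map[outPoint]string
	byTx   map[string][]outPoint
}

func newIndexedMempool() *indexedMempool {
	return &indexedMempool{
		inputs: make(map[outPoint]string),
		byTx:   make(map[string][]outPoint),
	}
}

// add records that spender spends the given outpoints.
func (m *indexedMempool) add(spender string, ops ...outPoint) {
	for _, op := range ops {
		m.inputs[op] = spender
	}
	m.byTx[spender] = append(m.byTx[spender], ops...)
}

// removeInputs deletes the inputs of spender via the reverse index.
// Cost is proportional to the number of inputs of that one transaction.
func (m *indexedMempool) removeInputs(spender string) int {
	ops := m.byTx[spender]
	for _, op := range ops {
		delete(m.inputs, op)
	}
	delete(m.byTx, spender)
	return len(ops)
}

func main() {
	scan := &scanMempool{inputs: map[outPoint]string{
		outPoint{"a", 0}: "tx1",
		outPoint{"a", 1}: "tx1",
		outPoint{"b", 0}: "tx2",
	}}
	fmt.Println("scan removed:", scan.removeInputs("tx1"))

	idx := newIndexedMempool()
	idx.add("tx1", outPoint{"a", 0}, outPoint{"a", 1})
	idx.add("tx2", outPoint{"b", 0})
	fmt.Println("indexed removed:", idx.removeInputs("tx1"))
}
```

With a large mempool, the scanning variant repeats a full pass over all inputs for every transaction that is removed, which would be consistent with the load rising when a block confirms many transactions at once.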

C-Otto commented 1 year ago

My bitcoind is configured with maxmempool=2048 and prune=10240. There are three pending force closes, one with a confirmed TX, two stuck in the mempool.

bitromortac commented 1 year ago

Do you have older traces as well? Would be curious to see the behavior before v0.16.1. Wonder if those plateaus are triggered by blocks. Your mempool is quite large, so perhaps inefficiencies are amplified and surfaced for you. Will check other nodes' performance patterns.

Also, there's a relevant comment: https://github.com/btcsuite/btcwallet/blob/68f7e23dca2742b98d13d6bd8e9340047bc6aa9e/chain/mempool.go#LL188-LL194

C-Otto commented 1 year ago

I don't have any other recent profiling data. I'm pretty sure the CPU load goes up when a new block arrives, yes. I also see short-lived spikes at other times, though.

Roasbeef commented 1 year ago

If you want to decrease or smooth out the CPU usage, you can increase this value: bitcoind.txpollinginterval. The default is 1m; if you increase it, it'll try to reconcile less often.

We added this so we can react to HTLC preimages much faster (once they're in the mempool rather than only once they're in the chain). There's a large transaction backlog right now, so it's pretty helpful in times like this. This new behavior can help prevent force closes. The interval can be increased to poll the mempool less often (to find new transactions and remove old ones).

yyforyongyu commented 1 year ago

Should be fixed by #7681

dannydeezy commented 1 year ago

I'm having this issue now. Our node is becoming unusable at times.