Closed Zetanova closed 1 week ago
The check you mention will only log a warning if there are not sufficient outbound peers. It will never cause a shutdown. So there must be something wrong with the call to the backend.
thanks, then I will investigate more
It happened multiple times since v0.18, before near never (0-2y)
Hmm, weird. The only thing I can imagine (that is new in 0.18) is that the call to GetPeerInfo
itself fails for some reason. But I think any errors from that should be logged. What's your log level?
You should see a message that starts with INF: Health check:
that contains the actual error.
debuglevel=info,WTCL=warn
...
2024-06-15 21:39:40.807 [INF] HLCK: Health check: chain backend, call: 1 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-15 22:07:40.807 [INF] HLCK: Health check: chain backend, call: 1 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-15 22:12:40.808 [INF] HLCK: Health check: chain backend, call: 1 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-15 22:15:10.875 [INF] HLCK: Health check: chain backend, call: 2 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-15 22:17:40.881 [CRT] SRVR: Health check: chain backend failed after 3 calls
Okay, so the actual connection timed out. Did you update to bitcoind v0.27
at the same time as you updated to lnd v0.18.0
? Maybe something in bitcoind changed in relation to being offline?
I updated the bitcoind
only after lnd 0.18
and the problems started.
The node itself is running already over 3y (started on 2x rsp3)
I monitoring system resources manually but all seams fine System has Free Memory and Low CPU
The bitcoind logs do not have any special errors/warning
As mentioned above, we won't shutdown based on this message. Instead, it's a warning.
I tested now with a manual disconnect of 65sec outgoing peers decreased to 7 after the reconnect no special log entries
I will monitor longer and write if I find the issue.
The node itself never manage to stay longer then 120h online after the last update.
... It already shutdown yesterday
the command docker logs lnd -f --tail 100 | grep "Health check"
didn't print a nextline+cd
2024-06-26 16:41:41.012 [INF] HLCK: Health check: chain backend, call: 1 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-26 16:44:11.022 [INF] HLCK: Health check: chain backend, call: 2 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-26 18:17:40.994 [INF] HLCK: Health check: chain backend, call: 1 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-26 18:20:10.996 [INF] HLCK: Health check: chain backend, call: 2 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
2024-06-26 18:22:41.014 [CRT] SRVR: Health check: chain backend failed after 3 calls
I found the issue, a now chatty process storagenode from storj suddenly burst in CPU usage over minutes.
thx for help
I investigated now a bit more, cpu limited the storj process and increased the bitcoind dbcache
What I discovered is that bitcoind uses high amount of cpu on a low block download
at this period the $ time bitcoin-cli -netinfo
is very lagy with probs: 1s, 11s, 29s
When the bitcoind does a tip verification the $ time bitcoin-cli -netinfo
goes even over 1m with probs: 1m24s, 2m10s
@guggero
Can the getPeers API call be the reason for the health check: chain backend timed out after: 30s
message on the lnd?
Can the getPeers API call be the reason for the health check: chain backend timed out after: 30s message on the lnd?
That's what I suspect too. But if that call is slow on bitcoind
, maybe it's a sign that you shouldn't cram even more things onto a piece of hardware that's barely able to keep up with just Bitcoin stuff? Maybe StorJ is completely hogging the disk as well, which is slowing down block verification.
I found the real issue, its the fragmentation level of the hdd seq read/write speed on the idle rsp4 are at ~95MB/s
bitcoind made 10MB/s read, but this was already the max in random access.
@guggero I know this is limited hardware, but is in the req. specs.
Background
The new health check to test the min amount of btc node outbound peers is currently set to 6 https://github.com/lightningnetwork/lnd/blob/ce813276d55e07887c1889a745856ae759c5d39b/chainreg/chainregistry.go#L890C6-L890C24
With the default configuration bitcoind will connect to max 10 outbound peers. If they internet gets briefly interrupted then the bitcoind will not reconnect to 6 outbound peers fast enough, to not trigger the lnd shutdown.
This is only a assumption, a proper shutdown reason inside the log is missing. The message
Health check: chain backend failed after 3 calls
is insufficient. Thebitcoind
is running on the same host/docker fine and responsive.Your environment
lnd v0.18.0-beta
Linux b5c28fb40412 5.15.0-1050-raspi
bitcoind v27.0.0
Steps to reproduce
Run lnd and bitcoind with near default configuration and disconnect the internet for 30sec-1min.
Expected behaviour
Actual behaviour
The lnd shutdowns itself as if it lost connection to the chain backend