lightninglabs / lndmon

🔎lndmon: A drop-in monitoring solution for your lnd node using Prometheus+Grafana
MIT License
149 stars, 47 forks

Recurring lndmon crashes post bitcoind node change #101

Closed by onepabz 7 months ago

onepabz commented 7 months ago

We've been running lndmon successfully for a long time. Recently, we changed our bitcoind nodes, and since then all our lndmon pods have been crashing every few hours. lnd works fine, and all the other lnd auxiliary services work fine; only lndmon keeps crashing.

Here's what I see in the logs:

Lndmon exiting with error: ChainCollector GetInfo failed with: rpc error: code = DeadlineExceeded desc = context deadline exceeded

We are using the latest lndmon version, v0.2.7.

I've tried increasing the Prometheus scrape interval and timeout, but lndmon keeps crashing.

Any help would be much appreciated.

Roasbeef commented 7 months ago

Was the bitcoind node updated to a new version? If so, from which version to which?

Roasbeef commented 7 months ago

That error looks like it just isn't able to get info in time so it times out: https://github.com/lightninglabs/lndmon/blob/2d4e987b3f0414a3dfe51f05e672867e3257aeb0/collectors/chain_collector.go#L71-L76

Roasbeef commented 7 months ago

There's a timeout value here: https://github.com/lightninglabs/lndclient/blob/04c46b8af9172ca1355f9a1ee416368e97f0aa0d/lightning_client.go#L1330-L1331

So we can set that when we make the client: https://github.com/lightninglabs/lndclient/blob/04c46b8af9172ca1355f9a1ee416368e97f0aa0d/lnd_services.go#L295-L298

Bigger question here tho is: why is that bitcoind slower, or did lnd get slower?

onepabz commented 7 months ago

> Was the bitcoind node updated to a new version? If so, from which version to which?

The only difference is the type of disk backing the persistent volume in Kubernetes (GKE): we switched from an SSD to a non-SSD one. What is weird is that both lnd and bitcoind appear to be functioning properly.

> Bigger question here tho is: why is that bitcoind slower, or did lnd get slower?

Both appear to be working fine, and no other consumers are complaining.

onepabz commented 7 months ago

After switching the bitcoind pods back to SSD disks, lndmon has stopped crashing. Thanks for your help, @Roasbeef.