latency seems high for /metrics endpoint

miketwenty1 commented 2 years ago

On both v0.13.3-beta and v0.14.1-beta, if nodes aren't compacted often, latency for the /metrics endpoint seems to creep up. It gets pretty bad: over 10 seconds. My scrape interval is 15s and my timeout is 10s, so when nodes start failing to return within 10s, I know it's time to restart the node.

Is there something wrong with LND, LNDMON, or my setup?

I would imagine a Prometheus metrics endpoint would take less than 1s to return data on average. Even after restarting/compacting the node, it still takes several seconds to retrieve data from /metrics.

Roasbeef commented 2 years ago

Optimizations are certainly possible to speed things up, or even to change the way we export data in general. Right now we just read everything and then export it all; things could be made smarter so that we only export what has changed. On my personal nodes I've increased the scrape interval, since 15 seconds is pretty short and none of the current default queries use counters to that granularity.

As always, pull requests to optimize the way we export data are welcome!

miketwenty1 commented 2 years ago

"none of the current default queries use counters to that granularity."

What is the granularity of the counters? I can just set the scrape interval to that, I suppose.

lightninglabs/lndmon, issue #77