lightninglabs / lndmon

🔎lndmon: A drop-in monitoring solution for your lnd node using Prometheus+Grafana
MIT License

lnd_peer_count of type counter instead of gauge #85

Open bindermuehle opened 2 years ago

bindermuehle commented 2 years ago

I'm trying to set up a dashboard for our lnd node with lndmon, and I ran into an issue: the lnd_peer_count metric is exported as a counter instead of a gauge.

To me it seems this value can go up and down, so a gauge would be better suited.
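To illustrate why the type matters: Prometheus counter functions such as rate() and increase() treat any decrease between samples as a counter reset. The sketch below is a simplified stand-in for that behaviour, not Prometheus's actual implementation (it ignores extrapolation), and the sample values are hypothetical peer counts.

```go
package main

import "fmt"

// counterIncrease mimics, in simplified form, how counter functions treat a
// series: any drop between samples is assumed to be a counter reset, so the
// value seen after the drop is added in full.
func counterIncrease(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			// Interpreted as a reset to zero followed by growth to samples[i].
			total += samples[i]
		} else {
			total += samples[i] - samples[i-1]
		}
	}
	return total
}

// gaugeDelta is the net change between the first and last sample, which is
// what a gauge reading of the same series would report.
func gaugeDelta(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	return samples[len(samples)-1] - samples[0]
}

func main() {
	// Hypothetical peer counts from four consecutive scrapes.
	peers := []float64{10, 12, 9, 11}
	fmt.Println("counter-style increase:", counterIncrease(peers)) // 13
	fmt.Println("gauge delta:", gaugeDelta(peers))                 // 1
}
```

With counter semantics, the drop from 12 to 9 peers is read as a restart and inflates the result, while the gauge view reports the real net change of one peer.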

Thanks for clarifying!

https://github.com/lightninglabs/lndmon/blob/d0d214c2a24aa50ddd49d135af3a994c4432861e/collectors/peer_collector.go#L98

    ch <- prometheus.MustNewConstMetric(
        p.peerCountDesc, prometheus.CounterValue,
        float64(len(listPeersResp)),
    )

Roasbeef commented 2 years ago

IIUC, we added this as a count so we could compute things like the rate of peer flapping (connects/disconnects per second).

bindermuehle commented 2 years ago

I found the flap rate calculation here: https://github.com/lightninglabs/lndmon/blob/d0d214c2a24aa50ddd49d135af3a994c4432861e/grafana/provisioning/dashboards/peers.json#L145

It uses the deriv function, which according to the Prometheus documentation should only be used with gauges: https://prometheus.io/docs/prometheus/latest/querying/functions/#deriv
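For context, the Prometheus documentation describes deriv as a per-second derivative computed with simple linear regression. A minimal sketch of that idea follows: a least-squares slope over timestamped samples. The type and function names are made up for illustration, the samples are hypothetical, and this is not Prometheus's own code.

```go
package main

import "fmt"

// sample is one scraped value with its timestamp in seconds.
type sample struct {
	t float64
	v float64
}

// slope returns the least-squares slope of v over t, i.e. a per-second rate
// of change. It returns 0 when the slope is undefined (fewer than two
// samples, or all samples share one timestamp).
func slope(samples []sample) float64 {
	n := float64(len(samples))
	if n < 2 {
		return 0
	}
	var sumT, sumV, sumTV, sumTT float64
	for _, s := range samples {
		sumT += s.t
		sumV += s.v
		sumTV += s.t * s.v
		sumTT += s.t * s.t
	}
	den := n*sumTT - sumT*sumT
	if den == 0 {
		return 0
	}
	return (n*sumTV - sumT*sumV) / den
}

func main() {
	// Hypothetical peer counts scraped every 15 seconds.
	series := []sample{{0, 10}, {15, 12}, {30, 11}, {45, 13}}
	fmt.Printf("%.4f peers/s\n", slope(series))
}
```

A regression slope like this is meaningful for a value that moves in both directions, which is why the function is documented for gauges.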

Roasbeef commented 2 years ago

It uses the deriv function, which according to the Prometheus documentation should only be used with gauges:

Ahh, TIL!

We'd definitely accept a PR to export a new gauge alongside the existing metric; then we can deprecate the existing one later.

bindermuehle commented 2 years ago

OK, I'm not that deep into the project, but if I find time I'll take a look. Do you have a naming scheme for this scenario? I would suggest something like lnd_peer_count_new or lnd_peer_count_gauge.
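As a rough picture of the side-by-side export discussed here, the sketch below formats both metrics in the Prometheus text exposition format using only the standard library. It does not use lndmon's collector code; lnd_peer_count_gauge is just the name suggested above, and the help strings and peer count are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// expose renders one metric in the Prometheus text exposition format:
// a HELP line, a TYPE line, and the sample itself.
func expose(name, help, typ string, value float64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s %s\n", name, typ)
	fmt.Fprintf(&b, "%s %g\n", name, value)
	return b.String()
}

func main() {
	peers := 7.0 // hypothetical current peer count

	// Existing metric, kept for backwards compatibility until deprecated.
	fmt.Print(expose("lnd_peer_count", "total number of peers", "counter", peers))

	// Hypothetical new metric carrying the same value with gauge semantics.
	fmt.Print(expose("lnd_peer_count_gauge", "current number of peers", "gauge", peers))
}
```

Exporting both for a while would let existing dashboards keep working while queries are moved to the gauge.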