sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0
2.84k stars 703 forks source link

Validator metric names #2977

Open arnetheduck opened 2 years ago

arnetheduck commented 2 years ago

Description

I'm looking into metric names in order to make future standardization easier of some validator monitoring metrics - it turns out some of the exposed metrics don't quite follow prometheus conventions.

Version

https://github.com/sigp/lighthouse/blob/5f628a71d4b2a7e7761b30e726338bba07617cd2/beacon_node/beacon_chain/src/metrics.rs#L514

Present Behaviour

_total is used freely.

Expected Behaviour

Counter metrics are generally expected to carry a _total suffix - gauge metrics generally should not (https://www.robustperception.io/on-the-naming-of-things)

It would be nice that this convention is adhered to - metrics libraries try to be clever with adding and removing _total from counters making it hard to support the non-standard names.

paulhauner commented 2 years ago

Yeah, sorry it's something we're a bit rough on at the moment.

Is the solution to this just removing the "_total" suffix from all metrics names?

ruuda commented 2 years ago

Lighthouse also doesn’t follow the Prometheus convention to namespace metrics. There might be two applications monitored by the same Prometheus instance that have a metric like async_tasks_count, and we wouldn’t be able to tell them apart easily. Therefore the recommendation is to add the name of the application as prefix, e.g. lighthouse_async_tasks_count.