Closed shaotingcheng closed 4 years ago
Yeah, this unfortunately makes this exporter pretty unusable.
We have hundreds to thousands of users connected at a time, so enabling this exporter in production would immediately DoS our Prometheus server.
https://prometheus.io/docs/practices/instrumentation/#things-to-watch-out-for has a useful overview of why adding all of this high-cardinality data is bad.
I think the best thing to do here would be to have an "-individual-users" command line option or similar, like collectd does: https://collectd.org/wiki/index.php/Plugin:OpenVPN
We're happy to accept a PR for that :)
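For anyone picking this up, the proposed option could be sketched roughly as below. This is only an illustration using Go's standard `flag` package, not the exporter's actual code: the flag name `individual-users` is taken from the suggestion above, and `parseFlags`, `metricNames`, and the metric names listed are hypothetical placeholders.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseFlags parses the command line and reports whether per-user metrics
// were requested. The flag name "individual-users" is an assumption based
// on the suggestion in this thread.
func parseFlags(args []string) (bool, error) {
	fs := flag.NewFlagSet("openvpn_exporter", flag.ContinueOnError)
	individual := fs.Bool("individual-users", false,
		"export per-client metrics (high cardinality, off by default)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *individual, nil
}

// metricNames lists which metrics would be exported. The names are
// illustrative placeholders only.
func metricNames(individualUsers bool) []string {
	names := []string{"openvpn_server_connected_clients"}
	if individualUsers {
		names = append(names,
			"openvpn_server_client_received_bytes_total",
			"openvpn_server_client_sent_bytes_total")
	}
	return names
}

func main() {
	individual, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	for _, name := range metricNames(individual) {
		fmt.Println(name)
	}
}
```

Defaulting the flag to off keeps the aggregate metrics safe for large deployments, while small setups can still opt in to per-user data.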
An alternative to a command line option would be an HTTP parameter similar to what the node_exporter provides: https://github.com/prometheus/node_exporter#filtering-enabled-collectors. That would allow users to disable collecting user metrics in their Prometheus server, while allowing other clients (e.g. web clients) to take a look at all stats.
@timstoop I have created a PR here: https://github.com/kumina/openvpn_exporter/pull/24
Perhaps it would make sense to expose the per-client connection timestamp as an actual metric instead of or in addition to this ask? Aggregate and per-user uptime is something I do actually want visibility into, but unpacking a label's value and transforming it into a metric (what I'll likely do today) doesn't seem like a fantastic pattern.
This should be solved in #31.
The biggest antipattern with any time series storage system is unbounded growth in label cardinality. Each new combination of label values gets its own time series to store. The number of series can grow multiplicatively with the number of distinct values of each label and have a huge impact on the health of Prometheus.
common_name: will grow for each new user we have. Not best practice but this is "acceptable" if we don't have many many users and don't have much turnover
connection_time: this is very very very very very BAD practice as this will generate a new value for each reconnection to the VPN. Storing timestamps as labels in Prometheus is the absolute worst thing one can do in an exporter. This guarantees that this label's cardinality will explode within days or even hours. It could bring down the whole monitoring system, depending on the load.
real_address: This is really not great as the address can change, but this is spectacularly bad as it includes the source port, which is random for each connection. This makes this label as bad and dangerous as the connection_time label.
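The effect of these three labels can be illustrated with a toy count. The sketch below is not exporter code: the `session` type, the helper functions, and the sample data are invented for this example. It counts how many distinct label combinations (and therefore series) a single user produces after three reconnects, first with all labels, then with the source port stripped via `net.SplitHostPort` and the connection time dropped.

```go
package main

import (
	"fmt"
	"net"
)

// session holds the three label values discussed above for one connection.
type session struct {
	commonName     string
	realAddress    string // "ip:port", the port is random per connection
	connectedSince string
}

// countSeries counts distinct label combinations; each one would become
// its own time series in Prometheus.
func countSeries(sessions []session, key func(session) string) int {
	seen := make(map[string]struct{})
	for _, s := range sessions {
		seen[key(s)] = struct{}{}
	}
	return len(seen)
}

// allLabels mirrors the problematic label set.
func allLabels(s session) string {
	return s.commonName + "|" + s.realAddress + "|" + s.connectedSince
}

// stableLabels drops the connection time and strips the source port.
// The host part can still change, so it remains a cardinality risk.
func stableLabels(s session) string {
	host, _, err := net.SplitHostPort(s.realAddress)
	if err != nil {
		host = s.realAddress // no port present, keep the value as-is
	}
	return s.commonName + "|" + host
}

func main() {
	// Invented sample data: one user reconnecting three times.
	sessions := []session{
		{"alice", "203.0.113.7:51234", "Thu Jun 18 04:23:03 2015"},
		{"alice", "203.0.113.7:40012", "Thu Jun 18 09:10:44 2015"},
		{"alice", "203.0.113.7:61877", "Fri Jun 19 08:02:15 2015"},
	}
	fmt.Println("series with all labels:   ", countSeries(sessions, allLabels))
	fmt.Println("series with stable labels:", countSeries(sessions, stableLabels))
}
```

In this toy data the full label set yields three series for one user, while the reduced set yields one, which is why the series count with the original labels tracks the number of connections ever made rather than the number of users.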