influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Redis Cluster Node Discovery #15199

Open jeremycampbell-okta opened 2 months ago

jeremycampbell-okta commented 2 months ago

Use Case

The info output of a Redis cluster node provides metrics that are mostly specific to that node, not the cluster. To get a view of the whole cluster, you must collect info results from every node in the cluster.

This is trivial with, e.g., EC2/self-hosted Redis, where you can install the Telegraf agent locally and have the Redis plugin connect to localhost.

It is challenging, however, when using a managed Redis cluster such as AWS ElastiCache or GCP Memorystore, where you're presented with a primary endpoint and the cluster members can change dynamically as the cluster scales in/out.

The desired feature is for the Telegraf Redis plug-in to have a node discovery feature. Similar to a Redis client, the initial connection will run the cluster nodes command to learn the cluster topology:

10.64.82.45:11006> cluster nodes
4ce37a099986f2d0465955e2e66937d6893aa0e1 10.64.82.45:11006@16379 myself,master - 0 1713471635000 5 connected 5462-10922
d6eb119a1f050982cc901ae663e7448867e49f7c 10.64.82.46:11005@16379 master - 0 1713471639016 4 connected 10923-16383
3a386fb6930d8f6c1a6536082071eb2f32590d31 10.64.82.46:11007@16379 master - 0 1713471638009 6 connected 0-5461

It will then connect to each node to collect the info or custom command metrics.
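To make the discovery step concrete, here is a minimal Python sketch of parsing the cluster nodes output shown above into host/port pairs. It is only an illustration of the idea, not Telegraf's implementation (the plugin itself is written in Go); the function name `parse_cluster_nodes` and the dictionary keys are hypothetical.

```python
# Sketch only: parse the text reply of CLUSTER NODES into node records.
# Each line has the form:
#   <id> <ip:port@cport[,hostname]> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> [<slot> ...]

SAMPLE = """\
4ce37a099986f2d0465955e2e66937d6893aa0e1 10.64.82.45:11006@16379 myself,master - 0 1713471635000 5 connected 5462-10922
d6eb119a1f050982cc901ae663e7448867e49f7c 10.64.82.46:11005@16379 master - 0 1713471639016 4 connected 10923-16383
3a386fb6930d8f6c1a6536082071eb2f32590d31 10.64.82.46:11007@16379 master - 0 1713471638009 6 connected 0-5461
"""


def parse_cluster_nodes(output: str) -> list[dict]:
    """Return one dict per well-formed node line (hypothetical helper)."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        node_id, addr, flags = fields[0], fields[1], fields[2].split(",")
        # Drop the cluster bus port (and optional hostname) after '@'.
        host_port = addr.split("@", 1)[0]
        # rpartition keeps any colons inside the host part intact.
        host, _, port = host_port.rpartition(":")
        if not host or not port.isdigit():
            continue  # e.g. a node whose address is not yet known
        nodes.append(
            {
                "id": node_id,
                "host": host,
                "port": int(port),
                "flags": flags,
                "link_state": fields[7],
            }
        )
    return nodes


if __name__ == "__main__":
    for node in parse_cluster_nodes(SAMPLE):
        print(node["host"], node["port"], ",".join(node["flags"]))
```

A real implementation would probably also want to decide whether to skip nodes flagged as fail or noaddr; that policy is left open here.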

Expected behavior

With the following example configuration, the Redis plug-in will connect to the initially provided server and perform node/topology discovery. It will then connect to each node in the cluster to collect metrics.

[[inputs.redis]]
  servers = ["tcp://10.64.82.45:11006"]
  node_discovery = true

This will result in effectively the following configuration:

servers = ["tcp://10.64.82.45:11006","tcp://10.64.82.46:11005","tcp://10.64.82.46:11007"]

and metrics will be pulled for each node.
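As a rough illustration of how the effective server list could be derived, the Python sketch below merges the configured seed server with the discovered nodes, reusing the seed's URL scheme and skipping the seed if it shows up again in the discovered set. The function name `expand_servers` and its signature are hypothetical and not part of Telegraf.

```python
from urllib.parse import urlsplit


def expand_servers(seed: str, discovered: list[tuple[str, int]]) -> list[str]:
    """Return the seed URL followed by one URL per newly discovered node.

    Assumption: discovered nodes are reached with the same scheme as the seed.
    """
    parts = urlsplit(seed)
    scheme = parts.scheme or "tcp"
    servers = [seed]
    # Track (host, port) pairs so the seed node is not listed twice.
    seen = {(parts.hostname, parts.port)}
    for host, port in discovered:
        if (host, port) in seen:
            continue
        seen.add((host, port))
        servers.append(f"{scheme}://{host}:{port}")
    return servers


if __name__ == "__main__":
    nodes = [("10.64.82.45", 11006), ("10.64.82.46", 11005), ("10.64.82.46", 11007)]
    print(expand_servers("tcp://10.64.82.45:11006", nodes))
```

Keeping the seed first means that turning discovery on adds nodes without reordering the server that was configured explicitly.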

TLS configuration, authentication, custom commands, etc. will all be assumed to be the same for each discovered node.

Actual behavior

Currently, the following configuration:

[[inputs.redis]]
  servers = ["tcp://10.64.82.45:11006"]

will only collect metrics for the specified node.

Additional info

No response

powersj commented 2 months ago

Thanks for the issue. Are you willing to put up a PR for this?