influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

internal gnmi selfstat has source label different than gnmi subscription #15731

Closed: protonmarco closed this issue 3 months ago

protonmarco commented 3 months ago

Relevant telegraf.conf

[[inputs.gnmi]]
  alias = "my_hostname"
  ## Address and port of the gNMI GRPC server
  addresses = ["hostname:57400"]
  username = "telegraf"
  password = "${PASSWORD}"
  redial = "10s"
  tagexclude = ["path","host"]
  tls_enable = true
  tls_ca = "/etc/telegraf/ssl/ca-chain.pem"
  tls_cert = "/etc/telegraf/ssl/telegraf_cert.pem"
  tls_key = "/etc/telegraf/ssl/telegraf_key.pem"
  insecure_skip_verify = false

  [[inputs.gnmi.subscription]]
    name = "interfaces"
    path = "/interface[name=*]/oper-state"
    subscription_mode = "sample"
    sample_interval = "10s"

[[outputs.prometheus_client]]
  ## Address to listen on
  listen = "0.0.0.0:9273"

  ## Path to publish the metrics on.
  path = "/metrics"

Logs from Telegraf

-

System info

Telegraf 1.32.2, Rocky Linux 8.10

Docker

No response

Steps to reproduce

  1. Set up a gNMI connection with a subscription (interfaces in this example, but any subscription works) and a Prometheus output.
  2. In the Prometheus /metrics output, all gNMI metrics have the label source="hostname", while internal_gnmi_grpc_connection_status has source="hostname:57400".
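To compare the labels quickly, a scrape of /metrics can be grouped by metric name and source value. The sketch below is a hypothetical Go helper (sourcesByMetric is not part of Telegraf); it assumes the Prometheus text exposition format and only looks at a label named exactly source.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

// sourceLabel matches a label named exactly "source" (preceded by '{' or ',')
// in a Prometheus text-format sample line and captures its value.
var sourceLabel = regexp.MustCompile(`[{,]source="([^"]*)"`)

// sourcesByMetric maps each metric name in a /metrics dump to the source
// label values found on its samples. "# HELP" / "# TYPE" lines are skipped.
// Hypothetical helper for illustration only.
func sourcesByMetric(exposition string) map[string][]string {
	out := make(map[string][]string)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or space (value).
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if m := sourceLabel.FindStringSubmatch(line); m != nil {
			out[name] = append(out[name], m[1])
		}
	}
	return out
}
```

Feeding it the body served at the listen address and path from the config above should, with the behavior reported here, map internal_gnmi_grpc_connection_status to hostname:57400 and the subscription metrics to hostname.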

Expected behavior

I would expect internal_gnmi_grpc_connection_status to have the same source label as the gNMI metrics, i.e. without the port (no processors are currently applied to it).

Actual behavior

The internal metric includes the port in the source label:

internal_gnmi_grpc_connection_status{source="hostname:57400"}

Additional info

No response

powersj commented 3 months ago

Hi,

It appears we set that tag on the internal metric from the handler's address:

https://github.com/influxdata/telegraf/blob/master/plugins/inputs/gnmi/handler.go#L94-L97

@srebhan, thoughts on stripping the port?
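For illustration, here is a minimal Go sketch of what "stripping the port" could mean, assuming the source tag is built from a host:port address string. stripPort is a hypothetical helper, not the code in handler.go; it relies on the standard library's net.SplitHostPort, which also handles bracketed IPv6 addresses.

```go
package main

import "net"

// stripPort returns addr without its port, e.g. "hostname:57400" -> "hostname".
// Hypothetical helper for illustration; not Telegraf's actual implementation.
// If addr is not in host:port form (e.g. no port at all), it is returned unchanged.
func stripPort(addr string) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	return host
}
```

stripPort("hostname:57400") returns "hostname", and "[::1]:57400" becomes "::1". One side effect: router1:57400 and router1:57401 both become router1, so connections to two ports on the same host would report the same source.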

srebhan commented 3 months ago

@powersj I think we should strip it, but I'm worried this might cause regressions for people...

powersj commented 3 months ago

@protonmarco,

We chatted about this a bit today. The general consensus is that users should use a processor to modify the tag. Changing the tag could affect existing users, and keeping the port makes sense for anyone connecting to multiple ports on the same host, whose metrics would otherwise share the same source value.

One possible processor option would look like:

[[processors.regex]]
  [[processors.regex.tags]]
    key = "source"
    pattern = "^(.*):.*$"
    replacement = "${1}"

to transform:

-- metric,source=host:8888 value=42 1723663884492728297
++ metric,source=host value=42 1723663968544286648
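To check what this processor configuration does before deploying it, the same pattern and replacement can be exercised with Go's regexp package. This is a sketch that assumes the processor's substitution behaves like regexp.ReplaceAllString with ${1} expansion; rewriteSource is a hypothetical name.

```go
package main

import "regexp"

// sourcePattern is the same pattern as in the processors.regex example.
var sourcePattern = regexp.MustCompile(`^(.*):.*$`)

// rewriteSource applies the pattern with the replacement "${1}".
// Values that do not match (no colon) are returned unchanged by ReplaceAllString.
func rewriteSource(source string) string {
	return sourcePattern.ReplaceAllString(source, "${1}")
}
```

rewriteSource("host:8888") returns "host". Because (.*) is greedy, only the text after the last colon is dropped, so a bracketed IPv6 source such as [::1]:57400 becomes [::1], and a value without any colon is left as-is.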

protonmarco commented 3 months ago

@powersj the provided processor works perfectly, thanks. Regarding the regression concern, for what it's worth, I would be surprised if anyone other than me were using it, since IIRC this internal metric isn't even documented outside of https://github.com/influxdata/telegraf/issues/13088

srebhan commented 3 months ago

@protonmarco we did have a few "nobody uses this" assumptions in the past and were almost always wrong. :-) So let's not break things for users! Are we good to close this issue?

protonmarco commented 3 months ago

Fair enough, I'm proceeding to close this one, thanks again :)