influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.89k stars 5.6k forks source link

(inputs.gnmi): Parsing field names incorrectly #14530

Closed greenfox878 closed 10 months ago

greenfox878 commented 10 months ago

Relevant telegraf.conf

[[inputs.gnmi]]
  addresses = ["gnmi_dut:xxx"]

  ## define credentials
  username = "xxxxx"
  password = "xxxxx"

  encoding = "json_ietf"
  tagexclude = ["path"]

  [inputs.gnmi.tags]
    test_tag = "test"

  # Define additional aliases to map encoding paths to measurement names
  [inputs.gnmi.aliases]
    origin = "openconfig"
    ifcounters = "/interfaces/interface/state/counters"

  [[inputs.gnmi.subscription]]
    name = "ifcounters"
    origin = "openconfig"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"

Logs from Telegraf

no error messages

System info

Telegraf 1.29.1, Ubuntu 22.04.3 LTS (latest updates)

Docker

No response

Steps to reproduce

  1. Download compiled telegraf release "wget https://dl.influxdata.com/telegraf/releases/telegraf-1.29.1_linux_amd64.tar.gz", untar
  2. Run with the command "./telegraf-1.29.1/usr/bin/telegraf --config ./test_telegraf_conifg.cfg --debug"

Expected behavior

Correct output. v1.28.5 or older

ifcounters,host=test_telegraf_host,name=Ethernet40,source=gnmi_dut,test_tag=test in_broadcast_pkts=0i,in_discards=0i,in_errors=0i,in_fcs_errors=0i,in_multicast_pkts=0i,in_octets=0i,in_pkts=0i,in_unicast_pkts=0i,out_broadcast_pkts=0i,out_discards=0i,out_errors=0i,out_multicast_pkts=0i,out_octets=0i,out_pkts=0i,out_unicast_pkts=0i 1704442117721474264
ifcounters,host=test_telegraf_host,name=Ethernet26,source=gnmi_dut,test_tag=test in_broadcast_pkts=0i,in_discards=0i,in_errors=0i,in_fcs_errors=0i,in_multicast_pkts=0i,in_octets=0i,in_pkts=0i,in_unicast_pkts=0i,out_broadcast_pkts=0i,out_discards=0i,out_errors=0i,out_multicast_pkts=0i,out_octets=0i,out_pkts=0i,out_unicast_pkts=0i 1704442117721474264

Actual behavior

Corrupted outptu. v1.29.1:

ifcounters,host=test_telegraf_host,name=Ethernet40,source=gnmi_dut,test_tag=test t_pkts=0i,in_errors=0i,rs=0i,in_octets=0i,in_pkts=0i,pkts=0i,st_pkts=0i,s=0i,out_errors=0i,out_octets=0i,out_pkts=0i,_pkts=0i 1704442117721474264
ifcounters,host=test_telegraf_host,name=Ethernet26,source=gnmi_dut,test_tag=test t_pkts=0i,in_errors=0i,rs=0i,in_octets=0i,in_pkts=0i,pkts=0i,st_pkts=0i,s=0i,out_errors=0i,out_octets=0i,out_pkts=0i,_pkts=0i 1704442117721474264

here we see "_pkts", "t_pkts", "st_pkts" instead of "out_pkts", "out_multicast_pkts"...

Additional info

No response

powersj commented 10 months ago

Hi @greenfox878,

In order to help resolve this can I request that you do the following:

  1. Enable debug in the agent. Either via the --debug flag on the CLI or add debug = true in the [agent] section of your config.
  2. In your gnmi input plugin config, add dump_responses = true
  3. Find and provide the full message that is getting truncated. Something like this example.

With that we can dig into the logic and see what is going on.

Thanks!

srebhan commented 10 months ago

@greenfox878 please use dump_responses = true instead of trace = True... ;-)

greenfox878 commented 10 months ago

Hi @powersj ,

2024-01-06T07:35:45Z D! [inputs.gnmi] Got update_1704442117721474264: {"update":{"timestamp":"1704442117721474264","prefix":{"elem":[{"name":"interfaces"},{"name":"interface","key":{"name":"Ethernet35"}},{"name":"state"},{"name":"counters"}]},"update":[{"path":{"elem":[{"name":"in-broadcast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-discards"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-errors"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-fcs-errors"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-multicast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-octets"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"in-unicast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-broadcast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-discards"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-errors"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-multicast-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-octets"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-pkts"}]},"val":{"uintVal":"0"}},{"path":{"elem":[{"name":"out-unicast-pkts"}]},"val":{"uintVal":"0"}}]}}
powersj commented 10 months ago

@greenfox878 thanks for that output. I am able to reproduce with our tests. Next steps are I need to understand the existing key parsing code. It appears in your case we want to take the path base of the key, but are instead, sometimes doing some sub-string logic here that is clearly wrong.

edit:

The alias path we get is openconfig:/interfaces/interface/state/counters, which in some cases is in fact shorter than the key, which is when we take the remainder of key's paths length + 1.

For example:

key:       /interfaces/interface/state/counters/in_unicast_pkts
aliasPath: openconfig:/interfaces/interface/state/counters
new key:   pkts

The alias path is looked by checking if the current path:

/interfaces/interface/state/counters/out-discards

is a subpath of:

openconfig:/interfaces/interface/state/counters

Which it is, so it returns the alias path and then does the strange substring.

I think we should be ensuring that the entire alias path exists in the string before we start doing this manipulation. I'll put up a PR once I figure out how to not break existing tests O.o

powersj commented 10 months ago

@greenfox878, @romanelloacn,

Can both of you please try the artifacts in #14581 and validate that the field names report correctly please?

If they do not please use dump_responses = true to get the lines that are not.

Thanks!

romanelloacn commented 10 months ago

@greenfox878, @romanelloacn,

Can both of you please try the artifacts in #14581 and validate that the field names report correctly please?

If they do not please use dump_responses = true to get the lines that are not.

Thanks!

Sorry but I have no idea how to install/update telegraf from the cloned telegraf repository of your profile

powersj commented 10 months ago

Sorry but I have no idea how to install telegraf from the cloned telegraf repository of your profile

Artifacts, pre-build binaries, will be attached to that PR in 20-30mins. No need to build anything yourself.

romanelloacn commented 10 months ago

Sorry for the late reply... I can confirm that is successfully working now :) I'll wait the next official telegraf update!

image

But now after reinstalling the official telegraf version, Telegraf 1.29.2 (git: HEAD@d92d7073) with "yum install telegraf", the logs give me:

2024-01-22T12:17:43Z E! [inputs.gnmi] Invalid empty path "/interfaces/interface/state/last-change" with alias "openconfig:/interfaces/interface/state" 2024-01-22T12:17:43Z E! [inputs.gnmi] Invalid empty path "/interfaces/interface/state/oper-status" with alias "openconfig:/interfaces/interface/state"

Even if with gnmic the path "/interfaces/interface/state" responds correctly and the same telegraf.conf file was used to test your binaries

image

(the Gi0/0 used only to have a limited output)

If then I execute again your pullrequest binary it works fine so it's not a telegraf.conf problem

powersj commented 10 months ago

Thank you for taking the time to try it out!

romanelloacn commented 10 months ago

Thank you for taking the time to try it out!

You're welcome :) Can you please help with the other issue I described or should I open a new issue in github?

powersj commented 10 months ago

Can you please help with the other issue I described or should I open a new issue in github?

You got the empty path error only with the current release right? The test artifact did not show that right?

romanelloacn commented 10 months ago

Can you please help with the other issue I described or should I open a new issue in github?

You got the empty path error only with the current release right? The test artifact did not show that right?

Correct

powersj commented 10 months ago

ok good :) The current code when parsing the field names does not take the origin (e.g. openconfig into account currently. So when it does the replace it effectively removes everything. My new PR will take that into account and prevent the incorrect substring as well.

romanelloacn commented 10 months ago

Ok thanks! So I'll wait even more patiently the new release :)

powersj commented 10 months ago

@romanelloacn, @greenfox878,

I made some additional changes to the PR. Could I request your try the artifacts one more time to confirm I have not regressed anything?

If all looks good, we can land this and release it in Monday's release.

Thank you!

romanelloacn commented 10 months ago

Hi, everything seems to be working fine!

image

I have one doubt... I see that now in the name of the measurement is included the path of the subfolders of the yang tree. This sometimes can be a problem because in influxdb UI, if you have 3 subfolders, you can't read the actual name of the measurement. A suggestion would be to not replace the "-" in the name with "_" so you can search the actual name in the top bar of influx and select the only result that comes out even if you can't read the full name. Currently, it's not possible because if you search for "in-octets" it doesn't come out and if you search only "octets" both out and in measurements appear but you don't know wich one of the 2 is because there is the full path in front.

In short, I would have preferred the measurement's name to not have the subfolder name and only include the subfolder name when there is a conflict with another field having the same name in the parent folder.

Hope this can help :)

greenfox878 commented 10 months ago

Hi @powersj ! Sorry for delayed reply. I just tested fixed version of telegraf and confim that all looks good! ifcounters,host=test_host,name=Ethernet29,source=test_gnmi_device,test_tag=test in_broadcast_pkts=0i,in_discards=0i,in_errors=0i,in_fcs_errors=0i,in_multicast_pkts=0i,in_octets=0i,in_pkts=0i,in_unicast_pkts=0i,out_broadcast_pkts=0i,out_discards=0i,out_errors=0i,out_multicast_pkts=0i,out_octets=0i,out_pkts=0i,out_unicast_pkts=0i 1704442117721474264

@romanelloacn , Here old topic about "-" to "_" replacement. Looks like it's default behavior.

powersj commented 10 months ago

Thank you both for confirming.