influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Telemetry Dial-Out Agent with support for GBP Compact data #13117

Closed: brhoug closed this issue 1 year ago

brhoug commented 1 year ago

Use Case

I would like to have a plugin similar to the cisco_telemetry_mdt plugin that supports GPB Compact data, not just Self-Describing data.

Using OpenConfig YANG models, I want to stream data from network devices to the Telegraf agent. The Dial-Out method is preferred because I do not have credentials for Dial-In, but I can ask the owners of the network devices to enable telemetry and apply the required configuration.

Expected behavior

The telemetry agent should be able to read in the protobuf (.proto) files needed to decode GPB Compact data for the OpenConfig sensor paths that are enabled on a network device.

Actual behavior

Today the cisco_telemetry_mdt plugin supports only GPB Self-Describing data for Dial-Out telemetry.

Additional info

No response

powersj commented 1 year ago

Hi @brhoug,

Thanks for your patience.

@srebhan and I were able to chat about this feature request (FR) today. In principle, we are both excited by the idea of this new plugin! We have a couple of questions to help flesh out the idea a bit more:

  1. How do you see Telegraf managing the protobuf file used to decode messages?
  2. How does a user specify the fields and tags based on a message? A common issue with the other router/network plugins is that they do not generate unique series, so some messages get overwritten.
  3. Can we see an example input, your proposed configuration, and what metrics would get generated based on that input?

Thanks!

brhoug commented 1 year ago

I too have more questions than answers at this point, and I am new to the gNMI/gRPC space.

For questions one and two, I do not know the answer yet; this is part of what I am still learning, and I need to find out whether it is feasible.

In response to question three: I want to gather metrics only from OpenConfig paths. My team's idea is this: if switch vendors support Dial-Out telemetry using OpenConfig YANG models, is it possible to have one agent that works across multiple switch vendors, as long as they follow the OpenConfig standard for sending their metrics? For my initial proof of concept (PoC), I used a Cisco Nexus switch and the Telegraf cisco_telemetry_mdt plugin. This was very simple and worked, because the plugin is designed for Self-Describing data, so I did not have to manage protobuf files or do any hacking to make it work. The problems started when I tried to get metrics from other switch vendors and discovered that Self-Describing data is really only available on Cisco devices.

These are the OpenConfig sensor paths I was collecting metrics from in my PoC using the Cisco Nexus switch:

  openconfig-acl:acl/acl-sets/acl-set/acl-entries/acl-entry/state/matched-packets
  openconfig-interfaces:interfaces/interface/state/counters/in-broadcast-pkts
  openconfig-interfaces:interfaces/interface/state/counters/in-discards
  openconfig-interfaces:interfaces/interface/state/counters/in-errors
  openconfig-interfaces:interfaces/interface/state/counters/in-fcs-errors
  openconfig-interfaces:interfaces/interface/state/counters/in-multicast-pkts
  openconfig-interfaces:interfaces/interface/state/counters/in-octets
  openconfig-interfaces:interfaces/interface/state/counters/in-unicast-pkts
  openconfig-interfaces:interfaces/interface/state/counters/in-unknown-protos
  openconfig-interfaces:interfaces/interface/state/counters/out-broadcast-pkts
  openconfig-interfaces:interfaces/interface/state/counters/out-discards
  openconfig-interfaces:interfaces/interface/state/counters/out-errors
  openconfig-interfaces:interfaces/interface/state/counters/out-multicast-pkts
  openconfig-interfaces:interfaces/interface/state/counters/out-octets
  openconfig-interfaces:interfaces/interface/state/counters/out-unicast-pkts
  openconfig-interfaces:interfaces/interface/state/oper-status
  openconfig-network-instance:network-instances/network-instance/interfaces/interface/state
  openconfig-network-instance:network-instances/network-instance/protocols/protocol/bgp
  openconfig-platform:components/component/state/memory/available
  openconfig-platform:components/component/state/memory/utilized
  openconfig-qos:qos/interfaces/interface/output/queues/queue/state/dropped-pkts
  openconfig-system:system/cpus/cpu/state/hardware-interrupt/avg
  openconfig-system:system/cpus/cpu/state/idle/avg
  openconfig-system:system/cpus/cpu/state/kernel/avg
  openconfig-system:system/cpus/cpu/state/nice/avg
  openconfig-system:system/cpus/cpu/state/software-interrupt/avg
  openconfig-system:system/cpus/cpu/state/total/avg
  openconfig-system:system/cpus/cpu/state/user/avg
  openconfig-system:system/cpus/cpu/state/wait/avg
  openconfig-system:system/memory/state
  openconfig-system:system/processes/process/state/cpu-utilization
  openconfig-system:system/processes/process/state/memory-utilization
  openconfig-interfaces:interfaces/interface/state/enabled
  openconfig-lldp:lldp/interfaces/interface/neighbors/neighbor/state/port-description
  openconfig-lldp:lldp/interfaces/interface/neighbors/neighbor/state/port-id
  openconfig-platform:components/component/fan/state
  openconfig-platform:components/component/power-supply
  openconfig-platform:components/component/power-supply/state
  openconfig-platform:components/component/state/serial-no
  openconfig-platform:components/component/state/software-version
  openconfig-platform:components/component/state/temperature/alarm-status
  openconfig-platform:components/component/state/temperature/instant

Brian Houg, Senior Software Engineer, Microsoft

From: Joshua Powers | Sent: Tuesday, April 25, 2023 8:34 AM | To: influxdata/telegraf | Cc: Brian Houg; Mention | Subject: Re: [influxdata/telegraf] Telemetry Dial-Out Agent with support for GBP Compact data (Issue #13117)

powersj commented 1 year ago

> For questions one and two I do not know the answer and this part of what I am still learning and need to see if it's feasible to do.

You can look at the other related telemetry plugins to see how it is done today. Namely, we import libraries that contain the protobuf definitions and unmarshal the received messages using those definitions.
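To illustrate why GPB Compact data needs the protobuf definitions while Self-Describing data does not: on the wire, a compact message carries only field numbers, wire types, and values, never field names. The Go sketch below decodes varint fields from raw protobuf wire format using only the standard library, then names them through a hypothetical field-number table (`counterSchema`) that stands in for what generated code from a vendor's .proto file would provide. It handles only varint (wire type 0) fields and is not how the Telegraf plugins are implemented; they use generated protobuf code, as described above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeVarintFields walks a protobuf-encoded message and returns the value of
// every varint (wire type 0) field, keyed by field number. Other wire types are
// rejected to keep the sketch short.
func decodeVarintFields(buf []byte) (map[uint64]uint64, error) {
	out := make(map[uint64]uint64)
	for len(buf) > 0 {
		tag, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, errors.New("malformed tag")
		}
		buf = buf[n:]
		fieldNum, wireType := tag>>3, tag&0x7
		if wireType != 0 {
			return nil, fmt.Errorf("field %d: unsupported wire type %d", fieldNum, wireType)
		}
		val, m := binary.Uvarint(buf)
		if m <= 0 {
			return nil, errors.New("malformed varint value")
		}
		buf = buf[m:]
		out[fieldNum] = val
	}
	return out, nil
}

// counterSchema is a HYPOTHETICAL field-number-to-name table; in practice the
// names come from the vendor's .proto file via generated code.
var counterSchema = map[uint64]string{
	1: "in_octets",
	2: "out_octets",
}

// nameFields applies the schema. A field number with no schema entry cannot be
// named, so an error is returned and the caller can drop the message.
func nameFields(raw map[uint64]uint64, schema map[uint64]string) (map[string]uint64, error) {
	named := make(map[string]uint64, len(raw))
	for num, v := range raw {
		name, ok := schema[num]
		if !ok {
			return nil, fmt.Errorf("no schema entry for field %d", num)
		}
		named[name] = v
	}
	return named, nil
}

func main() {
	// Hand-encoded message: 0x08 = tag (field 1, wire type 0), 0xAC 0x02 = 300;
	// 0x10 = tag (field 2, wire type 0), 0x2A = 42. No names are on the wire.
	msg := []byte{0x08, 0xAC, 0x02, 0x10, 0x2A}
	raw, err := decodeVarintFields(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(raw) // map[1:300 2:42]
	named, err := nameFields(raw, counterSchema)
	if err != nil {
		panic(err)
	}
	fmt.Println(named) // map[in_octets:300 out_octets:42]
}
```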

Is there a public published protobuf for the compact data format you are interested in? What vendors support the format?

> The problems started when I went to try get metrics from other switch vendors and discovered that Self-Describing Data is really only available on Cisco devices.

I think this was our concern as well. If you are going to be a listener, you need to know how to parse the messages that come in from potentially several different sources, and toss any message you cannot parse.

brhoug commented 1 year ago

I am still doing my research and talking to others who have more experience than I do with gNMI/gRPC.

Stand by...

powersj commented 1 year ago

No problem. @srebhan was also reaching out to others to see if we could get more community feedback and help.

brhoug commented 1 year ago

After a bunch of dead ends, this doesn't appear to be feasible or worth the effort. The two existing plugins, Cisco MDT (cisco_telemetry_mdt) and gNMI (gnmi), appear to be the best available options.