influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

cisco_telemetry_mdt plugin and decimal64 yang fields are returned as string #6977

Closed lchabert closed 4 years ago

lchabert commented 4 years ago

Hi telegraf folks,

To build charts from the Cisco NX-OS telemetry system, I want to collect its data with Telegraf. Unfortunately, fields of the YANG decimal64 type are returned as strings and stored as strings in the InfluxDB database, so numeric queries on them fail. The configuration and technical data are below.

Relevant telegraf.conf:

[[inputs.cisco_telemetry_mdt]]
 transport = "grpc"
 service_address = ":50003"
 embedded_tags = ["Cisco-NX-OS-device:System/procsys-items/sysload-items/name"]

Relevant Nexus 9k config:

telemetry
  destination-profile
    use-vrf management
  destination-group 100
    ip address 10.204.5.42 port 50003 protocol gRPC encoding GPB
  sensor-group 100
    data-source YANG
    path Cisco-NX-OS-device:System/procsys-items/sysload-items
  subscription 100
    dst-grp 100
    snsr-grp 100 sample-interval 1000

System info:

Telegraf version: Telegraf 1.13.1 (git: HEAD 0c175724)
OS: debian buster/sid

Output log

I exported the data to a JSON log file:

{
  "fields": {
    "System/procsys_items/sysload_items/loadAverage15m": "0.15",
    "System/procsys_items/sysload_items/loadAverage1m": "0.26",
    "System/procsys_items/sysload_items/loadAverage5m": "0.18",
    "System/procsys_items/sysload_items/loadAverage5sec": "5.0",
    "System/procsys_items/sysload_items/name": "sysload",
    "System/procsys_items/sysload_items/runProc": 1,
    "System/procsys_items/sysload_items/totalProc": 160
  },
  "name": "Cisco-NX-OS-device:System/procsys-items/sysload-items",
  "tags": {
    "Cisco_NX_OS_device:System/procsys_items/sysload_items": "Cisco-NX-OS-device:System/procsys-items/sysload-items",
    "host": "telemetrycollector",
    "path": "Cisco-NX-OS-device:System/procsys-items/sysload-items",
    "source": "NXOS01",
    "subscription": "100"
  },
  "timestamp": 1580808885
}

Then I executed an InfluxDB query:

SELECT mean("System/procsys_items/sysload_items/loadAverage15m") FROM "Cisco-NX-OS-device:System/procsys-items/sysload-items" WHERE ("source" = 'NXOS01') AND time >= now() - 15m GROUP BY time(1s) fill(null)
ERR: unsupported mean iterator type: *query.stringInterruptIterator

As we can see, the load average is stored as a string, so aggregate functions such as mean() cannot be applied to it.

I opened the NX-OS device YANG file and found the following types for the sysload-items container:

     |  +--rw sysload-items
     |  |  +--ro loadAverage1m?     decimal64
     |  |  +--ro loadAverage5m?     decimal64
     |  |  +--ro loadAverage15m?    decimal64
     |  |  +--ro loadAverage5sec?   decimal64
     |  |  +--ro totalProc?         uint32
     |  |  +--ro runProc?           uint32
     |  |  +--ro name?              naming_Name256

Do you have any hints to make this work? Why are the decimal64 fields not cast to float fields so that they can be stored and queried by any TSDB?

Thanks in advance,

danielnelson commented 4 years ago

@sbyx Can you take a look into this one?

sbyx commented 4 years ago

The mdt input plugin does not do any type conversion for non-key fields but passes fields along in the types they arrive in from protobuf. So my first assumption would be that the device sends the data as string already.

lchabert commented 4 years ago

When I send the data over the HTTP/JSON transport and start netcat to print what is received, the decimal data is sent as a string by the switch.

According to the specification, a decimal64 field is really two fields: the Decimal64 message contains an integer value of the digits and an unsigned integer precision indicating the number of digits following the decimal point. The following document explains the protobuf messages: https://github.com/openconfig/ygot/blob/master/docs/yang-to-protobuf-transformations-spec.md

How can I display the type sent by the switch over protobuf? If it is sent as a string, I will open a case with Cisco TAC, but I need to be sure before opening it.

Thanks,

sbyx commented 4 years ago

I tried the same on an NX-OS device in my lab, and it confirmed my suspicion. The device is actually filling out the "stringValue" field in the gRPC message to the collector. Refer to https://github.com/CiscoDevNet/nx-telemetry-proto/blob/master/telemetry_bis.proto#L64

{
 "nodeIdStr": "switch-rack8",
 "subscriptionIdStr": "1",
 "encodingPath": "Cisco-NX-OS-device:System/procsys-items/sysload-items",
 "collectionId": "37950",
 "msgTimestamp": "1580897921133",
 "dataGpbkv": [
  {
   "fields": [
    {
     "name": "keys",
     "fields": [
      {
       "name": "Cisco-NX-OS-device:System/procsys-items/sysload-items",
       "stringValue": "Cisco-NX-OS-device:System/procsys-items/sysload-items"
      }
     ]
    },
    {
     "name": "content",
     "fields": [
      {
       "fields": [
        {
         "name": "System",
         "fields": [
          {
           "fields": [
            {
             "name": "xmlns",
             "stringValue": "http://cisco.com/ns/yang/cisco-nx-os-device"
            },
            {
             "name": "procsys-items",
             "fields": [
              {
               "fields": [
                {
                 "name": "sysload-items",
                 "fields": [
                  {
                   "fields": [
                    {
                     "name": "loadAverage15m",
                     "stringValue": "0.600000"
                    },
                    {
                     "name": "loadAverage1m",
                     "stringValue": "0.750000"
                    },
                    {
                     "name": "loadAverage5m",
                     "stringValue": "0.610000"
                    },
                    {
                     "name": "loadAverage5sec",
                     "stringValue": "13.000000"
                    },
                    {
                     "name": "name",
                     "stringValue": "sysload"
                    },
                    {
                     "name": "runProc",
                     "uint64Value": "1"
                    },
                    {
                     "name": "totalProc",
                     "uint64Value": "303"
                    }
                   ]
                  }
                 ]
                }
               ]
              }
             ]
            }
           ]
          }
         ]
        }
       ]
      }
     ]
    }
   ]
  }
 ]
}

lchabert commented 4 years ago

So from your point of view, it's a Cisco bug? If Cisco sends metric data as strings, no TSDB can store the telemetry values as numbers.

Is there any way to make it work with Telegraf?

sbyx commented 4 years ago

Well, my point is that the data at least arrives in string format directly from the device; nothing in Telegraf is doing that. You could try the converter processor to cast the data fields to a different type before output: https://github.com/influxdata/telegraf/tree/master/plugins/processors/converter
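
A minimal sketch of what such a converter configuration might look like, using the `float` option under `[processors.converter.fields]` from the linked README. The field names are an assumption copied from the JSON export earlier in this thread; adjust them to the paths you actually subscribe to:

```toml
# Sketch only: cast the load-average fields, which the device sends as
# strings, to floats before the metrics reach any output plugin.
[[processors.converter]]
  [processors.converter.fields]
    # Assumed field names, taken from the JSON export in this issue.
    float = [
      "System/procsys_items/sysload_items/loadAverage1m",
      "System/procsys_items/sysload_items/loadAverage5m",
      "System/procsys_items/sysload_items/loadAverage15m",
      "System/procsys_items/sysload_items/loadAverage5sec"
    ]
```

Because these fields already exist as strings in InfluxDB, writes of the new float values may be rejected with a field type conflict, so testing against a fresh database or measurement is probably the simplest way to check the result.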