influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Missing some information when using OpenTelemetry Output Plugin #13474

Open haihh05 opened 1 year ago

haihh05 commented 1 year ago

Relevant telegraf.conf

#telegraf1.conf
[[inputs.jti_openconfig_telemetry]]
  servers = ["my_network_device:4317"]

  sample_frequency = "10000ms"

  sensors = [
    "2000ms intefaces /interfaces",
  ]

[[outputs.opentelemetry]]
  service_address = "0.0.0.0:30003"

[[outputs.file]]
  files = ["stdout"]

  data_format = "influx"

#telegraf2.conf

[[inputs.opentelemetry]]
  service_address = "0.0.0.0:30003"

[[outputs.file]]
  files = ["stdout"]

  data_format = "influx"

Logs from Telegraf

telegraf1  | intefaces,/interfaces/interface/@name=ae0,device=...,host=...,path=sensor_1004_5_1:/interfaces/:/interfaces/:mib2d,system_id=... _component_id=65535i,/interfaces/interface/state/type="other",/interfaces/interface/state/description="...",/interfaces/interface/state/oper-status="UP",/interfaces/interface/ethernet/state/enable-flow-control=false,/interfaces/interface/state/name="ae0",_timestamp=1687351346281i,/interfaces/interface/state/admin-status="UP",/interfaces/interface/ethernet/state/mac-address="...",/interfaces/interface/ethernet/state/negotiated-port-speed="SPEED_10GB",_subcomponent_id=0i,/interfaces/interface/state/enabled=true,/interfaces/interface/state/last-change=1687025787040761000i,/interfaces/interface/hold-time/state/up=0i,/interfaces/interface/hold-time/state/down=0i,/interfaces/interface/ethernet/state/port-speed="SPEED_10GB",/interfaces/interface/ethernet/state/hw-mac-address="...",_sequence=292i,/interfaces/interface/state/ifindex=517i,/interfaces/interface/state/logical=false,/interfaces/interface/state/mtu=9192i,/interfaces/interface/state/loopback-mode=false 1687322708946903506

telegraf2  | intefaces_/interfaces/interface/state/ifindex,/interfaces/interface/@name=ae0,device=...,host=...,path=sensor_1004_5_1:/interfaces/:/interfaces/:mib2d,system_id=... gauge=517i 168732270894690350

System info

Telegraf 1.26.3

Docker

#docker-compose.yaml
version: '3.3'

services:
  telegraf1:
    image: telegraf:1.26.3
    container_name: telegraf1
    network_mode: host
    restart: unless-stopped
    volumes:
      - ./telegraf1.conf:/etc/telegraf/telegraf.conf

  telegraf2:
    image: telegraf:1.26.3
    container_name: telegraf2
    network_mode: host
    restart: unless-stopped
    volumes:
      - ./telegraf2.conf:/etc/telegraf/telegraf.conf

Steps to reproduce

  1. Run the docker-compose file with telegraf1.conf and telegraf2.conf: docker compose up -d
  2. Check the logs of both containers and compare them (results are shown in the logs above)

Expected behavior

The logs of telegraf2 are the same as the logs of telegraf1

Actual behavior

The logs of telegraf2 are not the same as the logs of telegraf1

Additional info

No response

powersj commented 1 year ago

Hi,

There is not enough information in this issue.

Your title implies that the OpenTelemetry output is somehow missing information, except the only output you provided is from the logs, the file output.

Your input is the jti_openconfig_telemetry, which clearly states the following:

Client ID must be unique when connecting from multiple instances
of telegraf to the same device

You are trying to use the exact same config and hence have the same client ID in both and as a result you are not collecting unique data between both instances.

Please fix your config to have unique client IDs. Closing as not planned.

haihh05 commented 1 year ago

Hi @powersj

To make it clear, my diagram is as below:

[Image: diagram]

There is one input flow to telegraf1, so it can not have the same client ID.

Why are log1 and log2 not the same?

Even if I add the following to my config, the information is still missing:

  username = "my_user"
  password = "my_pass"
  client_id = "telegraf"

powersj commented 1 year ago

There is one input flow to telegraf1, so it can not have the same client ID.

That was not clear from your initial report. Thanks - your edits have made it clearer. Let me ask about this one internally.

jacobmarble commented 1 year ago

@haihh05 it would help me reproduce the error if the example inputs/outputs were smaller.

From the provided line protocol, I can see that the measurement name isn't even the same: intefaces vs intefaces_/interfaces/interface/state/ifindex

This is suspicious.
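
To make the comparison concrete, here is a minimal Python sketch that splits two shortened lines from the logs above into measurement, tags, and fields. It is not a full line protocol parser: it assumes no escaped characters and no spaces inside string values, and `split_line_protocol` is a hypothetical helper name.

```python
def split_line_protocol(line):
    """Split one line of line protocol into (measurement, tags, fields, timestamp).

    Simplified on purpose: assumes no escaped commas, spaces, or equals signs,
    and no spaces inside quoted string field values.
    """
    head, field_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in field_part.split(","))
    return measurement, tags, fields, int(timestamp)


# Shortened versions of the telegraf1 and telegraf2 lines from the logs above.
sent = (
    "intefaces,/interfaces/interface/@name=ae0 "
    "/interfaces/interface/state/ifindex=517i,/interfaces/interface/state/mtu=9192i "
    "1687322708946903506"
)
received = (
    "intefaces_/interfaces/interface/state/ifindex,/interfaces/interface/@name=ae0 "
    "gauge=517i 168732270894690350"
)

sent_name, sent_tags, sent_fields, _ = split_line_protocol(sent)
recv_name, recv_tags, recv_fields, _ = split_line_protocol(received)

print(sent_name)  # intefaces
print(recv_name)  # intefaces_/interfaces/interface/state/ifindex
# The received measurement is the sent measurement plus "_" plus one field key.
print(recv_name == sent_name + "_/interfaces/interface/state/ifindex")  # True
```

Seen this way, the received metric carries the same tags as the sent one, and its single `gauge` field holds the value of the field whose key was appended to the measurement name.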

jacobmarble commented 1 year ago

@haihh05 your reproduce steps do not yield any logs because nothing sends input to this plugin: [[inputs.jti_openconfig_telemetry]]

jacobmarble commented 1 year ago

Also, I would like to emphasize that, while I'm interested to learn and share the cause of this behavior, complete round-trip fidelity is not a goal of the related modules.

haihh05 commented 1 year ago

So, what can I do next? I can provide more details on what you need.

haihh05 commented 1 year ago

From the provided line protocol, I can see that the measurement name isn't even the same: intefaces vs intefaces_/interfaces/interface/state/ifindex

This is suspicious.

You can find this key/value pair in log1: "/interfaces/interface/state/ifindex=517i". In log2 it appears as the measurement "intefaces_/interfaces/interface/state/ifindex" with "gauge=517i".

I can confirm that the two logs come from the same metric, because the logs of telegraf2 contain no other entries for interface ae0 with more detail than this one.

jacobmarble commented 1 year ago

So, what can I do next? I can provide more details on what you need.

@haihh05 your reproduce steps do not yield any logs because nothing sends input to this plugin: [[inputs.jti_openconfig_telemetry]]

I need input to this plugin. Without it, I cannot generate log1 and log2 for myself.

haihh05 commented 1 year ago

I need input to this plugin. Without it, I cannot generate log1 and log2 for myself.

The input is from my network device:

Model: qfx5120-48y-8c
Junos: 20.4R3.8

My config commands are:

set system services extension-service request-response grpc clear-text port 4317
set system services extension-service request-response grpc skip-authentication
set system services extension-service notification allow-clients address <telegraf-server>/32

You can see details of metrics at this

powersj commented 1 year ago

Model: qfx5120-48y-8c

I do not necessarily expect @jacobmarble to go out and find a $15k switch ;)

I wanted to try this out myself and converted your line protocol above into JSON so I could read it from a file. Here is the JSON version of your metric above:

{
    "name": "intefaces",
    "timestamp": 1687322708946903600,
    "tags": {
      "/interfaces/interface/@name": "ae0",
      "device": "...",
      "host": "...",
      "path": "sensor_1004_5_1:/interfaces/:/interfaces/:mib2d",
      "system_id": "..."
    },
    "fields": {
      "_component_id": 65535,
      "/interfaces/interface/state/type": "other",
      "/interfaces/interface/state/description": "...",
      "/interfaces/interface/state/oper-status": "UP",
      "/interfaces/interface/ethernet/state/enable-flow-control": "false",
      "/interfaces/interface/state/name": "ae0",
      "_timestamp": 1687351346281,
      "/interfaces/interface/state/admin-status": "UP",
      "/interfaces/interface/ethernet/state/mac-address": "...",
      "/interfaces/interface/ethernet/state/negotiated-port-speed": "SPEED_10GB",
      "_subcomponent_id": 0,
      "/interfaces/interface/state/enabled": "true",
      "/interfaces/interface/state/last-change": 1687025787040761000,
      "/interfaces/interface/hold-time/state/up": 0,
      "/interfaces/interface/hold-time/state/down": 0,
      "/interfaces/interface/ethernet/state/port-speed": "SPEED_10GB",
      "/interfaces/interface/ethernet/state/hw-mac-address": "...",
      "_sequence": 292,
      "/interfaces/interface/state/ifindex": 517,
      "/interfaces/interface/state/logical": "false",
      "/interfaces/interface/state/mtu": 9192,
      "/interfaces/interface/state/loopback-mode": "false"
    }
}

Then I used the following config to read the metric and send it using the file and opentelemetry outputs:

[agent]
    debug = true

[[inputs.file]]
    files = ["data.json"]
    data_format = "xpath_json"

    [[inputs.file.xpath]]
        metric_name = "/name"
        timestamp = "/timestamp"
        timestamp_format = "unix"
        field_selection = "fields/*"
        tag_selection = "tags/*"

[[outputs.file]]

[[outputs.opentelemetry]]
    service_address = "0.0.0.0:30003"

However, I get many of the following messages:

2023-06-26T13:47:26Z D! [outputs.opentelemetry] field has unsupported type measurement="intefaces" field="/interfaces/interface/ethernet/state/enable-flow-control" type="string"
2023-06-26T13:47:26Z D! [outputs.opentelemetry] field has unsupported type measurement="intefaces" field="/interfaces/interface/state/admin-status" type="string"

@jacobmarble - this would seem to explain why the receiver only reported the numeric metric and all the string fields were ignored. Does that seem like the right takeaway?

I did also try metrics with a few numeric fields and they worked as expected:

foobar,host=ryzen,source=localhost value=42,other_value=44 1687786520000000000

Was sent and correctly reported as:

foobar_value,host=ryzen,source=localhost gauge=42 1687786510000000000
foobar_other_value,host=ryzen,source=localhost gauge=44 1687786510000000000
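
The behavior described in this comment can be modeled with a short Python sketch. It is only an illustration of what the logs in this thread show, not the plugin's actual code: each numeric field becomes its own metric named `<measurement>_<field>` with a single `gauge` value, and string fields are skipped. The function name `fan_out_numeric_fields` is hypothetical, and dropping booleans is an assumption that this thread does not confirm.

```python
def fan_out_numeric_fields(measurement, tags, fields, timestamp):
    """Model the observed round trip: one gauge metric per numeric field.

    Returns (emitted, dropped), where emitted is a list of
    (name, tags, fields, timestamp) tuples and dropped lists skipped field keys.
    """
    emitted = []
    dropped = []
    for key, value in fields.items():
        # bool is a subclass of int in Python, so it is excluded explicitly.
        # ASSUMPTION: booleans are treated like strings here and dropped.
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric:
            emitted.append((f"{measurement}_{key}", dict(tags), {"gauge": value}, timestamp))
        else:
            dropped.append(key)
    return emitted, dropped


# A subset of the fields from the JSON metric above.
emitted, dropped = fan_out_numeric_fields(
    "intefaces",
    {"/interfaces/interface/@name": "ae0"},
    {
        "/interfaces/interface/state/ifindex": 517,
        "/interfaces/interface/state/mtu": 9192,
        "/interfaces/interface/state/admin-status": "UP",
    },
    1687322708946903506,
)

for name, _tags, metric_fields, _ts in emitted:
    print(name, metric_fields)
print("dropped:", dropped)
```

Under this model, a metric with many string fields and a few numeric ones arrives at the receiver as a handful of single-value gauge metrics, which matches the telegraf2 log line in the original report.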

NielsMikuta commented 10 months ago

Hi @powersj I am struggling with getting strings from snmp_trap into opentelemetry as well. Did you ever find a solution?