influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

inputs.gnmi: Authentication Broken starting in 1.29.2 #15236

Closed whizkidTRW closed 6 months ago

whizkidTRW commented 6 months ago

Relevant telegraf.conf

# Ciena gNMI
[agent]
  interval = "5m"

[[inputs.gnmi]]
  alias = "ciena-gnmi"
  addresses = [ "10.144.248.21:6702" ]

  username = "XXXXXXXXXXXXXXXXX"
  password = "XXXXXXXXXXXXXXXXX"
  encoding = "proto"
  redial = "10s"
  tls_enable = true
  insecure_skip_verify = true
  tls_ca = "/etc/telegraf/ciena-ca.cert.pem"
  tls_cert = "/etc/telegraf/ciena-client.cert.pem"
  tls_key = "/etc/telegraf/ciena-client.key.pem"
  name_override = "saos10xgnmi"
  updates_only = true

  fieldpass = [ "name","source","ifIndex","in_crc_error_pkts","in_discards","in_errors","in_octets","out_errors","out_octets" ]
  tagexclude = ["path"]

  [[inputs.gnmi.subscription]]
     name = "ifcounters"
     origin = "Ciena"
     path = "/oc-if:interfaces/oc-if:interface[name=9]/oc-if:state/oc-if:counters"
     subscription_mode = "sample"
     sample_interval = "30s"

[[processors.converter]]
    order = 1
    namepass = ["saos10xgnmi"]

    [processors.converter.fields]
        tag = ["name"]

[[processors.strings]]
    order = 2
    namepass = ["saos10xgnmi"]

    [[processors.strings.replace]]
        tag = "name"
        old = "\""
        new = ""

[[processors.rename]]
    order = 3
    namepass = ["saos10xgnmi"]

    [[processors.rename.replace]]
        field = "in_octets"
        dest = "ifHCInOctets"

    [[processors.rename.replace]]
        field = "out_octets"
        dest = "ifHCOutOctets"

    [[processors.rename.replace]]
        field = "in_errors"
        dest = "ifInErrors"

    [[processors.rename.replace]]
        field = "out_errors"
        dest = "ifOutErrors"

    [[processors.rename.replace]]
        field = "in_discards"
        dest = "ifInDiscards"

    [[processors.rename.replace]]
        field = "in_crc_error_pkts"
        dest = "ifInCrcErrors"

    [[processors.rename.replace]]
        tag = "name"
        dest = "ifIndex"

    [[processors.rename.replace]]
        tag = "source"
        dest = "agent_host"

    [[processors.rename.replace]]
        measurement = "saos10xgnmi"
        dest = "interface"

Logs from Telegraf

v1.29.1:
telegraf  | 2024-04-25T22:03:46Z I! Loading config: /etc/telegraf/telegraf.conf
telegraf  | 2024-04-25T22:03:46Z I! Loading config: /etc/telegraf/telegraf.d/ciena.conf
telegraf  | 2024-04-25T22:03:46Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
telegraf  | 2024-04-25T22:03:46Z I! Starting Telegraf 1.29.1 brought to you by InfluxData the makers of InfluxDB
telegraf  | 2024-04-25T22:03:46Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 6 secret-stores
telegraf  | 2024-04-25T22:03:46Z I! Loaded inputs: gnmi
telegraf  | 2024-04-25T22:03:46Z I! Loaded aggregators: 
telegraf  | 2024-04-25T22:03:46Z I! Loaded processors: converter rename strings
telegraf  | 2024-04-25T22:03:46Z I! Loaded secretstores: 
telegraf  | 2024-04-25T22:03:46Z W! Outputs are not used in testing mode!
telegraf  | 2024-04-25T22:03:46Z I! Tags enabled: host=10.5.200.224
telegraf  | 2024-04-25T22:03:46Z D! [agent] Initializing plugins
telegraf  | 2024-04-25T22:03:46Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
telegraf  | 2024-04-25T22:03:46Z D! [agent] Starting service inputs

telegraf  | 2024-04-25T22:03:47Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 established
telegraf  | > interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714082658121000000

telegraf  | 2024-04-25T22:04:26Z D! [agent] Stopping service inputs
telegraf  | 2024-04-25T22:04:26Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 closed
telegraf  | 2024-04-25T22:04:26Z D! [agent] Input channel closed
telegraf  | 2024-04-25T22:04:26Z D! [agent] Processor channel closed
telegraf  | 2024-04-25T22:04:26Z D! [agent] Processor channel closed
telegraf  | 2024-04-25T22:04:26Z D! [agent] Processor channel closed
telegraf  | 2024-04-25T22:04:26Z D! [agent] Stopped Successfully
telegraf exited with code 0

v1.29.2:
telegraf  | 2024-04-25T22:04:35Z I! Loading config: /etc/telegraf/telegraf.conf
telegraf  | 2024-04-25T22:04:35Z I! Loading config: /etc/telegraf/telegraf.d/ciena.conf
telegraf  | 2024-04-25T22:04:35Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
telegraf  | 2024-04-25T22:04:35Z I! Starting Telegraf 1.29.2 brought to you by InfluxData the makers of InfluxDB
telegraf  | 2024-04-25T22:04:35Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 6 secret-stores
telegraf  | 2024-04-25T22:04:35Z I! Loaded inputs: gnmi
telegraf  | 2024-04-25T22:04:35Z I! Loaded aggregators: 
telegraf  | 2024-04-25T22:04:35Z I! Loaded processors: converter rename strings
telegraf  | 2024-04-25T22:04:35Z I! Loaded secretstores: 
telegraf  | 2024-04-25T22:04:35Z W! Outputs are not used in testing mode!
telegraf  | 2024-04-25T22:04:35Z I! Tags enabled: host=10.5.200.224
telegraf  | 2024-04-25T22:04:35Z D! [agent] Initializing plugins
telegraf  | 2024-04-25T22:04:35Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
telegraf  | 2024-04-25T22:04:35Z D! [agent] Starting service inputs

telegraf  | 2024-04-25T22:04:35Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
telegraf  | 2024-04-25T22:04:46Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
telegraf  | 2024-04-25T22:04:56Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
telegraf  | 2024-04-25T22:05:07Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"

telegraf  | 2024-04-25T22:05:15Z D! [agent] Stopping service inputs
telegraf  | 2024-04-25T22:05:15Z D! [agent] Input channel closed
telegraf  | 2024-04-25T22:05:15Z D! [agent] Processor channel closed
telegraf  | 2024-04-25T22:05:15Z D! [agent] Processor channel closed
telegraf  | 2024-04-25T22:05:15Z D! [agent] Processor channel closed
telegraf  | 2024-04-25T22:05:15Z D! [agent] Stopped Successfully
telegraf  | 2024-04-25T22:05:15Z E! [telegraf] Error running agent: input plugins recorded 4 errors
telegraf exited with code 1

System info

Telegraf 1.29.2, macOS 14.4.1, Docker 26.0.0

Docker

services:
  telegraf:
    image: telegraf:1.29.2
    container_name: telegraf
    restart: no
    command: telegraf --debug --test-wait 40
    volumes:
      - /etc/snmp:/etc/snmp:ro
      - ./mibs:/usr/share/snmp/mibs:rw
      - ./telegraf/etc:/etc/telegraf:rw
    ports:
      - '8125:8125'
    logging:
      options:
        max-size: "1m"
        max-file: "5"

Steps to reproduce

  1. Use telegraf:1.29.2 (or higher) in docker-compose.yml
  2. insecure_skip_verify = true
  3. tls_enable = true

Expected behavior

The subscription authenticates properly and data is received.

Actual behavior

Authentication handshake fails

Additional info

The exact same config was working in 1.29.1 and prior. I was attempting to upgrade my system from 1.28.2 to the latest, 1.30.2, when I experienced this behavior. Stepping back through versions, I identified that it works in 1.29.1 and fails starting in 1.29.2. No configuration was changed between 1.29.1 working and >=1.29.2 failing.

whizkidTRW commented 6 months ago

FYI, this is NOT my production docker-compose.yml file. This is only for debug / release testing. I use the --debug and --test-wait 40 options to validate releases. telegraf is the only command in the production docker-compose.yml file.

powersj commented 6 months ago

Hi,

Here is the diff between those versions. There were no changes made to the gNMI code, only a single spelling update to a comment.

tls: handshake failure

This can be due to a number of things, even outside of Telegraf: for example, the date and time not being set correctly in your container, or a TLS protocol mismatch between the client and server.

Of the 44 commits in that version of Telegraf, 6 stand out to me: 1 dependency update and 5 linter changes. None of the 5 linter PRs changed anything in gNMI or in anything it imports or uses. That leaves one dependency:

Please try to reproduce this outside of your container first. We want to eliminate any changes to the container or Docker environment as a possibility. If you are still able to reproduce, then we can start bisecting between the versions.

Thanks!

srebhan commented 6 months ago

The other potential place is the golang.org/x/crypto update to v0.17.0 adding a strict KEX mode... Not sure if this is relevant here...

@whizkidTRW it would also be useful to know which TLS version the device is speaking...

whizkidTRW commented 6 months ago

@powersj, the containers were recreated within minutes of each other and I just confirmed the date/time is correct. I did download the 1.29.1, 1.29.2, & 1.30.2 binaries and the behavior is the same.

@srebhan, I confirmed the Ciena box is running TLS v1.2.

I also went on to test against our Cisco IOS-XR boxes, and those are fine, so this is limited to the Ciena boxes, which doesn't really surprise me... Bear in mind, we have little control over the certificates these boxes are using because the NMS puts them on the box for its own management purposes and they are self-signed, hence the insecure_skip_verify = true option.

BAIL-CO-RT-CN-01-01> show tls
+----------- TLS SERVICE PROFILES ----------+
| Name                   | Value            |
+------------------------+------------------+
| Service Profile Name   | mcp              |
| TLS Profile Name       | mcp-profile      |
| Peer Auth Profile Name | mcp-auth-profile |
| Certificate Name       | mcp-server       |
+------------------------+------------------+

+--------- PEER AUTH PROFILES ---------+
| Name              | Value            |
+-------------------+------------------+
| Profile Name      | mcp-auth-profile |
| Check Expiry      | True             |
| Check IP/Host     | False            |
| Check Fingerprint | False            |
| Fingerprint List  | -                |
+-------------------+------------------+

+------------------------- HELLO PARAMS ------------------------+
| Name                         | Value                          |
+------------------------------+--------------------------------+
| Profile Name                 | mcp-profile                    |
| Protocol Versions            | tls-1.2                        |
| Cipher Suites                | ecdhe-rsa-with-aes-128-cbc-sha |
| Elliptic Curves              | secp256r1                      |
| Sess. Resumption Timeout (s) | 3600                           |
| OCSP State                   | disabled                       |
| NONCE State                  | enabled                        |
| Default OCSP Responder URL   | -                              |
+------------------------------+--------------------------------+

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.29.1/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 40 --config-directory ./telegraf.d
2024-04-26T15:13:06Z I! Loading config: ./telegraf.conf
2024-04-26T15:13:06Z I! Loading config: telegraf.d/ciena.conf
2024-04-26T15:13:06Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-26T15:13:06Z I! Starting Telegraf 1.29.1 brought to you by InfluxData the makers of InfluxDB
2024-04-26T15:13:06Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-04-26T15:13:06Z I! Loaded inputs: gnmi
2024-04-26T15:13:06Z I! Loaded aggregators: 
2024-04-26T15:13:06Z I! Loaded processors: converter rename strings
2024-04-26T15:13:06Z I! Loaded secretstores: 
2024-04-26T15:13:06Z W! Outputs are not used in testing mode!
2024-04-26T15:13:06Z I! Tags enabled: host=10.5.200.224
2024-04-26T15:13:06Z D! [agent] Initializing plugins
2024-04-26T15:13:06Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-26T15:13:06Z D! [agent] Starting service inputs

2024-04-26T15:13:06Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 established
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714144413009000000

2024-04-26T15:13:46Z D! [agent] Stopping service inputs
2024-04-26T15:13:46Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 closed
2024-04-26T15:13:46Z D! [agent] Input channel closed
2024-04-26T15:13:46Z D! [agent] Processor channel closed
2024-04-26T15:13:46Z D! [agent] Processor channel closed
2024-04-26T15:13:46Z D! [agent] Processor channel closed
2024-04-26T15:13:46Z D! [agent] Stopped Successfully

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.29.2/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 40 --config-directory ./telegraf.d
2024-04-26T15:14:09Z I! Loading config: ./telegraf.conf
2024-04-26T15:14:09Z I! Loading config: telegraf.d/ciena.conf
2024-04-26T15:14:09Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-26T15:14:09Z I! Starting Telegraf 1.29.2 brought to you by InfluxData the makers of InfluxDB
2024-04-26T15:14:09Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-04-26T15:14:09Z I! Loaded inputs: gnmi
2024-04-26T15:14:09Z I! Loaded aggregators: 
2024-04-26T15:14:09Z I! Loaded processors: converter rename strings
2024-04-26T15:14:09Z I! Loaded secretstores: 
2024-04-26T15:14:09Z W! Outputs are not used in testing mode!
2024-04-26T15:14:09Z I! Tags enabled: host=10.5.200.224
2024-04-26T15:14:09Z D! [agent] Initializing plugins
2024-04-26T15:14:09Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-26T15:14:09Z D! [agent] Starting service inputs

2024-04-26T15:14:09Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
2024-04-26T15:14:19Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
2024-04-26T15:14:29Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
2024-04-26T15:14:39Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"

2024-04-26T15:14:49Z D! [agent] Stopping service inputs
2024-04-26T15:14:49Z D! [agent] Input channel closed
2024-04-26T15:14:49Z D! [agent] Processor channel closed
2024-04-26T15:14:49Z D! [agent] Processor channel closed
2024-04-26T15:14:49Z D! [agent] Processor channel closed
2024-04-26T15:14:49Z D! [agent] Stopped Successfully
2024-04-26T15:14:49Z E! [telegraf] Error running agent: input plugins recorded 4 errors

powersj commented 6 months ago

If I have counted correctly these are the first few commits we will want to try out:

If that works, then we need to go later.

If that fails, then we need to go earlier.

I have put up #15240, which includes everything up to the 22nd commit. Could you please try that and let us know the result? Artifacts will be attached in ~25 mins.

If you are comfortable with git and building Telegraf, you could do the git bisect yourself, building each version and testing a bit faster, but this shouldn't be too bad. The 22nd commit omits the protobuf update. ~If that is still broken, the next would omit the crypto update; those are the only two things that seem to stand out.~

the containers were recreated within minutes of each other and I just confirmed the date/time is correct.

Thank you for taking the time to confirm. FWIW, when the containers were created may have no bearing on whether the time is set correctly. Keep in mind that we make changes to the underlying containers between versions, and DockerHub, because these are official images, also updates the images with underlying security fixes. So actually verifying is still important.

Thanks again!

whizkidTRW commented 6 months ago

Yes sir, I'm happy to test. Understood on the containers; I'll keep running it directly outside of Docker for now, as this is all just local to my machine and easily managed. I've never done a git bisect, but I can follow any instructions if you want to provide them. I'll keep an eye out for the artifacts on your PR and download them as soon as they're available. Thanks for jumping on this!

powersj commented 6 months ago

Looks like the first artifacts are up:

https://github.com/influxdata/telegraf/pull/15240#issuecomment-2079680339

Those are from the 22nd commit. I have some other branches building now as well.

whizkidTRW commented 6 months ago

Done. That worked perfectly fine:

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.29.1/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 40 --config-directory ./telegraf.d
2024-04-26T19:58:57Z I! Loading config: ./telegraf.conf
2024-04-26T19:58:57Z I! Loading config: telegraf.d/ciena.conf
2024-04-26T19:58:57Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-26T19:58:57Z I! Starting Telegraf 1.29.1-97b9bc24 brought to you by InfluxData the makers of InfluxDB
2024-04-26T19:58:57Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-04-26T19:58:57Z I! Loaded inputs: gnmi
2024-04-26T19:58:57Z I! Loaded aggregators: 
2024-04-26T19:58:57Z I! Loaded processors: converter rename strings
2024-04-26T19:58:57Z I! Loaded secretstores: 
2024-04-26T19:58:57Z W! Outputs are not used in testing mode!
2024-04-26T19:58:57Z I! Tags enabled: host=10.5.200.224
2024-04-26T19:58:57Z D! [agent] Initializing plugins
2024-04-26T19:58:57Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-26T19:58:57Z D! [agent] Starting service inputs

2024-04-26T19:58:57Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 established
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714161543011000000
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714161573010000000

2024-04-26T19:59:37Z D! [agent] Stopping service inputs
2024-04-26T19:59:37Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 closed
2024-04-26T19:59:37Z D! [agent] Input channel closed
2024-04-26T19:59:37Z D! [agent] Processor channel closed
2024-04-26T19:59:37Z D! [agent] Processor channel closed
2024-04-26T19:59:37Z D! [agent] Processor channel closed
2024-04-26T19:59:37Z D! [agent] Stopped Successfully

powersj commented 6 months ago

That worked perfectly fine:

Not what I expected at all :scream: I was already preparing some other PRs that dealt with the earlier dependency updates, but I'll go close those now...

That result narrows it down to the 22 commits after that. 3 of those commits are related to the release (build number, changelog), so really 19. Of the 19, there are 5 dependabot dependency updates, and then I went through the remaining 14 and found https://github.com/influxdata/telegraf/commit/01f12c2d42c8ac82b33dfb6a3e2419dcfaf5d896 also updates the version of gRPC! Next steps:

First, let's have you try https://github.com/influxdata/telegraf/pull/15246, which is from the 33rd commit. It is right before the gRPC update. If that works, then I think the next commit probably breaks you as all the other commits after this are either to individual plugins or unrelated doc updates.

If that fails, then we will need to start going through those final 5 dependabot updates.

Expect a new artifact in 30mins, assuming tests pass.

whizkidTRW commented 6 months ago

Sounds good! I'm heading out of town for the weekend but can likely do some tests remotely late tonight. I'll post back an update as soon as I can.

powersj commented 6 months ago

Feel free to wait until Monday and enjoy your weekend! I'm calling it shortly as well :)

whizkidTRW commented 6 months ago

Had a few free minutes and grabbed the artifacts, still good at this point:


2024-04-26T21:46:22Z I! Loading config: ./telegraf.conf
2024-04-26T21:46:22Z I! Loading config: telegraf.d/ciena.conf
2024-04-26T21:46:22Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-26T21:46:22Z I! Starting Telegraf 1.29.1-96995a94 brought to you by InfluxData the makers of InfluxDB
2024-04-26T21:46:22Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-04-26T21:46:22Z I! Loaded inputs: gnmi
2024-04-26T21:46:22Z I! Loaded aggregators: 
2024-04-26T21:46:22Z I! Loaded processors: converter rename strings
2024-04-26T21:46:22Z I! Loaded secretstores: 
2024-04-26T21:46:22Z W! Outputs are not used in testing mode!
2024-04-26T21:46:22Z I! Tags enabled: host=10.5.200.224
2024-04-26T21:46:22Z D! [agent] Initializing plugins
2024-04-26T21:46:22Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-26T21:46:22Z D! [agent] Starting service inputs

2024-04-26T21:46:23Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 established
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714167993007000000

2024-04-26T21:47:02Z D! [agent] Stopping service inputs
2024-04-26T21:47:02Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 closed
2024-04-26T21:47:02Z D! [agent] Input channel closed
2024-04-26T21:47:02Z D! [agent] Processor channel closed
2024-04-26T21:47:02Z D! [agent] Processor channel closed
2024-04-26T21:47:02Z D! [agent] Processor channel closed
2024-04-26T21:47:02Z D! [agent] Stopped Successfully

powersj commented 6 months ago

ok! That probably means the grpc library was the cause. For Monday, here is another PR that adds that commit:

https://github.com/influxdata/telegraf/pull/15247

If that fails (and I sort of hope it does) then we need to look into what changed with the grpc library, could be and probably is an upstream issue.

If that works, then I'll be very, very confused as to what is left to check.

Thanks!

whizkidTRW commented 6 months ago

Yep, that's it! Fails with that version. Went back one step to the previous version immediately after with the exact same config file just to verify and it was still good at that point, so yes, it must be the grpc library:

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.29.1-01f12c2d/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 40 --config-directory ./telegraf.d
2024-04-27T03:48:49Z I! Loading config: ./telegraf.conf
2024-04-27T03:48:49Z I! Loading config: telegraf.d/ciena.conf
2024-04-27T03:48:49Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-27T03:48:49Z I! Starting Telegraf 1.29.1-01f12c2d brought to you by InfluxData the makers of InfluxDB
2024-04-27T03:48:49Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-04-27T03:48:49Z I! Loaded inputs: gnmi
2024-04-27T03:48:49Z I! Loaded aggregators: 
2024-04-27T03:48:49Z I! Loaded processors: converter rename strings
2024-04-27T03:48:49Z I! Loaded secretstores: 
2024-04-27T03:48:49Z W! Outputs are not used in testing mode!
2024-04-27T03:48:49Z I! Tags enabled: host=10.5.200.224
2024-04-27T03:48:49Z D! [agent] Initializing plugins
2024-04-27T03:48:49Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-27T03:48:49Z D! [agent] Starting service inputs

2024-04-27T03:48:49Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
2024-04-27T03:48:59Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
2024-04-27T03:49:09Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"
2024-04-27T03:49:20Z E! [inputs.gnmi::ciena-gnmi] Error in plugin: failed to setup subscription: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: remote error: tls: handshake failure"

2024-04-27T03:49:29Z D! [agent] Stopping service inputs
2024-04-27T03:49:29Z D! [agent] Input channel closed
2024-04-27T03:49:29Z D! [agent] Processor channel closed
2024-04-27T03:49:29Z D! [agent] Processor channel closed
2024-04-27T03:49:29Z D! [agent] Processor channel closed
2024-04-27T03:49:29Z D! [agent] Stopped Successfully
2024-04-27T03:49:29Z E! [telegraf] Error running agent: input plugins recorded 4 errors

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.29.1-96995a94/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 40 --config-directory ./telegraf.d
2024-04-27T03:49:34Z I! Loading config: ./telegraf.conf
2024-04-27T03:49:34Z I! Loading config: telegraf.d/ciena.conf
2024-04-27T03:49:34Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-27T03:49:34Z I! Starting Telegraf 1.29.1-96995a94 brought to you by InfluxData the makers of InfluxDB
2024-04-27T03:49:34Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 5 secret-stores
2024-04-27T03:49:34Z I! Loaded inputs: gnmi
2024-04-27T03:49:34Z I! Loaded aggregators: 
2024-04-27T03:49:34Z I! Loaded processors: converter rename strings
2024-04-27T03:49:34Z I! Loaded secretstores: 
2024-04-27T03:49:34Z W! Outputs are not used in testing mode!
2024-04-27T03:49:34Z I! Tags enabled: host=10.5.200.224
2024-04-27T03:49:34Z D! [agent] Initializing plugins
2024-04-27T03:49:34Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-27T03:49:34Z D! [agent] Starting service inputs

2024-04-27T03:49:34Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 established
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714189803011000000

2024-04-27T03:50:14Z D! [agent] Stopping service inputs
2024-04-27T03:50:14Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 closed
2024-04-27T03:50:14Z D! [agent] Input channel closed
2024-04-27T03:50:14Z D! [agent] Processor channel closed
2024-04-27T03:50:14Z D! [agent] Processor channel closed
2024-04-27T03:50:14Z D! [agent] Processor channel closed
2024-04-27T03:50:14Z D! [agent] Stopped Successfully

srebhan commented 6 months ago

@powersj and @whizkidTRW: The gRPC library disables insecure ciphers by default starting from v1.60.0. The device reports

| Cipher Suites | ecdhe-rsa-with-aes-128-cbc-sha |

see here, which is insecure as per crypto/tls's definition.

In PR #15256 I added the ability to pass the accepted ciphers via tls_cipher_suites, so @whizkidTRW please try the binary in #15256 and set

  tls_cipher_suites = ["TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"]

Please let me know if this fixes the issue!

whizkidTRW commented 6 months ago

@srebhan, that worked perfectly!!! I confirmed it works for the Ciena box that was broken and continues to work for Cisco.

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.31.0-58f7dadc/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 300 --config-directory ./telegraf.d
2024-04-29T15:50:02Z I! Loading config: ./telegraf.conf
2024-04-29T15:50:02Z I! Loading config: telegraf.d/ciena.conf
2024-04-29T15:50:02Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-29T15:50:02Z I! Starting Telegraf 1.31.0-58f7dadc brought to you by InfluxData the makers of InfluxDB
2024-04-29T15:50:02Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 25 parsers, 60 outputs, 5 secret-stores
2024-04-29T15:50:02Z I! Loaded inputs: gnmi
2024-04-29T15:50:02Z I! Loaded aggregators: 
2024-04-29T15:50:02Z I! Loaded processors: converter rename strings
2024-04-29T15:50:02Z I! Loaded secretstores: 
2024-04-29T15:50:02Z W! Outputs are not used in testing mode!
2024-04-29T15:50:02Z I! Tags enabled: host=10.5.200.224
2024-04-29T15:50:02Z D! [agent] Initializing plugins
2024-04-29T15:50:02Z D! [inputs.gnmi::ciena-gnmi] Internal alias mapping: map[oc-if:/interfaces/oc-if:interface/oc-if:state/oc-if:counters:ifcounters]
2024-04-29T15:50:02Z D! [agent] Starting service inputs

2024-04-29T15:50:02Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 established
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714405874487000000
> interface,agent_host=10.144.248.21,host=10.5.200.224,ifIndex=9 ifHCInOctets=0i,ifHCOutOctets=0i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=0i,ifOutErrors=0i 1714405904512000000

2024-04-29T15:55:02Z D! [agent] Stopping service inputs
2024-04-29T15:55:02Z D! [inputs.gnmi::ciena-gnmi] Connection to gNMI device 10.144.248.21:6702 closed
2024-04-29T15:55:02Z D! [agent] Input channel closed
2024-04-29T15:55:02Z D! [agent] Processor channel closed
2024-04-29T15:55:02Z D! [agent] Processor channel closed
2024-04-29T15:55:02Z D! [agent] Processor channel closed
2024-04-29T15:55:02Z D! [agent] Stopped Successfully

twitten@Todd-Laptop etc % /Users/twitten/dev/devops/telegraf/bin/telegraf-1.31.0-58f7dadc/usr/bin/telegraf --config ./telegraf.conf --debug --test-wait 300 --config-directory ./telegraf.d
2024-04-29T16:12:12Z I! Loading config: ./telegraf.conf
2024-04-29T16:12:12Z I! Loading config: telegraf.d/cisco-gnmi.conf
2024-04-29T16:12:12Z W! DeprecationWarning: Option "fieldpass" of plugin "inputs.gnmi" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldinclude' instead
2024-04-29T16:12:12Z I! Starting Telegraf 1.31.0-58f7dadc brought to you by InfluxData the makers of InfluxDB
2024-04-29T16:12:12Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 25 parsers, 60 outputs, 5 secret-stores
2024-04-29T16:12:12Z I! Loaded inputs: gnmi
2024-04-29T16:12:12Z I! Loaded aggregators: 
2024-04-29T16:12:12Z I! Loaded processors: converter rename
2024-04-29T16:12:12Z I! Loaded secretstores: 
2024-04-29T16:12:12Z W! Outputs are not used in testing mode!
2024-04-29T16:12:12Z I! Tags enabled: host=10.5.200.224
2024-04-29T16:12:12Z D! [agent] Initializing plugins
2024-04-29T16:12:12Z D! [inputs.gnmi::cisco-gnmi] Internal alias mapping: map[Cisco-IOS-XR-pfi-im-cmd-oper:/interfaces/interface-xr/interface:ifcounters]
2024-04-29T16:12:12Z D! [agent] Starting service inputs

2024-04-29T16:12:12Z D! [inputs.gnmi::cisco-gnmi] Connection to gNMI device 10.255.30.1:57344 established
> interface,host=10.5.200.224,ifHighSpeed=10000000,ifIndex=132,ifName=TenGigE0/0/0/35,source=10.255.30.1 ifHCInOctets=464644054i,ifHCOutOctets=1247546241i,ifInCrcErrors=0i,ifInDiscards=0i,ifInErrors=116i,ifInRate=0i,ifOutDiscards=0i,ifOutErrors=0i,ifOutRate=0i 1714407171287000000

2024-04-29T16:14:50Z D! [agent] Stopping service inputs
2024-04-29T16:14:50Z D! [inputs.gnmi::cisco-gnmi] Connection to gNMI device 10.255.30.1:57344 closed
2024-04-29T16:14:50Z D! [agent] Input channel closed
2024-04-29T16:14:50Z D! [agent] Processor channel closed
2024-04-29T16:14:50Z D! [agent] Processor channel closed
2024-04-29T16:14:50Z D! [agent] Stopped Successfully
2024-04-29T16:14:50Z E! [telegraf] Error running agent: input plugins recorded 10 errors

srebhan commented 6 months ago

Thanks a lot for testing @whizkidTRW!

whizkidTRW commented 6 months ago

Thanks to both of you for addressing it so quickly!