influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

plugin telegraf_inputs-openldap.conf broken after update to telegraf-1.29.5 from telegraf-1.29.2 #15436

Closed paulusc closed 3 months ago

paulusc commented 4 months ago

Relevant telegraf.conf

# OpenLDAP cn=Monitor plugin
[[inputs.openldap]]
  host = "myhost.mydomain.com"
  port = 636

  # ldaps, starttls, or no encryption. default is an empty string, disabling all encryption.
  # note that port will likely need to be changed to 636 for ldaps
  # valid options: "" | "starttls" | "ldaps"
  tls = "ldaps"

  # skip peer certificate verification. Default is false.
  insecure_skip_verify = true

  # Path to PEM-encoded Root certificate to use to verify server certificate
  #tls_ca = "/etc/openldap/cacerts/openldapCA.pem"

  # dn/password to bind with. If bind_dn is empty, an anonymous bind is performed.
  bind_dn = "cn=StatisticsActt,ou=InternalAccount,dc=mydomain,dc=com"
  bind_password = "godknowswhatitisfromday1"

  # reverse metric names so they sort more naturally
  # Defaults to false if unset, but is set to true when generating a new config
  reverse_metric_names = true

Logs from Telegraf

[inputs.openldap] Error in plugin: LDAP Result Code 200 "Network Error": remote error: tls: handshake failure

System info

telegraf-1.29.2.5 RHEL7

Docker

No response

Steps to reproduce

1. 2. 3. ...

Expected behavior

Telegraf able to communicate openldap metrics to the host

Actual behavior

no communication

Additional info

Happens when upgrading telegraf from 1.29.2 to 1.29.5. Version 1.30.2 was also tested and shows the same issue:

[inputs.openldap] Error in plugin: LDAP Result Code 200 "Network Error": remote error: tls: handshake failure

powersj commented 4 months ago

Hi,

tls: handshake failure

Can you:

1) Confirm what cipher your server is using? (e.g. openssl s_client -connect myhost.mydomain.com:636)
2) Confirm this is consistently happening?
3) Confirm that you carried over any local certificates when you upgraded?

happens when upgrading telegraf from 1.29.2 to 1.29.5

This is the diff between 1.29.2 and 1.29.5. Unfortunately, I see no changes to any LDAP plugin code or any relevant dependencies. There was a similar report in https://github.com/influxdata/telegraf/issues/15236 where it turned out a gRPC library changed the default cipher suites allowed. Knowing what is expected could shed light on that.

paulusc commented 4 months ago

Hi,

The cipher is:

openssl s_client -connect localhost:636 | grep Cipher
. . .
New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384
    Cipher    : AES256-GCM-SHA384

Yes, this is happening consistently.

To narrow down the issue, we upgraded step by step from the last known working version 1.29.2-1 until we got the tls: handshake failure message. For us the last good/working version is telegraf-1.29.4-1.x86_64.

For the sake of simplicity, localhost is used in the plugin conf file:

grep -vP "#|^$" /etc/telegraf/telegraf.d/telegraf_inputs-openldap.conf
[[inputs.openldap]]
  host = "localhost"
  port = 636
  tls = "ldaps"
  insecure_skip_verify = true
  bind_dn = "cn=StatisticsAcct,ou=InternalAccount,dc=mydomain,dc=com"
  bind_password = "----------------"
  reverse_metric_names = true

With 1.29.5 the error appears (tls: handshake failure):

yum install telegraf-1.29.5-1
systemctl restart telegraf
systemctl status telegraf -l
. . .
2024-05-31T19:53:37Z D! [agent] Starting service inputs
2024-05-31T19:54:00Z E! [inputs.openldap] Error in plugin: LDAP Result Code 200 "Network Error": remote error: tls: handshake failure

Thanks

powersj commented 4 months ago

To narrow down the issue, we upgrade from last known working version 1.29.2-1

Thank you very much for doing this. I think this does possibly narrow it down to the upgrade to go1.22.

New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384

I believe the TLS 1.0 and SSL 3.0 is the issue here, from the go1.22 docs:

By default, the minimum version offered by crypto/tls servers is now TLS 1.2

What we do for other plugins is expose some common TLS options that allow the user to specify the minimum version (e.g. VersionTLS10), cipher suites, etc. We need to do a little refactoring to expose all of these for this plugin.

Let me double check with the team today and we can hopefully get a PR up for you to test.

srebhan commented 4 months ago

@paulusc please use the newer inputs.ldap plugin instead of the openldap one as it supports more TLS options

[[inputs.ldap]]
  server = "ldaps://myhost.mydomain.com:636"

  bind_dn = "cn=StatisticsActt,ou=InternalAccount,dc=mydomain,dc=com"
  bind_password = "godknowswhatitisfromday1"

  reverse_field_names = true

  ## TLS options
  tls_min_version = "TLS10"
  tls_cipher_suites = ["TLS_AES_256_GCM_SHA384"]
  insecure_skip_verify = true

according to your config above.

paulusc commented 4 months ago

@srebhan the latest version available (1.30.2) was not able to digest the tls_cipher_suites option. We installed telegraf-nightly.x86_64.rpm; this one does not show errors, but no data is received. Still working on it.

## TLS options
 tls_min_version = "TLS10"
 tls_cipher_suites = ["TLS_AES_256_GCM_SHA384"]
 insecure_skip_verify = true

# rpm -qip telegraf-nightly.x86_64.rpm
Name        : telegraf
Version     : 1.31.0
Release     : 0
Architecture: x86_64
Install Date: (not installed)
Group       : default
Size        : 240184724
License     : MIT
Signature   : (none)
Source RPM  : telegraf-1.31.0-0.src.rpm
Build Date  : Wed 05 Jun 2024 08:58:57 PM EDT
Build Host  : 8a5e59fbdbc3
Relocations : /
Packager    : support@influxdb.com
Vendor      : InfluxData
URL         : https://github.com/influxdata/telegraf
Summary     : Plugin-driven server agent for reporting metrics into InfluxDB.
Description :
Plugin-driven server agent for reporting metrics into InfluxDB.

srebhan commented 4 months ago

@paulusc which plugin are you using? You have to use inputs.ldap instead of inputs.openldap!

paulusc commented 4 months ago

@srebhan we are using the inputs.ldap plugin.

This is our last attempt to get it working. Sorry for the delay to respond, busy week!

[root@xxxxxxxxxxxxxx telegraf.d]# /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d -debug
WARN[0000]log.go:244 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null.
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-cpu.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-disk.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-diskio.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-filestat.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-internal.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-interrupts.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-kernel-vmstat.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-kernel.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-ldap.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-mem.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-net.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-netstats.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-processes.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-swap.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-sysctl_fs.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_inputs-system.conf
2024-06-14T17:56:32Z I! Loading config: /etc/telegraf/telegraf.d/telegraf_outputs-influxdb.conf
2024-06-14T17:56:32Z I! Starting Telegraf 1.31.0-079c9d28 brought to you by InfluxData the makers of InfluxDB
2024-06-14T17:56:32Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-06-14T17:56:32Z I! Loaded inputs: cpu (2x) disk (2x) diskio (2x) filestat internal interrupts kernel (2x) kernel_vmstat ldap linux_sysctl_fs mem (2x) net netstat processes (2x) swap (2x) system (2x)
2024-06-14T17:56:32Z I! Loaded aggregators:
2024-06-14T17:56:32Z I! Loaded processors:
2024-06-14T17:56:32Z I! Loaded secretstores:
2024-06-14T17:56:32Z I! Loaded outputs: influxdb
2024-06-14T17:56:32Z I! Tags enabled: host=xxxxxxxxxxxxxxx.xxxxxxx.xxx
2024-06-14T17:56:32Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"xxxxxxxxxxxxxxx.xxxxxxx.xxx", Flush Interval:1m0s
2024-06-14T17:56:32Z D! [agent] Initializing plugins
2024-06-14T17:56:32Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead for protocol stats
2024-06-14T17:56:32Z D! [agent] Connecting outputs
2024-06-14T17:56:32Z D! [agent] Attempting connection to [outputs.influxdb]
2024-06-14T17:56:32Z D! [agent] Successfully connected to outputs.influxdb
2024-06-14T17:56:32Z D! [agent] Starting service inputs
2024-06-14T17:57:00Z E! [inputs.ldap] Error in plugin: connection failed: LDAP Result Code 200 "Network Error": remote error: tls: handshake failure

using the following config file

[root@xxxxxxxxxxxxxx telegraf.d]# cat telegraf_inputs-ldap.conf
[[inputs.ldap]]
  server = "[ldaps://](ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx)[xxxxxxxxxxxxxx](ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx)[.xxxxxxxxxx.xxx](ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx)"
  bind_dn = "cn=StatisticsAcct,ou=InternalAccount,dc=xxxxxxxxxx,dc=xxx"
  bind_password = "xxxxxxxxxxxxxxxxxxxxx"
  reverse_field_names = true
  # TLS options
  tls_min_version = "TLS10"
  tls_cipher_suites = ["TLS_AES_256_GCM_SHA384"]
  insecure_skip_verify = true

srebhan commented 4 months ago

@paulusc when running openssl s_client -connect localhost:636 what are the lines below SSL-Session:?

paulusc commented 4 months ago

@srebhan please find below the lines below SSL-Session:

SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : AES256-GCM-SHA384
    Session-ID: 9F27CDFEB561AEC1BDE8B8538C23AADA9CDE461F1A83D02C54B8088CC5CE953F
    Session-ID-ctx:
    Master-Key: RKM0MDBFNDRBRDVENZI3MENBRDQ3N0M3NDFCNZQWNDK0MZVFQTQ3QKQ1NTAYRDVCRUU4RJK5Q0EWMKE5NTM1QTE5NZIYQUYY
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    TLS session ticket lifetime hint: 300 (seconds)
    TLS session ticket:
    0000 - f5 d3 bf 7e d2 d2 a1 40-56 0f c6 40 de 13 0e 82   ...~...@V..@....
    0010 - ea 39 f3 4d c3 7f 0d bf-36 4d d2 35 30 59 a9 81   .9.M....6M.50Y..
    0020 - bc e6 49 2a 92 69 e6 a3-ef ad 98 28 ea 8f 10 7e   ..I*.i.....(...~
    0030 - bd 6f 8f da cc c3 14 f1-88 e0 65 e9 f7 6c 46 44   .o........e..lFD
    0040 - b3 19 c6 99 c0 ef 25 95-e0 51 30 22 33 3a 64 46   ......%..Q0"3:dF
    0050 - 18 d2 29 7c 34 a1 17 24-47 bc f3 c2 43 f9 4d 91   ..)|4..$G...C.M.
    0060 - 37 c3 f6 1b 00 20 42 73-66 51 f5 94 7a b2 15 a2   7.... BsfQ..z...
    0070 - 6f 3e 9e cf 7d cd ef a5-8a 06 56 72 5a 4b 0b c6   o>..}.....VrZK..
    0080 - 93 bd d0 4b 1f 75 4d 4d-2d 8a 56 16 5e b9 45 4b   ...K.uMM-.V.^.EK
    0090 - 92 95 3c 0b 68 ea 14 e7-20 41 28 20 ae 8b 40 6f   ..<.h... A( ..@o

    Start Time: 1718634179
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)

srebhan commented 3 months ago

Just to make sure we are not hunting ghosts, your server string looks quite strange in

[[inputs.ldap]]
  server = "[ldaps://](ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx)[xxxxxxxxxxxxxx](ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx)[.xxxxxxxxxx.xxx](ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx)"
  bind_dn = "cn=StatisticsAcct,ou=InternalAccount,dc=xxxxxxxxxx,dc=xxx"
  bind_password = "xxxxxxxxxxxxxxxxxxxxx"
  reverse_field_names = true
  # TLS options
  tls_min_version = "TLS10"
  tls_cipher_suites = ["TLS_AES_256_GCM_SHA384"]
  insecure_skip_verify = true

it should be something like

  server = "ldaps://xxxxxxxxxxxxxx.xxxxxxxxxx.xxx:636"

srebhan commented 3 months ago

@paulusc I think I found the issue. Cipher TLS_AES_256_GCM_SHA384 is a TLS1.3 only cipher, you probably need to use TLS_RSA_WITH_AES_256_GCM_SHA384 for TLS1.2.

I put up PR #15570, which allows specifying all, secure and insecure as cipher-suite aliases, so you could try with all using the binary in the PR (available as soon as CI has finished the tests)...

Furthermore, you likely do not need to restrict the TLS minimum version as the server offers TLS1.2...

paulusc commented 3 months ago

@srebhan All right, you nailed it. With this configuration and the latest nightly build we are all OK. Thank you very much for your time and dedication, highly appreciated.

# rpm -q telegraf
telegraf-1.32.0-0.x86_64

[[inputs.ldap]]
  server = "ldaps://xxxxxxxxx.xxxxxxxxx.com:636"
  bind_dn = "cn=StatisticsAcct,ou=InternalAccount,dc=desjardins,dc=com"
  bind_password = "xxxxxxxxxxxxx"
  reverse_field_names = true
  tls_cipher_suites = ["all"]
  insecure_skip_verify = true

srebhan commented 3 months ago

Closing this issue as the solution is to enable the corresponding insecure cipher. PR #15570 making this easier is already merged and will be released with v1.32.0...
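As a rough illustration of what alias names like all, secure and insecure could expand to, the sketch below builds the three lists from the suites Go's crypto/tls reports. The expand function is hypothetical and is not taken from PR #15570; the actual Telegraf implementation may differ.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// expand resolves an alias to a list of cipher-suite names.
// Hypothetical sketch; not Telegraf's implementation.
func expand(alias string) ([]string, error) {
	var lists [][]*tls.CipherSuite
	switch alias {
	case "secure":
		lists = [][]*tls.CipherSuite{tls.CipherSuites()}
	case "insecure":
		lists = [][]*tls.CipherSuite{tls.InsecureCipherSuites()}
	case "all":
		lists = [][]*tls.CipherSuite{tls.CipherSuites(), tls.InsecureCipherSuites()}
	default:
		return nil, fmt.Errorf("unknown alias %q", alias)
	}
	var names []string
	for _, list := range lists {
		for _, s := range list {
			names = append(names, s.Name)
		}
	}
	return names, nil
}

func main() {
	for _, alias := range []string{"secure", "insecure", "all"} {
		names, err := expand(alias)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-8s -> %d suites\n", alias, len(names))
	}
}
```

Enabling every suite is convenient for diagnosis; once the required suite is known, listing only that one by name keeps the set of weaker suites offered to the server as small as possible.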