influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Input Plugin - JTI - Could not initiate login check #7050

Closed mohsin106 closed 1 year ago

mohsin106 commented 4 years ago

Hi @danielnelson ,

I'm running Telegraf in a container that connects to my Juniper routers. Everything was working until I upgraded from telegraf-alpine:1.11.5 to 1.12.6.

Since the upgrade, I can no longer establish a connection to my Juniper routers, and nothing streams from the routers to my Telegraf container.

This is the error message I'm getting on telegraf-alpine:1.12.6:

2020-02-19T18:51:31Z E! [inputs.jti_openconfig_telemetry] Could not initiate login check for bbrj01.mgt.net:50051: rpc error: code = Unavailable desc = transport is closing
2020-02-19T18:51:31Z E! [inputs.jti_openconfig_telemetry] Could not initiate login check for bbrj02.mgt.net:50051: rpc error: code = Unavailable desc = transport is closing

I'm also seeing multiple sessions trying to establish on the router, where previously there was only one established session:

show system connections |match 50051
tcp4 0 0 172.17.249.66.50051 100.120.230.14.38398 TIME_WAIT
tcp4 0 0 172.17.249.66.50051 100.120.230.14.38388 TIME_WAIT
tcp4 0 0 172.17.249.66.50051 100.120.230.14.38378 TIME_WAIT
tcp4 0 0 172.17.249.66.50051 100.120.230.14.38372 TIME_WAIT
tcp4 0 0 172.17.249.66.50051 100.120.230.14.38362 TIME_WAIT
tcp4 0 0 172.17.249.66.50051 100.120.230.14.38358 TIME_WAIT
tcp46 0 0 .50051 .* LISTEN

This is also happening with telegraf-alpine:1.13.2.

Here is what my telegraf.conf file looks like:

[global_tags]

[agent]
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "5s"
  flush_interval = "1m"
  flush_jitter = "5s"
  debug = false
  quiet = false
  hostname = "bb-telegraf-agent"
  omit_hostname = false

[[inputs.jti_openconfig_telemetry]]
  servers = ["bbrj01.mgt.net:50051","bbrj02.mgt.net:50051"]
  sample_frequency = "60000ms"
  username = "$routeruser"
  password = "$routerpass"
  client_id = "$containerName"

  sensors = [
    "interfaces /junos/system/linecard/interface/",
    "ifaceDesc /interfaces/interface/state/description/",
    "lsps /junos/services/label-switched-path/usage/",
    "cpu /junos/system/linecard/cpu/memory/",
    "npu /junos/system/linecard/npu/memory/",
  ]

  ssl_cert = "/etc/telegraf/juniper_tls_cert.pem"
  str_as_tags = false

  fielddrop = [
    "/interfaces/interface/state/last-change",
    "/interfaces/interface/init_time"
  ]

  [inputs.jti_openconfig_telemetry.tagdrop]
    "/components/component/propertiesproperty/@name" = [
      "mem-util-kernel-cos-halp",
      "mem-util-kernel-cos-allocations",
      "mem-util-kernel-cos-bytes-allocated",
      "mem-util-kernel-cos-frees",
      "mem-util-kernel-cos-halp",
      "mem-util-kernel-filter-allocations",
      "mem-util-kernel-filter-bytes-allocated",
      "mem-util-kernel-filter-frees",
      "mem-util-kernel-filter-halp",
      "mem-util-kernel-flow-table-allocations",
      "mem-util-kernel-flow-table-bytes-allocated",
      "mem-util-kernel-flow-table-frees",
      "mem-util-kernel-fpb-allocations",
      "mem-util-kernel-fpb-bytes-allocated",
      "mem-util-kernel-fpb-frees",
      "mem-util-kernel-fpb-syms-",
      "mem-util-kernel-fpdl",
      "mem-util-kernel-halp-unknown",
      "mem-util-kernel-iff-allocations",
      "mem-util-kernel-iff-bytes-allocated",
      "mem-util-kernel-iff-frees",
      "mem-util-kernel-ifl-allocations",
      "mem-util-kernel-ifl-bytes-allocated",
      "mem-util-kernel-ifl-frees",
      "mem-util-kernel-ifl-halp",
      "mem-util-kernel-ipc-log",
      "mem-util-kernel-nh",
      "mem-util-kernel-rt-allocations",
      "mem-util-kernel-rt-bytes-allocated",
      "mem-util-kernel-rt-frees",
      "mem-util-kernel-rt-halp",
      "mem-util-kernel-rtt-allocations",
      "mem-util-kernel-rtt-bytes-allocated",
      "mem-util-kernel-rtt-frees",
      "mem-util-kernel-sample",
      "mem-util-kernel-toe-jflow-tal",
      "mem-util-kernel-toe-ka",
      "mem-util-kernel-toe-pio-",
      "mem-util-kernel-toe-pkt-transfer-allocations",
      "mem-util-kernel-toe-pkt-transfer-bytes-allocated",
      "mem-util-kernel-toe-pkt-transfer-frees",
      "mem-util-kernel-toe-stats-accl",
      "mem-util-packet-dma-bytes-allocated",
      "mem-util-packet-dma-size",
      "mem-util-beta",
      "mem-util-edf",
      "mem-util-fcv",
      "mem-util-flt",
      "mem-util-jnh-egress-size",
      "mem-util-jnh-final",
      "mem-util-jnh-loadbal",
      "mem-util-jnh-refbits",
      "mem-util-jnh-remap",
      "mem-util-kht",
      "mem-util-plct",
      "mem-util-policer",
      "mem-util-sfm-entries-size",
      "mem-util-kernel-agent"
    ]

[[processors.converter]]
  [processors.converter.fields]
    tag = [
      "/interfaces/interface/state/parent_ae_name",
      "/interfaces/interface/state/oper-status"
    ]

[[processors.rename]]
  [[processors.rename.replace]]
    measurement = "openconfig-interfaces:interfaces/interface"
    dest = "interfaces"

  [[processors.rename.replace]]
    tag = "/interfaces/interface/state/oper-status"
    dest = "oper-status"
  [[processors.rename.replace]]
    tag = "/interfaces/interface/state/counters/out-queue/@queue-number"
    dest = "queue-number"
  [[processors.rename.replace]]
    tag = "/mpls/lsps/constrained-path/tunnels/tunnel/@source"
    dest = "tunnel-constrained-path-source"
  [[processors.rename.replace]]
    tag = "/components/component/propertiesproperty/@name"
    dest = "property-name"
  [[processors.rename.replace]]
    tag = "/mpls/lsps/constrained-path/tunnels/tunnel/@name"
    dest = "tunnel-constrained-path-name"
  [[processors.rename.replace]]
    tag = "/mpls/lsps/constrained-path/tunnels/tunnel/state/counters/@name"
    dest = "tunnel-constrained-path-counters-name"
  [[processors.rename.replace]]
    tag = "/interfaces/interface/state/parent_ae_name"
    dest = "parent-ae-name"
  [[processors.rename.replace]]
    tag = "/interfaces/interface/@name"
    dest = "interface-name"
  [[processors.rename.replace]]
    tag = "/components/component/@name"
    dest = "component-name"

  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-broadcast-pkts"
    dest = "out-broadcast-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/carrier-transitions"
    dest = "carrier-transitions"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/in-octets"
    dest = "in-octets"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-red-drop-bytes"
    dest = "red-drop-bytes"
  [[processors.rename.replace]]
    field = "/mpls/lsps/constrained-path/tunnels/tunnel/state/counters/packets"
    dest = "packets"
  [[processors.rename.replace]]
    field = "/mpls/lsps/constrained-path/tunnels/tunnel/state/counters/bytes"
    dest = "bytes"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/high-speed"
    dest = "high-speed"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-avg-buffer-occupancy"
    dest = "avg-buffer-occupancy"
  [[processors.rename.replace]]
    field = "/components/component/propertiesproperty/state/value"
    dest = "property-value"
  [[processors.rename.replace]]
    field = "/interfaces/interface/out-queue/allocated-buffer-size"
    dest = "allocated-buffer-size"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-cur-buffer-occupancy"
    dest = "cur-buffer-occupancy"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/in-unicast-pkts"
    dest = "in-unicast-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-multicast-pkts"
    dest = "out-multicast-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-octets"
    dest = "out-octets"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-bytes"
    dest = "bytes"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/allocated-buffer-size-ping"
    dest = "allocated-buffer-size-ping"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-pkts"
    dest = "pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/description"
    dest = "description"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-unicast-pkts"
    dest = "out-unicast-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/last-change"
    dest = "last-change"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/in-broadcast-pkts"
    dest = "in-broadcast-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-peak-buffer-occupancy"
    dest = "peak-buffer-occupancy"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/in-errors"
    dest = "in-errors"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/in-multicast-pkts"
    dest = "in-multicast-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-queue/-red-drop-pkts"
    dest = "red-drop-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/in-pkts"
    dest = "in-pkts"
  [[processors.rename.replace]]
    field = "/interfaces/interface/state/counters/out-pkts"
    dest = "out-pkts"

[[outputs.kafka]]
  namepass = ["interfaces","ifaceDesc"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-json-interfaces-test"
  compression_codec = 1
  required_acks = 1
  data_format = "json"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["lsps"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-json-lsps-test"
  compression_codec = 1
  required_acks = 1
  data_format = "json"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["cpu"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-json-cpu-test"
  compression_codec = 1
  required_acks = 1
  data_format = "json"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["npu"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-json-npu-test"
  compression_codec = 1
  required_acks = 1
  data_format = "json"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["interfaces","ifaceDesc"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-influx-interfaces-test"
  compression_codec = 1
  required_acks = 1
  data_format = "influx"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["lsps"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-influx-lsps-test"
  compression_codec = 1
  required_acks = 1
  data_format = "influx"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["cpu"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-influx-cpu-test"
  compression_codec = 1
  required_acks = 1
  data_format = "influx"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

[[outputs.kafka]]
  namepass = ["npu"]
  brokers = ["kafka1.mgt.com:9093", "kafka2.mgt.com:9093"]
  topic = "backbone-clean-influx-npu-test"
  compression_codec = 1
  required_acks = 1
  data_format = "influx"
  max_retry = 3
  tls_ca = "/etc/telegraf/kafka_ca.pem"
  tls_cert = "/etc/telegraf/kafka.cer"
  tls_key = "/etc/telegraf/kafka_priv.key"

Thank you, Mohsin

danielnelson commented 4 years ago

I looked over the changes to this plugin, and in #6027 it looks like we may have made a setting change that is not backwards compatible. Does the issue go away if you add enable_tls = true to the plugin configuration?

mohsin106 commented 4 years ago

No, the issue persists with the enable_tls = true setting in place. The "Use of deprecated configuration: enable_tls should be set when using TLS" warning does go away, though, when enable_tls = true is configured in the Kafka output stanza.

danielnelson commented 4 years ago

Just to clarify, did you set enable_tls on the jti_openconfig_telemetry plugin too?

mohsin106 commented 4 years ago

No, I had only enabled it inside the Kafka output stanza. I have now enabled it inside the JTI input stanza as well, and I'm seeing an error message relating to the certificate:

2020-02-19T23:26:14Z E! [inputs.jti_openconfig_telemetry] Could not initiate login check for bbrj01.mgt.cox.net:50051: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority"

Then I added insecure_skip_verify = true and now it seems to be working. I will monitor this overnight.

Thanks for your help. Mo
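
For reference, the change described above would look roughly like this in the JTI input stanza. This is a sketch assembled only from the settings mentioned in this thread, not a verified working configuration; the server names and credentials are copied from the original telegraf.conf, and the remaining options are assumed to stay unchanged.

```toml
[[inputs.jti_openconfig_telemetry]]
  servers = ["bbrj01.mgt.net:50051", "bbrj02.mgt.net:50051"]
  username = "$routeruser"
  password = "$routerpass"
  client_id = "$containerName"

  ## Per this thread, TLS had to be switched on explicitly after the upgrade.
  enable_tls = true

  ## Skips certificate chain and host verification. This got the connection
  ## working here, but it weakens security; see the tls_ca suggestion below.
  insecure_skip_verify = true

  ## ... sensors, fielddrop, tagdrop, and other options as before ...
```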

danielnelson commented 4 years ago

Great, and if you have a copy of the CA certificate you can add it as the tls_ca option for a security improvement over insecure_skip_verify.

So it looks like for this issue we should add a warning if there are TLS settings without enable_tls set, and automatically enable TLS if it is unset and other TLS settings are modified. Essentially the same as what was done in the Kafka output.
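
A minimal sketch of the tls_ca suggestion, assuming the CA certificate that signed the router's certificate has been copied into the container. The path /etc/telegraf/juniper_ca.pem is hypothetical and does not appear in the original configuration.

```toml
[[inputs.jti_openconfig_telemetry]]
  enable_tls = true

  ## Hypothetical path: point this at the CA that signed the router's certificate.
  tls_ca = "/etc/telegraf/juniper_ca.pem"

  ## With a trusted CA configured, verification can stay enabled.
  insecure_skip_verify = false
```

If the "certificate signed by unknown authority" error returns with this in place, the file is probably not the CA that issued the router's certificate.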

mohsin106 commented 4 years ago

Agreed. The same notification that was already added to the Kafka output plugin should be applied to any input/output plugin where TLS is used.

srebhan commented 1 year ago

@mohsin106 is this still an issue with the latest version of telegraf?

mohsin106 commented 1 year ago

I believe the code that Juniper was running only supported an old version of TLS. When we upgraded the Juniper code version we were able to upgrade our Telegraf version as well and were able to stream.