jdratlif closed this issue 3 years ago
This appears to be something happening to Juniper gear specifically. OS release notes from their site appear to show fixes for ALPN issues.
Closing.
Have we determined that this issue is due to Juniper? I'm going to close this; feel free to re-open if it turns out to be a Telegraf problem rather than a Juniper one.
We've been using telegraf to collect streaming telemetry from Juniper routers. It's been working well for us, but we had to make a custom output plugin. With the release of 1.15 and the execd output plugin, we want to switch to that and stop compiling telegraf ourselves.
However, when I tried our existing config with telegraf 1.15.3, installed from the upstream telegraf repository for Red Hat 7, I couldn't connect. Telegraf just reports that it is retrying the connection, over and over. When I look at the Juniper logs, I see TLS errors.
Sep 23 18:54:32 chttp2_server.c:83: Handshaking failed: {"created":"@1600887272.560535833","description":"Cannot check peer: missing selected ALPN property.","file":"../../../../../../../../src/external/bsd/grpc/dist/src/core/lib/security/transport/security_connector.c","file_line":589}
After rebuilding the vMX instances and reissuing certificates, I tried compiling telegraf from source. I got a slightly different error message, but the same problem.
Sep 23 18:55:35 chttp2_server.c:83: Handshaking failed: {"created":"@1600887335.591207967","description":"Handshake failed","file":"../../../../../../../../src/external/bsd/grpc/dist/src/core/lib/security/transport/security_handshaker.c","file_line":276,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I decided to try compiling with an older version of golang. If I compile telegraf with golang 1.13, everything works. If I use golang 1.14 or 1.15, it does not.
I'm not sure if this is a golang issue, a telegraf issue, a Juniper issue, or something else. I asked about this on Discord, and someone there pointed out the following in the golang 1.14 release notes.
https://golang.org/doc/go1.14#minor_library_changes
The tls package no longer supports the legacy Next Protocol Negotiation (NPN) extension and now only supports ALPN. In previous releases it supported both. There are no API changes and applications should function identically as before. Most other clients and servers have already removed NPN support in favor of the standardized ALPN.
That suggests it could be a Juniper issue. I am working on talking to them as well, but I decided to file this here in case the problem isn't with Juniper.
Relevant telegraf.conf:
System info:
telegraf 1.15.3 CentOS 7
Steps to reproduce:
Try to collect gNMI telemetry on a Juniper router with upstream builds of telegraf configured with TLS. It will always fail to connect. The Juniper logs will say there is a TLS issue.
Expected behavior:
That the connection works and I get interface data streamed back to telegraf.
Actual behavior:
It fails to connect, with TLS errors in the Juniper logs.
Additional info: