jcmturner / gokrb5

Pure Go Kerberos library for clients and services
Apache License 2.0

Single KDC host outage leads to complete failure in the UDP path. #454

Closed bmahler closed 2 years ago

bmahler commented 2 years ago

We saw an issue where a single KDC host was down, which sometimes caused clients to fail completely to communicate with the KDCs. Note that we only use UDP and don't have TCP set up for the KDCs:

time="2021-11-01T17:42:07Z" level=warning msg="Encountered error in fetching kerberos token :could not initialize context: [Root cause: Networking_Error] Networking_Error: TGS Exchange Error: issue sending TGS_REQ to KDC: failed to communicate with KDC. Attempts made with UDP (sending over UDP failed to <IP2>:88: read udp <IP1>:33139-><IP2>:88: i/o timeout) and then TCP (lookup _kerberos._tcp.<snip> on 127.0.0.1:53: no such host)"
<repeats after 5 seconds>

This was surprising since all of the other KDC hosts were online and the client should be trying the other IP addresses.

Based on the format of this error message and the code, it appears that the UDP dialing logic succeeds (it is UDP after all, so nothing is sent on the wire as part of dialing; in Go, dialing UDP only sets up the local socket) and then the send step fails (naturally, since the host is down and the library times out trying to read back bytes).
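The dial-succeeds-but-read-fails behaviour can be reproduced with the standard library alone. The following is a minimal sketch, not code from gokrb5: `probeUDP` and `silentPeerTimesOut` are hypothetical names, and a local UDP socket that never replies is assumed to be a reasonable stand-in for a KDC host that is down.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// probeUDP (hypothetical helper) dials addr over UDP, sends one datagram and
// waits for a reply. The dial error and the read/write error are returned
// separately so the two stages can be told apart.
func probeUDP(addr string, timeout time.Duration) (dialErr, ioErr error) {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err, nil
	}
	defer conn.Close()

	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	if _, err := conn.Write([]byte("ping")); err != nil {
		return nil, err
	}
	_, err = conn.Read(make([]byte, 16))
	return nil, err
}

// silentPeerTimesOut (hypothetical helper) probes a local UDP socket that is
// bound but never answers, which is assumed here to mimic a host that is down.
func silentPeerTimesOut() (dialOK, timedOut bool, err error) {
	silent, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	defer silent.Close()

	dialErr, ioErr := probeUDP(silent.LocalAddr().String(), 100*time.Millisecond)
	return dialErr == nil, errors.Is(ioErr, os.ErrDeadlineExceeded), nil
}

func main() {
	dialOK, timedOut, err := silentPeerTimesOut()
	fmt.Println("dial succeeded:", dialOK, "read timed out:", timedOut, "setup error:", err)
}
```

The point of the sketch is that a dial error is the wrong signal for failing over between UDP peers; only the read deadline reveals that the peer is not answering.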

So, while the dialing logic in this library is written to try to dial all resolved IPs until successful, in the case of UDP, dialing is successful even when the host is down. Then the read fails, and there is no attempt to try the other IPs. Essentially, this appears to be a bug in this client implementation.

jcmturner commented 2 years ago

I believe this was fixed by #399

Please reopen if this is still an issue with the latest v8 release.