isambard-sc / clifton

SSH connection manager

Clifton fails to DNS resolve in locked-down networks #15

Open chryswoods opened 3 months ago

chryswoods commented 3 months ago

In some very locked-down networks, clifton fails to resolve the Keycloak URL via DNS. This is despite the OS being able to resolve the address via, e.g., the host command.

Redacted error message below:

$ ./clifton auth
Error: Could not get certificate.

Caused by:
   0: Could not get OAuth token.
   1: Failed to request codes from device auth endpoint
   2: Request failed
   3: request failed
   4: error sending request for url (https://XXXXX.ac.uk/realms/XXXX/protocol/openid-connect/auth/device): error trying to connect: dns error: failed to lookup address information: nodename nor servname provided, or not known
   5: error trying to connect: dns error: failed to lookup address information: nodename nor servname provided, or not known
   6: dns error: failed to lookup address information: nodename nor servname provided, or not known
   7: failed to lookup address information: nodename nor servname provided, or not known

$ host XXXX.ac.uk
XXXX.ac.uk has address 104.XX.XX.XX
XXXX.ac.uk has address 172.XX.XX.XX
XXXX.ac.uk has address 104.XX.XX.XX

It looks like the oauth2::BasicClient is not using the system DNS lookup?

This isn't urgent, as there are workarounds. But it may be worth catching the error and falling back to the system DNS (if it isn't already being used), or even doing a direct lookup against a known-good DNS server (e.g. 1.1.1.1) via a Rust DNS client library?
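The "catch the error and fall back" idea could be sketched roughly as below. This is only an illustration, not Clifton's code: `resolve_with_fallback` and both closures are hypothetical names, and the fallback here is a stub returning a fixed documentation address (192.0.2.1) where a real implementation would query a chosen DNS server through a DNS client library.

```rust
use std::io;
use std::net::IpAddr;

/// Try `primary`; if it returns an error, report it and try `fallback` instead.
fn resolve_with_fallback<P, F>(host: &str, primary: P, fallback: F) -> io::Result<Vec<IpAddr>>
where
    P: Fn(&str) -> io::Result<Vec<IpAddr>>,
    F: Fn(&str) -> io::Result<Vec<IpAddr>>,
{
    primary(host).or_else(|err| {
        eprintln!("primary lookup for {host} failed ({err}); trying fallback");
        fallback(host)
    })
}

fn main() {
    // Hypothetical stand-in for a resolver that fails, as in the report above.
    let failing = |_: &str| -> io::Result<Vec<IpAddr>> {
        Err(io::Error::new(io::ErrorKind::Other, "dns error"))
    };
    // Hypothetical stand-in for a second resolver; returns a fixed address.
    let stub = |_: &str| -> io::Result<Vec<IpAddr>> { Ok(vec![IpAddr::from([192, 0, 2, 1])]) };

    let addrs = resolve_with_fallback("example.invalid", failing, stub).unwrap();
    println!("{addrs:?}");
}
```

Keeping the two resolvers as plain closures means the fallback policy can be exercised without any network access.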

milliams commented 3 months ago

The oauth2::BasicClient does not do any HTTP or DNS itself; instead, it takes an HTTP client argument to perform the request. In Clifton's case, that client is Reqwest. Reqwest uses Rust's std::net::ToSocketAddrs trait, which in turn uses the system libc's getaddrinfo.
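To check whether that lookup path works on an affected machine, a small standalone program can perform the same kind of std::net::ToSocketAddrs lookup. This is a minimal diagnostic sketch, not part of Clifton; `system_lookup` is a hypothetical helper name, and port 443 is assumed because the failing URL is HTTPS.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve `host:port` through the standard library, which defers to the
/// platform's getaddrinfo for host names.
fn system_lookup(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    Ok((host, port).to_socket_addrs()?.collect())
}

fn main() {
    // Host name from the first command-line argument; the default is a
    // literal IP address, which is parsed directly and needs no DNS.
    let host = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "127.0.0.1".to_string());

    match system_lookup(&host, 443) {
        Ok(addrs) => println!("{host} resolved to {addrs:?}"),
        Err(err) => eprintln!("lookup of {host} failed: {err}"),
    }
}
```

If this program fails with the same message for the Keycloak host while the host command succeeds, that would point at getaddrinfo on the machine rather than at Clifton or Reqwest.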

Looking online for "nodename nor servname provided, or not known", it is indeed a libc getaddrinfo message (this wording is the one used by libSystem on macOS), and I see many examples of people getting this error on macOS. It is mostly caused by /etc/hosts misconfiguration, but there are also reports of the macOS DNS resolver getting into a bad state, particularly after a suspend; rebooting has fixed it in these cases.

I'm trying to keep dependencies low in order to control the binary size, but one option might be to switch Reqwest to use the Hickory DNS resolver. It might not fix the problem, though, so it is something to try at a later date.